Search Results for author: Jiajun Song

Found 6 papers, 6 papers with code

Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering

1 code implementation • 21 May 2024 • Hiba Maryam, Ling Fu, Jiajun Song, Tajrian ABM Shafayet, Qidi Luo, Xiang Bai, Yuliang Liu

The development of Urdu scene text detection, recognition, and Visual Question Answering (VQA) technologies is crucial for advancing accessibility, information retrieval, and linguistic diversity in digital content, facilitating better understanding and interaction with Urdu-language visual data.

Information Retrieval, Question Answering, +4 more


Synthesizing Knowledge-enhanced Features for Real-world Zero-shot Food Detection

1 code implementation • 14 Feb 2024 • Pengfei Zhou, Weiqing Min, Jiajun Song, Yang Zhang, Shuqiang Jiang

The complexity of food semantic attributes makes it even harder for current zero-shot detection (ZSD) methods to distinguish between food categories.

Attribute, Generalized Zero-Shot Object Detection, +2 more


SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food Detection

1 code implementation • 7 Oct 2023 • Pengfei Zhou, Weiqing Min, Yang Zhang, Jiajun Song, Ying Jin, Shuqiang Jiang

To tackle this, we propose the Semantic Separable Diffusion Synthesizer (SeeDS) framework for Zero-Shot Food Detection (ZSFD).

Ranked #1 on Generalized Zero-Shot Object Detection on MS-COCO

Denoising, Food recommendation, +3 more


Uncovering hidden geometry in Transformers via disentangling position and context

1 code implementation • 7 Oct 2023 • Jiajun Song, Yiqiao Zhong

Given embedding vector $\boldsymbol{h}_{c, t} \in \mathbb{R}^d$ at sequence position $t \le T$ in a sequence (or context) $c \le C$, extracting the mean effects yields the decomposition \[ \boldsymbol{h}_{c, t} = \boldsymbol{\mu} + \mathbf{pos}_t + \mathbf{ctx}_c + \mathbf{resid}_{c, t} \] where $\boldsymbol{\mu}$ is the global mean vector, $\mathbf{pos}_t$ and $\mathbf{ctx}_c$ are the mean vectors across contexts and across positions respectively, and $\mathbf{resid}_{c, t}$ is the residual vector.
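The decomposition above can be illustrated with a minimal NumPy sketch. It assumes the embeddings are stored in an array of shape (C, T, d), and it assumes that the positional and contextual means are centered by subtracting the global mean, which is what makes the four terms sum back to the original vector. The function name `decompose` and the random test data are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def decompose(h):
    """Split embeddings h of shape (C, T, d) into mean effects and a residual.

    Assumption: pos and ctx are centered by the global mean mu, so that
    h[c, t] == mu + pos[t] + ctx[c] + resid[c, t] holds exactly.
    """
    mu = h.mean(axis=(0, 1))      # global mean vector, shape (d,)
    pos = h.mean(axis=0) - mu     # mean across contexts, shape (T, d)
    ctx = h.mean(axis=1) - mu     # mean across positions, shape (C, d)
    # Broadcast pos over contexts and ctx over positions, then subtract.
    resid = h - mu - pos[None, :, :] - ctx[:, None, :]
    return mu, pos, ctx, resid

# Hypothetical data: C=4 contexts, T=6 positions, d=3 dimensions.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 6, 3))
mu, pos, ctx, resid = decompose(h)

# Reassemble the embeddings from the four components.
recon = mu + pos[None, :, :] + ctx[:, None, :] + resid
```

Under these assumptions, `pos` averages to zero over positions, `ctx` averages to zero over contexts, and `resid` averages to zero along both axes, so each component captures a separate source of variation.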

Dictionary Learning, POS, +1 more