no code implementations • 13 Mar 2024 • Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang
In our study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling settings.
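The snippet above only names the method, so here is a minimal NumPy sketch of the core idea as we understand it: when every sample in the batch shares one context, the attention for a decoding step can be split into a prefix part (keys/values stored once for the whole batch) and an incremental part (per-sample keys/values for tokens decoded so far). The function name, shapes, and single-head setup are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bifurcated_attention(q, k_prefix, v_prefix, k_inc, v_inc):
    """Single-step attention for a batch of samples sharing one prefix.

    q:        (batch, d)       current-step query, one per sample
    k_prefix: (prefix_len, d)  keys for the shared context, stored ONCE
    v_prefix: (prefix_len, d)  values for the shared context
    k_inc:    (batch, t, d)    per-sample keys for tokens decoded so far
    v_inc:    (batch, t, d)    per-sample values for tokens decoded so far
    """
    d = q.shape[-1]
    # Scores against the shared prefix: a single GEMM reused by every sample.
    s_pre = q @ k_prefix.T / np.sqrt(d)                      # (batch, prefix_len)
    # Scores against each sample's own decoded tokens.
    s_inc = np.einsum('bd,btd->bt', q, k_inc) / np.sqrt(d)   # (batch, t)
    # Softmax over the concatenated score vector.
    s = np.concatenate([s_pre, s_inc], axis=-1)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    w_pre, w_inc = w[:, :s_pre.shape[1]], w[:, s_pre.shape[1]:]
    # Weighted values: the prefix contribution shares memory across the batch.
    return w_pre @ v_prefix + np.einsum('bt,btd->bd', w_inc, v_inc)
```

The output matches ordinary attention computed with the prefix replicated per sample; the saving is that the prefix cache is never materialized `batch` times.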
no code implementations • 2 Jan 2024 • Fan Lyu, Wei Feng, Yuepan Li, Qing Sun, Fanhua Shang, Liang Wan, Liang Wang
The goal of Continual Learning (CL) is to continuously learn from new data streams and accomplish the corresponding tasks.
no code implementations • 12 Nov 2023 • Qing Sun, Shuai Niu, Minrui Fei
In this work, an innovative data-driven moving horizon state estimation method is proposed for systems with unknown dynamics, based on Bayesian optimization.
no code implementations • 12 Nov 2023 • Shuai Niu, Qing Sun, Minrui Fei, Xuqian Ju
Deriving precise system dynamic models through traditional numerical methods is often a challenging endeavor.
no code implementations • 5 Jul 2023 • Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Xiaofei Ma, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang
Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance.
no code implementations • 9 Mar 2023 • Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang
Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as carbon footprint.
no code implementations • ICCV 2023 • Fan Lyu, Qing Sun, Fanhua Shang, Liang Wan, Wei Feng
In Parallel Continual Learning (PCL), the parallel multiple tasks start and end training unpredictably, thus suffering from training conflict and catastrophic forgetting issues.
2 code implementations • 26 Oct 2022 • Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang
Using these benchmarks, we assess the performance of code generation models in a multi-lingual fashion and find that language models generalize to out-of-domain languages, that multi-lingual models outperform mono-lingual ones, that few-shot prompting can teach the model new languages, and that zero-shot translation abilities emerge even in mono-lingual settings.
1 code implementation • 25 Sep 2022 • Qing Sun, Fan Lyu, Fanhua Shang, Wei Feng, Liang Wan
Continual Learning (CL) sequentially learns new tasks like human beings, with the goal of achieving better Stability (S, remembering past tasks) and Plasticity (P, adapting to new tasks).
1 code implementation • 13 Apr 2022 • Griffin Adams, Han-Chin Shing, Qing Sun, Christopher Winestock, Kathleen McKeown, Noémie Elhadad
In real-world scenarios with naturally occurring datasets, reference summaries are noisy and may contain information that cannot be inferred from the source text.
no code implementations • 29 Sep 2021 • Qing Sun
Deep neural networks have achieved impressive performance on a variety of domains.
no code implementations • 29 Sep 2021 • Qing Sun, Fan Lyu, Fanhua Shang, Wei Feng, Liang Wan
Traditionally, the primary goal of Lifelong Learning (LL) is to achieve a trade-off between Stability (remembering past tasks) and Plasticity (adapting to new tasks).
no code implementations • Findings (ACL) 2021 • Qing Sun, Parminder Bhatia
Our gazetteer-based fusion model is data efficient, achieving +1.7 micro-F1 gains on the i2b2 dataset using 20% of the training data, and bringing +4.7 micro-F1 gains on novel entity mentions never seen during training.
1 code implementation • EMNLP 2020 • Kristjan Arumae, Qing Sun, Parminder Bhatia
However, in order to achieve state-of-the-art performance on out-of-domain tasks such as clinical named entity recognition and relation extraction, additional in-domain pre-training is required.
no code implementations • 23 Aug 2020 • Qing Sun, James Cross
In this paper, we provide an in-depth analysis of KL-divergence minimization in the Forward and Backward orders, which shows that in the Backward order the learner is reinforced via on-policy learning.
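The Forward/Backward distinction in the snippet above can be made concrete on discrete distributions: Forward KL(p‖q) takes the expectation under the teacher p (mass-covering), while Backward KL(q‖p) takes it under the learner q itself, i.e. the learner's own samples drive the update (on-policy). The toy distributions below are illustrative assumptions, not data from the paper.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions; the expectation is under p."""
    return float(np.sum(p * np.log(p / q)))

# Teacher: bimodal. Student: unimodal, covering only the first mode.
teacher = np.array([0.49, 0.02, 0.49])
student = np.array([0.90, 0.05, 0.05])

# Forward order: E_p[log p/q] -- samples come from the teacher, so the
# student's missing second mode is penalized heavily.
forward = kl(teacher, student)

# Backward order: E_q[log q/p] -- samples come from the student itself
# (on-policy), so regions the student never visits contribute nothing.
backward = kl(student, teacher)
```

Because the student ignores one teacher mode, the forward divergence is much larger than the backward one, which is the usual mass-covering versus mode-seeking contrast.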
no code implementations • 25 Sep 2019 • Qing Sun, James Cross, Dmitriy Genzel
Sequence-to-sequence models such as transformers, which are now being used in a wide variety of NLP tasks, typically need to have very high capacity in order to perform well.
no code implementations • CVPR 2017 • Qing Sun, Stefan Lee, Dhruv Batra
We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason about both forward and backward time dependencies.
25 code implementations • 7 Oct 2016 • Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
We observe that our method consistently outperforms Beam Search (BS) and previously proposed techniques for diverse decoding from neural sequence models.
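The diverse-decoding idea above can be sketched at its smallest scale: split the beam budget into groups that decode in sequence, and penalize each later group for re-picking tokens already chosen by earlier groups at the same step (a Hamming-style diversity term). This single-step, width-1-per-group sketch is an illustrative assumption, not the full algorithm.

```python
import numpy as np

def diverse_beam_step(logprobs, num_groups, diversity_strength):
    """One decoding step of diverse beam search with beam width 1 per group.

    logprobs: (vocab,) next-token log-probabilities shared by all groups.
    Returns the token index chosen by each group; later groups are
    penalized for repeating tokens that earlier groups already picked.
    """
    chosen = []
    penalty = np.zeros_like(logprobs)
    for _ in range(num_groups):
        tok = int(np.argmax(logprobs - diversity_strength * penalty))
        chosen.append(tok)
        penalty[tok] += 1.0  # discourage this token in subsequent groups
    return chosen
```

With the diversity strength at zero every group collapses onto the argmax token (ordinary greedy/BS behavior); a large strength forces the groups apart.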
no code implementations • NeurIPS 2015 • Qing Sun, Dhruv Batra
This paper formulates the search for a set of bounding boxes (as needed in object proposal generation) as a monotone submodular maximization problem over the space of all possible bounding boxes in an image.
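For a monotone submodular objective like the one named above, the standard greedy algorithm gives a (1 - 1/e) approximation guarantee. Below is a minimal sketch using a facility-location objective over IoU similarities, chosen here because it is a well-known monotone submodular function for coverage-style selection; the objective, function names, and toy boxes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def greedy_box_selection(boxes, k):
    """Greedily maximize f(S) = sum_c max_{b in S} IoU(c, b).

    This facility-location objective is monotone submodular, so greedy
    selection is within (1 - 1/e) of the optimal set of k boxes.
    """
    n = len(boxes)
    sims = np.array([[iou(boxes[i], boxes[j]) for j in range(n)]
                     for i in range(n)])
    selected, cover = [], np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding each candidate to the current set.
        gains = [np.maximum(cover, sims[j]).sum() - cover.sum()
                 for j in range(n)]
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, sims[j])
    return selected
```

Given two heavily overlapping boxes and one isolated box, the greedy rule picks one representative from each cluster rather than both overlapping boxes, which is exactly the diminishing-returns behavior submodularity encodes.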