no code implementations • 1 Mar 2024 • Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
Our experiments on IMDb sentiment generation and Anthropic's helpful-harmless dataset show that rDPO is more robust to noise in preference labels than vanilla DPO and other heuristics proposed by practitioners.
no code implementations • 31 Oct 2023 • Daman Arora, Anush Kini, Sayak Ray Chowdhury, Nagarajan Natarajan, Gaurav Sinha, Amit Sharma
Given a query and a document corpus, the information retrieval (IR) task is to output a ranked list of relevant documents.