Texts

Tasksource

Introduced by Sileo in tasksource: A Dataset Harmonization Framework for Streamlined NLP Multi-Task Learning and Evaluation

Huggingface Datasets is a great library, but it lacks standardization, and datasets require preprocessing work to be used interchangeably. tasksource automates this and facilitates reproducible multi-task learning scaling.

Each dataset is standardized to either MultipleChoice, Classification, or TokenClassification dataset with identical fields. We do not support generation tasks as they are addressed by promptsource. All implemented preprocessings are in tasks.py or tasks.md. A preprocessing is a function that accepts a dataset and returns the standardized dataset. Preprocessing code is concise and human-readable.

Homepage