LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages
arXiv (2024)
Abstract
Low-resource languages face significant barriers in AI development due to
limited linguistic resources and labeling expertise, which makes annotated
data rare and costly to obtain. The scarcity of data and the absence of
preexisting tools exacerbate these challenges, especially since these
languages are often underrepresented in existing NLP datasets. To address
this gap, we propose
leveraging the potential of LLMs in the active learning loop for data
annotation. Initially, we conduct evaluations to assess inter-annotator
agreement and consistency, facilitating the selection of a suitable LLM
annotator. The chosen annotator is then integrated into a training loop for a
classifier using an active learning paradigm, minimizing the amount of queried
data required. Empirical evaluations, notably employing GPT-4-Turbo,
demonstrate near-state-of-the-art performance with significantly reduced data
requirements, with estimated cost savings of at least a factor of 42.45
compared to human annotation. Our proposed solution shows promising
potential to substantially reduce both the monetary and computational costs
associated with automation in low-resource settings. By bridging the gap
between low-resource languages and AI, this approach fosters broader inclusion
and shows the potential to enable automation across diverse linguistic
landscapes.
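
The abstract gives no implementation details, but the two stages it describes, agreement-based selection of an LLM annotator followed by an LLM-queried active learning loop, can be sketched as below. This is a minimal illustration only: the `query_llm_annotator()` wrapper is hypothetical, a TF-IDF plus logistic regression pipeline stands in for the paper's actual classifier, and Cohen's kappa stands in for whichever agreement and consistency measures the authors used.

```python
# Minimal sketch of the two stages described in the abstract.
# Assumptions (not from the paper): query_llm_annotator() is a hypothetical
# wrapper around an LLM API; the classifier is a generic TF-IDF + logistic
# regression stand-in; Cohen's kappa is one common agreement measure.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline


def query_llm_annotator(llm_name, texts):
    """Hypothetical wrapper: prompt the named LLM to label each text."""
    raise NotImplementedError("plug in your LLM API client here")


def select_annotator(candidate_llms, gold_texts, gold_labels):
    """Stage 1: pick the LLM whose labels agree best with a small gold set."""
    scores = {
        name: cohen_kappa_score(gold_labels, query_llm_annotator(name, gold_texts))
        for name in candidate_llms
    }
    return max(scores, key=scores.get)


def active_learning_loop(llm_name, seed_texts, seed_labels, pool,
                         rounds=10, batch=16):
    """Stage 2: uncertainty sampling with the chosen LLM as the oracle."""
    texts, labels = list(seed_texts), list(seed_labels)
    pool = list(pool)
    clf = None
    for _ in range(rounds):
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(texts, labels)
        # Least-confidence sampling: query the pool examples the current
        # classifier is least sure about, instead of labeling everything.
        probs = clf.predict_proba(pool)
        uncertainty = 1.0 - probs.max(axis=1)
        picked = set(np.argsort(uncertainty)[-batch:])
        queries = [pool[i] for i in picked]
        texts += queries
        labels += list(query_llm_annotator(llm_name, queries))
        pool = [t for i, t in enumerate(pool) if i not in picked]
    # Refit on everything gathered so far before returning.
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    return clf
```

In the paper's setting the oracle would be GPT-4-Turbo, the annotator reported as performing best; the loop's point is that only the batches the classifier is uncertain about are ever sent to the LLM, which is where the reported cost savings come from.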