Are Large Language Models Reliable Argument Quality Annotators?
arXiv (2024)
Abstract
Evaluating the quality of arguments is a crucial aspect of any system
leveraging argument mining. However, obtaining reliable and consistent argument
quality annotations is challenging, as it usually requires domain-specific
expertise from annotators. Even among experts, the assessment
of argument quality is often inconsistent due to the inherent subjectivity of
this task. In this paper, we study the potential of using state-of-the-art
large language models (LLMs) as proxies for argument quality annotators. To
assess the capability of LLMs in this regard, we analyze the agreement between
model, human expert, and human novice annotators based on an established
taxonomy of argument quality dimensions. Our findings highlight that LLMs can
produce consistent annotations, with a moderately high agreement with human
experts across most of the quality dimensions. Moreover, we show that using
LLMs as additional annotators can significantly improve the agreement between
annotators. These results suggest that LLMs can serve as a valuable tool for
automated argument quality assessment, thus streamlining and accelerating the
evaluation of large argument datasets.
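As an illustration of the kind of agreement analysis the abstract describes, the sketch below computes chance-corrected agreement between one set of LLM ratings and one set of expert ratings on a single quality dimension. The ratings, the 1-5 scale, and the choice of quadratic weighting are assumptions made for this example, not data or methodological details taken from the paper.

# Minimal sketch: agreement between hypothetical LLM and expert
# annotations of argument quality on one dimension (assumed 1-5 scale).
from sklearn.metrics import cohen_kappa_score

# Illustrative placeholder ratings for ten arguments; not paper data.
expert_ratings = [4, 3, 5, 2, 4, 3, 1, 5, 4, 2]
llm_ratings    = [4, 3, 4, 2, 5, 3, 2, 5, 4, 2]

# Quadratic weighting treats the ratings as ordinal, so near-misses
# (e.g., 4 vs. 5) are penalized less than large disagreements.
kappa = cohen_kappa_score(expert_ratings, llm_ratings, weights="quadratic")
print(f"Quadratic-weighted Cohen's kappa: {kappa:.3f}")

In practice, such a score would be computed per quality dimension and compared across annotator pairs (LLM-expert, expert-expert, novice-expert) to judge whether the LLM's consistency approaches that of human experts.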