Time: July 1, 2024 (Monday), 10:00-11:30
Venue: Lecture Hall 1826, Lide Building (立德楼)
Speaker: Charles Clarke, University of Waterloo, Canada
Title: A Comparison of Methods for Evaluating Generative IR
Abstract: Unlike traditional information retrieval systems, Retrieval Augmented Generation (RAG) systems and other Generative IR (Gen-IR) systems may not respond to queries with items from a fixed collection of documents or passages. The response to a query may be entirely new text. Since traditional IR evaluation methods break down under this model, I will explore various methods that extend traditional offline evaluation approaches to the Gen-IR context. Offline IR evaluation traditionally employs paid human assessors, but LLMs are increasingly replacing human assessment, demonstrating capabilities similar to, or better than, crowdsourced labels. Given that Gen-IR systems do not generate responses from a fixed set, we can assume that methods for Gen-IR evaluation must largely depend on LLM-generated labels. Along with methods based on binary and graded relevance, I will discuss methods based on explicit subtopics, pairwise preferences, and embeddings. We first validate these methods against human assessments on several TREC Deep Learning Track tasks; we then apply them to evaluate the output of several purely generative systems. For each method we consider both its ability to act autonomously, without the need for human labels or other input, and its ability to support human auditing. To trust these methods, we must be assured that their results align with human assessments; for that, the evaluation criteria must be transparent, so that outcomes can be audited by human assessors.
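To make the pairwise-preference idea concrete, below is a minimal, self-contained Python sketch (not taken from the talk): a judge function compares two system responses to the same query, and systems are ranked by their pairwise win counts. The `judge_preference` heuristic is a hypothetical stand-in for an LLM judge call, and the system names and responses are invented for illustration.

```python
# Sketch of pairwise-preference evaluation for Gen-IR outputs.
# An LLM judge would normally decide which of two responses better
# answers the query; here a trivial term-overlap heuristic stands in.
from itertools import combinations
from collections import Counter

def judge_preference(query: str, response_a: str, response_b: str) -> str:
    """Hypothetical judge: return 'a' or 'b' for the preferred response.
    A real implementation would prompt an LLM instead of this heuristic."""
    overlap = lambda text: len(set(query.lower().split()) & set(text.lower().split()))
    return "a" if overlap(response_a) >= overlap(response_b) else "b"

def rank_systems(query: str, responses: dict[str, str]) -> list[tuple[str, int]]:
    """Rank systems by number of pairwise wins for a single query."""
    wins = Counter({name: 0 for name in responses})
    for (name_a, resp_a), (name_b, resp_b) in combinations(responses.items(), 2):
        winner = name_a if judge_preference(query, resp_a, resp_b) == "a" else name_b
        wins[winner] += 1
    return wins.most_common()

if __name__ == "__main__":
    query = "how do rag systems differ from traditional retrieval"
    responses = {  # invented example outputs from three hypothetical systems
        "system_1": "RAG systems generate new text grounded in retrieved passages.",
        "system_2": "Traditional retrieval returns documents from a fixed collection, while RAG systems generate novel responses.",
        "system_3": "Search engines index web pages.",
    }
    print(rank_systems(query, responses))
```

A real evaluation would replace the heuristic with an LLM prompt, aggregate wins over many queries, and retain the judge's per-pair decisions and stated reasons, which is what makes the outcome auditable by human assessors in the sense the abstract describes.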
Speaker bio: Charles Clarke is a Professor in the School of Computer Science and an Associate Dean for Innovation and Entrepreneurship at the University of Waterloo, Canada. His research focuses on data-intensive tasks involving human language data, including search, ranking, and question answering. Clarke is an ACM Distinguished Scientist and a leading member of the search and information retrieval community. From 2013 to 2016 he served as Chair of the Executive Committee for the ACM Special Interest Group on Information Retrieval (SIGIR), and from 2010 to 2018 he was Co-Editor-in-Chief of the Information Retrieval Journal. He was Program Co-Chair for the SIGIR main conference in 2007 and 2014, and he was elected to the SIGIR Academy in 2022. His research has been funded by Google, Microsoft, Meta, Spotify, and other companies and granting agencies. Along with Mark Smucker, he received the SIGIR 2012 Best Paper Award, and with colleagues he received the SIGIR 2019 Test of Time Award for their SIGIR 2008 paper on novelty and diversity in search. In 2006 he spent a sabbatical at Microsoft, where he was involved in the development of what is now the Bing search engine. From August 2016 to August 2018, while on leave from Waterloo, he was a Software Engineer at Meta, where he worked on metrics and ranking for Facebook Search. He is a co-author of the textbook Information Retrieval: Implementing and Evaluating Search Engines (MIT Press, 2010).