Time: August 12, 2023, 14:00–16:00
Venue: Lecture Hall 1826, Lide Building
Tencent Meeting: 760 885 285
Talk 1: Leveraging Conversation Context for Conversational Search
Abstract:
The search interface is evolving from a single short query toward more natural and interactive forms, such as conversational interfaces. The most distinctive characteristic of conversational search is that the search intent depends on the past conversation history, so the query often has to be reformulated to incorporate conversation context. The existing literature has found that conversational search can be improved by query rewriting based on a generative language model, or by simply concatenating all the historical queries. In this talk, we will see that the conversational context is very noisy: some conversation turns are unrelated to the current query and should thus be discarded. We propose a selection process that incorporates only the related historical queries, based on their potential usefulness. To this end, an automatic labeling approach is used to label the historical queries according to their impact on retrieval effectiveness, and a selection model is then trained. Experiments show that such a selection can improve the effectiveness of conversational search. This work also demonstrates the necessity of developing approaches specific to conversational IR.
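The selection process described above can be illustrated with a minimal sketch. Everything here is hypothetical: the term-overlap scorer is only a stand-in for the trained selection model the abstract mentions (which is learned from automatic labels based on retrieval effectiveness), and the function names and threshold are illustrative.

```python
# Hypothetical sketch of selecting useful conversation turns for query
# reformulation. The scorer below is a toy stand-in for the trained
# selection model described in the talk; it simply measures term overlap
# between a historical turn and the current query.

def usefulness_score(turn: str, current_query: str) -> float:
    """Fraction of current-query terms that also appear in the turn."""
    turn_terms = set(turn.lower().split())
    query_terms = set(current_query.lower().split())
    if not query_terms:
        return 0.0
    return len(turn_terms & query_terms) / len(query_terms)

def select_history(history: list[str], current_query: str,
                   threshold: float = 0.3) -> list[str]:
    """Keep only the historical turns judged useful for the current query."""
    return [t for t in history if usefulness_score(t, current_query) >= threshold]

def reformulate(history: list[str], current_query: str) -> str:
    """Concatenate the selected turns with the current query."""
    return " ".join(select_history(history, current_query) + [current_query])

history = [
    "what is conversational search",
    "best pizza places in Montreal",  # unrelated turn: should be discarded
    "how does conversational search use context",
]
print(reformulate(history, "search context selection"))
```

With this toy scorer, the unrelated "pizza" turn scores zero and is dropped, while the two on-topic turns are kept and prepended to the current query.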
Speaker: Jian-Yun Nie, Professor, University of Montreal, Canada
Speaker Bio:
Jian-Yun Nie is a professor at the Department of Computer Science and Operations Research, University of Montreal, and Canada Research Chair on natural language processing and applications. His research covers a range of problems in information retrieval and natural language processing, including information retrieval models, web search, cross-language information retrieval, recommendation systems, query suggestion, question answering, and dialogue. He has published over 250 papers in the main journals and conferences in IR and NLP and is an associate editor of four journals. He has served as general chair, PC chair, and local organization chair for SIGIR conferences, as well as for several other conferences and workshops, and regularly serves as a senior PC member of major conferences such as SIGIR, CIKM, ACL, EMNLP, COLING, and WWW. He has received several best paper awards, including a Best Paper Award and a Test-of-Time Honorable Mention Award from SIGIR. He was inducted into the ACM SIGIR Academy in 2022 for his contributions to the IR field.
Talk 2: Natural Language Processing for Materials Science
Abstract:
In materials science, large amounts of heterogeneous data are produced every day: scientific publications, lab reports, manuals, tables, and so on. Natural language processing (NLP) therefore plays a key role in understanding and unlocking these rich datasets, especially in understanding scientific literature and extracting useful information from it. Capturing unstructured information from the vast and ever-growing number of scientific publications holds substantial promise for creating the experiment-based databases that are currently lacking and for meeting various needs in the materials domain. However, directly applying NLP techniques developed for the general domain to materials science does not yield satisfactory performance across tasks. The reasons include, but are not limited to, the following: i) the content and style of materials science literature differ from general-domain texts such as news articles, which degrades performance on NLP tasks; ii) understanding the literature requires significant in-domain expert knowledge; and iii) high-quality, large-scale labeled training datasets for NLP tasks in materials science are lacking. In this talk, we will introduce our recent work on NLP for materials science. We first present a natural language benchmark (MatSci-NLP) and study various BERT-based models on it to understand the impact of pretraining strategies on understanding materials science text. We then introduce an instruction-based process for trustworthy data curation in materials science (MatSci-Instruct), which we apply to fine-tune a LLaMa-based language model (HoneyBee). MatSci-Instruct helps alleviate the scarcity of relevant, high-quality materials science textual data in the open literature, and HoneyBee is the first billion-parameter language model specialized to materials science.
Speaker: Bang Liu, Assistant Professor, University of Montreal, Canada
Speaker Bio:
Bang Liu is an Assistant Professor in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal. He is a core member of the RALI laboratory (Applied Research in Computational Linguistics) of DIRO, an associate member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI (CCAI) Chair. He received his B.Engr. degree in 2013 from the University of Science and Technology of China (USTC), and his M.S. and Ph.D. degrees from the University of Alberta in 2015 and 2020, respectively. His research interests lie primarily in natural language processing, multimodal and embodied learning, theory and techniques for AGI (e.g., understanding and improving large language models), and AI for science (e.g., health, materials science, XR).