Manling Li (Assistant Professor, Department of Computer Science, Northwestern University, USA)
Toward Factuality in Information Access: Multimodal Factual Knowledge Acquisition
Recent years have witnessed great success in multimodal foundation models. However, although such models achieve decent scores on various benchmarks, we see that they understand images as bags of words. Specifically, they use object understanding as a shortcut but lack the ability to capture abstract semantics such as verbs. To learn physical world knowledge, we first categorize it by its temporal dynamics (static -> dynamic) and by its horizon (short/fast thinking -> long/slow thinking). My research aims to bring this deep factual knowledge view to the multimodal world. Such a transformation poses significant challenges: (1) understanding multimodal semantic structures that are abstract (such as events and the semantic roles of objects): I will present our solution, zero-shot cross-modal transfer, an effective way to inject event-level knowledge into vision-language foundation models; (2) understanding long-horizon temporal dynamics: I will introduce typical ways to handle long-horizon reasoning, which empower machines to capture complex temporal patterns; (3) mitigating hallucinations: we will also briefly analyze the causes of hallucinations and potential ways to ensure factuality via knowledge-driven methods, with example applications such as meeting summarization, timeline generation, and question answering. I will then lay out how I plan to promote factuality and truthfulness in multimodal information access, through a structured knowledge view that is easily explainable, highly compositional, and capable of long-horizon reasoning.
Manling Li is an Assistant Professor at Northwestern University (starting full-time in Fall 2024) and a postdoc at Stanford University. She obtained her PhD degree in Computer Science from the University of Illinois Urbana-Champaign in 2023. Her research interests lie in natural language processing, especially its interaction with multiple modalities including images, videos, speech and robotics. Her work on multimodal knowledge extraction won the ACL'20 Best Demo Paper Award, and her work on scientific information extraction from COVID literature won the NAACL'21 Best Demo Paper Award. She was a recipient of the Microsoft Research PhD Fellowship in 2021, an EECS Rising Star in 2022, etc. She led 19 students in developing the UIUC information extraction system, which ranked 1st in the NIST SM-KBP evaluation in 2019 and 2020. She serves as an Area Chair for ACL and EMNLP, and has delivered tutorials on event-centric multimodal knowledge at ACL'21, AAAI'21, NAACL'22, CVPR'23, etc. Additional information is available at https://limanling.github.io/.