
Academic Lectures

BDAI Key Laboratory Graduate Salon, No. 17: Social Network Inspired Long Document Modeling
Date: 2021-12-07

The Graduate Salon of the Beijing Key Laboratory of Big Data Management and Analysis Methods (BDAI) is held regularly, jointly organized by the Gaoling School of Artificial Intelligence and the School of Information at Renmin University of China. In this week's BDAI Key Laboratory seminar, PhD students Zhou Yujia and Hu Anwen from the School of Information will each present their research. All students are welcome to join the discussion!


Talk Title: Social Network Inspired Long Document Modeling

Speaker: Zhou Yujia, third-year PhD student

Advisor: Dou Zhicheng

Research Interests: information retrieval, personalized search

Abstract: Utilizing pre-trained language models such as BERT has achieved great success for neural document ranking in information retrieval. Limited by computational and memory requirements, long document modeling becomes a critical issue. Recent works propose to modify the full attention matrix in the Transformer by designing sparse attention patterns. However, most of them focus only on local connections between terms within a fixed-size window to model semantic dependencies. How to build suitable remote connections between terms to better model document representations remains underexplored. In this paper, we propose Socialformer, which introduces the characteristics of social networks into the design of sparse attention patterns for long document modeling in document ranking. Specifically, we consider two document-standalone and two query-aware patterns to construct a graph resembling a social network. Endowed with the characteristics of social networks, most pairs of nodes in such a graph can reach each other via a short path while sparsity is ensured. To facilitate efficient calculation, we segment the graph into multiple subgraphs to simulate friend circles in social scenarios. This partitioning allows us to implement a two-stage information transmission model with the Transformer encoder. Experimental results on two document ranking benchmarks confirm the effectiveness of our model for long document modeling.
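To make the idea of social-network-style sparse attention more concrete, here is a minimal illustrative sketch. It is not the actual Socialformer implementation; the function names, the random long-range links, and all parameters are assumptions for illustration only. It builds a sparse attention mask that combines a local sliding window with a few long-range "friend" links, so most token pairs are connected by a short path, and then restricts standard scaled dot-product attention to that mask.

```python
# Illustrative sketch only (not the Socialformer code): a sparse attention mask
# mixing local windows with a few random long-range links, small-world style.
import torch

def small_world_attention_mask(seq_len: int, window: int = 4, n_remote: int = 2,
                               seed: int = 0) -> torch.Tensor:
    """Return a (seq_len, seq_len) boolean mask; True means attention is allowed."""
    g = torch.Generator().manual_seed(seed)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    idx = torch.arange(seq_len)
    # Local connections: each token attends to neighbors within the window.
    mask |= (idx[:, None] - idx[None, :]).abs() <= window
    # Remote connections: each token gets a few random long-range links,
    # loosely mimicking weak ties in a social network (hypothetical choice).
    for i in range(seq_len):
        remote = torch.randint(0, seq_len, (n_remote,), generator=g)
        mask[i, remote] = True
        mask[remote, i] = True  # keep the connection graph symmetric
    return mask

def sparse_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to the allowed token pairs."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Tiny usage example on random tensors.
L, d = 16, 8
q = k = v = torch.randn(L, d)
out = sparse_attention(q, k, v, small_world_attention_mask(L))
print(out.shape)  # torch.Size([16, 8])
```

In the model described above, the connection patterns are document-standalone and query-aware rather than random, and the graph is further segmented into subgraphs; this sketch only shows how a sparse mask plugs into attention.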

Talk Title: Question-controlled Text-aware Image Captioning

Speaker: Hu Anwen, third-year PhD student

Advisor: Jin Qin

Research Interests: multimodal learning, image captioning

Abstract: For an image with multiple scene texts, different people may be interested in different text information. Current text-aware image captioning models are not able to generate distinctive captions according to various information needs. To explore how to generate personalized text-aware captions, we define a new challenging task, namely Question-controlled Text-aware Image Captioning (Qc-TextCap). With questions as control signals, this task requires models to understand questions, find related scene texts, and describe them together with objects fluently in human language. Based on two existing text-aware captioning datasets, we automatically construct two datasets, ControlTextCaps and ControlVizWiz, to support the task. We propose a novel Geometry and Question Aware Model (GQAM). GQAM first applies a Geometry-informed Visual Encoder to fuse region-level object features and region-level scene text features while considering their spatial relationships. Then, we design a Question-guided Encoder to select the most relevant visual features for each question. Finally, GQAM generates a personalized text-aware caption with a Multimodal Decoder. Our model achieves better captioning performance and question-answering ability than carefully designed baselines on both datasets. With questions as control signals, our model generates more informative and diverse captions than the state-of-the-art text-aware captioning model. Our code and datasets are publicly available at https://github.com/HAWLYQ/Qc-TextCap.
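As a rough illustration of the question-guided selection step described above, the following sketch scores region-level visual features against a pooled question embedding and returns a question-conditioned visual summary that a decoder could consume. It is not the released GQAM code (see the repository linked above); the class name, feature dimensions, and the simple attention-pooling scheme are assumptions made for this example.

```python
# Illustrative sketch only (not the released GQAM code): question-guided
# attention that weights object/scene-text region features by their relevance
# to a question embedding and pools them into one visual summary vector.
import torch
import torch.nn as nn

class QuestionGuidedEncoder(nn.Module):  # hypothetical module for illustration
    def __init__(self, q_dim: int, v_dim: int, hidden: int = 256):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, hidden)
        self.v_proj = nn.Linear(v_dim, hidden)

    def forward(self, question: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        """
        question: (batch, q_dim) pooled question embedding
        regions:  (batch, n_regions, v_dim) object + scene-text region features
        returns:  (batch, hidden) question-conditioned visual summary
        """
        q = self.q_proj(question).unsqueeze(1)          # (batch, 1, hidden)
        v = self.v_proj(regions)                        # (batch, n, hidden)
        scores = (q * v).sum(-1) / (v.size(-1) ** 0.5)  # relevance of each region
        attn = torch.softmax(scores, dim=-1)            # (batch, n)
        return (attn.unsqueeze(-1) * v).sum(1)          # weighted sum over regions

# Tiny usage example with random features.
enc = QuestionGuidedEncoder(q_dim=300, v_dim=2048)
summary = enc(torch.randn(2, 300), torch.randn(2, 36, 2048))
print(summary.shape)  # torch.Size([2, 256])
```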
