BDAI Key Laboratory Graduate Salon No. 29: Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models

Date: 2022-05-31

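The graduate salon of the Beijing Key Laboratory of Big Data Management and Analysis Methods (BDAI) is held regularly and is organized by faculty and students of the Gaoling School of Artificial Intelligence, Renmin University of China. At the June 1 seminar, Gao Zefeng (高泽峰), a postdoctoral researcher at the school, and Wang Zhenlei (王振磊), a student advised by tenure-track assistant professor Chen Xu (陈旭), will present their research. All students are welcome to join the discussion!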

Talk title: Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models

Speaker: Gao Zefeng (高泽峰), postdoctoral researcher, Gaoling School of Artificial Intelligence

Research interests: model compression, data compression, pre-trained language models

Abstract: The state-of-the-art Mixture-of-Experts (MoE) architecture has achieved several remarkable successes in increasing model capacity. However, its widespread adoption has been hindered by complexity, memory consumption, and training instability. Here we propose a novel parameter-efficient MoE architecture that shares information among experts. Specifically, we use matrix product operators (MPO, a tensor decomposition from quantum many-body physics) to reconstruct the parameter matrix in the expert layer, and we increase model capacity for pre-trained language models by sharing the central tensor (containing the core information) among all experts while keeping a separate auxiliary tensor (complementing the central tensor) for each expert. We also design a gradient mask strategy for the tensor structure of MPO to alleviate overfitting. Extensive experiments based on T5 and GPT show improved performance and efficiency in increasing pre-trained language model capacity (27.2x fewer parameters for comparable model performance, compared with the Switch Transformers). We additionally demonstrate that our approach improves positive transfer in multi-task learning.
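As a rough illustration of why sharing the central tensor saves parameters, the sketch below only counts parameters; it does not perform the decomposition and is not the speaker's implementation. The function names, the toy factorization of a 16 x 16 expert matrix, the bond dimensions, and the choice of the middle tensor as the "central" one are all assumptions made for this example.

```python
from math import prod


def mpo_tensor_shapes(in_factors, out_factors, bond_dims):
    """Return the shapes (d_left, i_k, j_k, d_right) of the local MPO tensors.

    The weight matrix has shape (prod(in_factors), prod(out_factors)).
    bond_dims has one more entry than in_factors and starts and ends with 1.
    """
    if not (len(in_factors) == len(out_factors) == len(bond_dims) - 1):
        raise ValueError("bond_dims must be one longer than the factor lists")
    if bond_dims[0] != 1 or bond_dims[-1] != 1:
        raise ValueError("boundary bond dimensions must be 1")
    return [
        (bond_dims[k], in_factors[k], out_factors[k], bond_dims[k + 1])
        for k in range(len(in_factors))
    ]


def moe_param_counts(in_factors, out_factors, bond_dims, num_experts):
    """Compare expert-layer parameter counts.

    Returns (dense, shared):
      dense  - every expert stores its own full weight matrix
      shared - one central tensor shared by all experts, plus
               a separate set of auxiliary tensors per expert
    """
    shapes = mpo_tensor_shapes(in_factors, out_factors, bond_dims)
    # Assumption for this sketch: the middle tensor plays the central role.
    central_index = len(shapes) // 2
    central = prod(shapes[central_index])
    auxiliary = sum(prod(s) for i, s in enumerate(shapes) if i != central_index)
    dense = num_experts * prod(in_factors) * prod(out_factors)
    shared = central + num_experts * auxiliary
    return dense, shared


# Toy 16 x 16 expert matrix factorized as (2*4*2) x (2*4*2), with 8 experts.
dense, shared = moe_param_counts((2, 4, 2), (2, 4, 2), (1, 4, 4, 1), num_experts=8)
print(dense, shared)  # 2048 512
```

In this toy setting the central tensor holds 256 parameters and each expert adds only 32 auxiliary parameters, so the shared layout grows slowly with the number of experts. The figures are specific to the hypothetical shapes chosen here and say nothing about the 27.2x result reported in the abstract.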

Talk title: Unbiased Sequential Recommendation with Latent Confounders

Speaker: Wang Zhenlei (王振磊), second-year PhD student, advisor: Chen Xu (陈旭)

Research interests: recommender systems, causal inference

Abstract: Sequential recommendation holds the promise of understanding user preference by capturing successive behavior correlations. Existing research focuses on designing different models to better fit offline datasets. However, the observational data may have been contaminated by exposure or selection biases, which renders the learned sequential models unreliable. To solve this fundamental problem, we propose to reformulate the sequential recommendation task with the potential outcome framework, where we are able to clearly understand the data bias mechanism and correct it by re-weighting the training instances with the inverse propensity score (IPS). For more robust modeling, a clipping strategy is applied to the IPS estimation to reduce the variance of the learning objective. To make our framework more practical, we design a parameterized model to remove the impact of potential latent confounders. Finally, we theoretically analyze the unbiasedness of the proposed framework under both vanilla and clipped IPS estimations. To the best of our knowledge, this is the first work on debiased sequential recommendation. We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the effectiveness of our framework.
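The re-weighting idea in the abstract can be sketched generically as follows. This is a minimal illustration of IPS re-weighting with clipping, not the speaker's framework: the function names, the binary log loss, the default threshold `clip_min=0.1`, and the choice to clip by flooring the propensity are assumptions for this example, and the latent-confounder model is not covered.

```python
import math


def clipped_ips_weights(propensities, clip_min=0.1):
    """Inverse propensity weights, with each propensity floored at clip_min.

    Flooring bounds every weight by 1 / clip_min, which limits the variance
    contributed by rarely exposed items at the cost of some bias.
    """
    if clip_min <= 0:
        raise ValueError("clip_min must be positive")
    return [1.0 / max(p, clip_min) for p in propensities]


def ips_weighted_log_loss(labels, predictions, propensities, clip_min=0.1):
    """Mean binary log loss where each instance is re-weighted by clipped IPS."""
    eps = 1e-12  # guards against log(0)
    weights = clipped_ips_weights(propensities, clip_min)
    total = 0.0
    for y, q, w in zip(labels, predictions, weights):
        loss = -(y * math.log(max(q, eps)) + (1 - y) * math.log(max(1 - q, eps)))
        total += w * loss
    return total / len(labels)


# Hypothetical observed interactions: label, predicted probability, propensity.
labels = [1, 0, 1]
predictions = [0.9, 0.2, 0.6]
propensities = [0.5, 0.8, 0.01]  # the last item was rarely exposed
print(clipped_ips_weights(propensities))
print(ips_weighted_log_loss(labels, predictions, propensities))
```

Without clipping, the third instance would receive a weight of 100; with the floor at 0.1 its weight is capped at 10, which is the variance-reduction effect the abstract refers to.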
