Host: Yong Liu, Tenure-Track Associate Professor, Gaoling School of Artificial Intelligence, Renmin University of China
Speaker: Wenguan Wang, Postdoctoral Researcher, ETH Zurich
Speaker Bio: Dr. Wenguan Wang is currently a Fellow Scientist at ETH Zurich. His research interests include semantic segmentation, video analysis, human-centric visual understanding, and embodied AI. He has published over 60 journal and conference papers in venues such as TPAMI, TIP, TVCG, TCSVT, CVPR, ICCV, ECCV, AAAI, and SIGGRAPH Asia, including one CVPR Best Paper Finalist, one CVPR workshop Best Paper, and 11 oral papers at top conferences. He also serves as Associate Editor for TCSVT and Neurocomputing, and as Guest Managing Editor for Pattern Recognition. He has more than 8,700 Google Scholar citations with an h-index of 42, and has won awards in 14 international academic competitions. His honors include Elsevier Highly Cited Chinese Researchers (2020), the World Artificial Intelligence Conference Youth Outstanding Paper Award (2020), the China Association of Artificial Intelligence Doctoral Dissertation Award (2019), the ACM China Doctoral Dissertation Award (2018), and the Baidu Scholarship (2016).
Talk Title: Rethinking Training Paradigm and Network Design in Semantic Segmentation
Abstract: As a fundamental task in computer vision, semantic segmentation has achieved tremendous progress, driven by the rapid evolution of segmentation network architectures (e.g., FCN, Transformer).
Modern segmentation approaches focus only on mining "local" context, i.e., dependencies between pixels within individual images, through specially designed context aggregation modules (e.g., dilated convolution) or structure-aware optimization objectives (e.g., IoU-like losses). However, they ignore the "global" context of the training data, i.e., the rich semantic relations between pixels across different images. In this talk, we will introduce a pixel-wise metric learning paradigm for semantic segmentation that explicitly explores the structure of the whole training dataset.
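To make the "global context" idea concrete, the sketch below shows an InfoNCE-style contrastive loss computed over pixel embeddings pooled across a batch of images: pixels of the same class (wherever they come from) act as positives, all others as negatives. This is a minimal NumPy illustration of pixel-wise metric learning in general, not the speaker's actual formulation; the function name and the temperature value are assumptions for the example.

```python
import numpy as np

def pixel_contrastive_loss(emb, labels, temperature=0.1):
    """InfoNCE-style loss over pixel embeddings gathered across images.

    emb    : (N, D) L2-normalized pixel embeddings (pooled from a batch)
    labels : (N,)   ground-truth class id of each pixel
    Same-class pixels are pulled together; different-class pixels pushed apart.
    """
    n = emb.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = emb @ emb.T / temperature                  # pairwise similarities
    sim = np.where(self_mask, -np.inf, sim)          # exclude self-pairs
    # numerically stable log-softmax over each anchor's row
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)                        # anchors with >=1 positive
    per_anchor = (-np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos]
                  / pos.sum(axis=1)[has_pos])        # mean -log p over positives
    return per_anchor.mean()
```

Note that the positives here may come from different images, which is exactly the cross-image structure that purely within-image context aggregation cannot exploit.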
Moreover, prevalent segmentation solutions, despite their different network designs (FCN-based or Transformer-based) and mask decoding strategies (parametric-softmax-based or pixel-query-based), can be placed in one category by viewing the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, I will discuss several fundamental limitations of such a parametric segmentation regime and introduce a nonparametric alternative based on non-learnable prototypes.