Two papers from GSAI accepted by CCF A-category Conference KDD

Date: 2021-06-12

Two papers from the Gaoling School of Artificial Intelligence, Renmin University of China (GSAI) were accepted by the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), according to the schedule released on May 16th. Recommended by the China Computer Federation (CCF) as an A-category international conference, KDD is one of the top international conferences in the field of data mining. In 2021, KDD received a total of 1541 valid submissions, of which 238 papers were accepted, putting the acceptance rate at 15.44%.

Since January 2021, GSAI has published (including accepted papers awaiting publication) 49 papers in CCF A-category international journals or conferences and 8 papers in CCF B-category journals and conferences. Among them, 52 papers have GSAI students or faculty members listed as their first or corresponding authors.

Paper introduction

Paper Title: Approximate Graph Propagation

Authors: Hanzhi Wang, Mingguo He, Zhewei Wei, Sibo Wang, Ye Yuan, Xiaoyong Du, Jirong Wen

Corresponding author: Zhewei Wei

Paper Overview:

Efficient computation of node proximity queries such as transition probabilities, Personalized PageRank, and Katz is of fundamental importance in various graph mining and learning tasks. In particular, several recent works leverage fast node proximity computation to improve the scalability of Graph Neural Networks (GNN). However, prior studies on proximity computation and GNN feature propagation proceed on a case-by-case basis, with each paper focusing on a particular proximity measure.

In this paper, we propose Approximate Graph Propagation (AGP), a unified randomized algorithm that computes various proximity queries and GNN feature propagations, including transition probabilities, Personalized PageRank, heat kernel PageRank, Katz, SGC, GDC, and APPNP. Our algorithm provides a theoretical bounded error guarantee and runs in almost optimal time complexity. We conduct an extensive experimental study to demonstrate AGP’s effectiveness in two concrete applications: local clustering with heat kernel PageRank and node classification with GNNs. Most notably, we present an empirical study on a billion-edge graph Papers100M, the largest publicly available GNN dataset so far. The results show that AGP can significantly improve various existing GNN models’ scalability without sacrificing prediction accuracy.
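
To illustrate why these proximity measures can be treated in a unified way, the sketch below computes a truncated weighted sum of random-walk steps, where only the per-hop weights change between measures. It is a deterministic toy baseline written for this article, not the randomized AGP algorithm from the paper. The function names (`propagate`, `ppr_weights`, `heat_kernel_weights`), the example graph, and the parameter values are all hypothetical; the weight formulas are the standard textbook definitions of Personalized PageRank and heat kernel PageRank.

```python
import math

def propagate(adj, x, weights):
    """Return sum_i weights[i] * P^i x, where P is the random-walk matrix.

    adj: dict mapping each node to its list of neighbors (no isolated nodes)
    x: dict mapping nodes to their initial mass
    weights: one float per hop, starting with hop 0
    """
    result = {v: 0.0 for v in adj}
    current = dict.fromkeys(adj, 0.0)
    current.update(x)
    for w in weights:
        # Accumulate the contribution of the current hop.
        for v in adj:
            result[v] += w * current[v]
        # One random-walk step: each node splits its mass evenly among neighbors.
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            share = current[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        current = nxt
    return result

def ppr_weights(alpha, hops):
    # Personalized PageRank: w_i = alpha * (1 - alpha)^i
    return [alpha * (1 - alpha) ** i for i in range(hops)]

def heat_kernel_weights(t, hops):
    # Heat kernel PageRank: w_i = exp(-t) * t^i / i!
    return [math.exp(-t) * t ** i / math.factorial(i) for i in range(hops)]

# Hypothetical 4-node undirected graph, with all mass starting at node 0.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
source = {0: 1.0}

ppr = propagate(adj, source, ppr_weights(0.2, 100))
hk = propagate(adj, source, heat_kernel_weights(3.0, 50))
print(ppr)
print(hk)
```

Because each random-walk step conserves total mass, both result vectors sum to approximately 1 once enough hops are included. The exact computation above touches every edge at every hop, which is the cost that approximate methods such as AGP aim to avoid on large graphs.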

Paper Title: Data Poisoning Attack against Recommender System Using Incomplete and Perturbed Data

Authors: Hengtong Zhang, Changxin Tian, Yaliang Li, Lu Su, Jing Gao, Nan Yang, Xin Zhao

Paper Overview: 

Recent studies reveal that recommender systems are vulnerable to data poisoning attacks due to their open nature. In a data poisoning attack, the attacker typically recruits a group of controlled users to inject well-crafted user-item interaction data into the recommendation model’s training set to modify the model parameters as desired. Thus, existing attack approaches usually require full access to the training data to infer items’ characteristics and craft the fake interactions for controlled users. However, such attack approaches may not be feasible in practice due to the attacker’s limited data collection capability and the restricted access to the training data, which are sometimes even perturbed by the privacy-preserving mechanisms of the service providers. Such a design-reality gap may cause attacks to fail.

In this paper, we fill the gap by proposing two novel adversarial attack approaches to handle the incompleteness and perturbations in user-item interaction data. First, we propose a bi-level optimization framework that incorporates a probabilistic generative model to find the users and items whose interaction data are sufficient and have not been significantly perturbed, and leverage these users and items’ data to craft fake user-item interactions. Moreover, we reverse the learning process of recommendation models and develop a simple yet effective approach that can incorporate context-specific heuristic rules to handle data incompleteness and perturbations. Extensive experiments on two datasets against three representative recommendation models show that the proposed approaches can achieve better attack performance than existing approaches.