Recently, a paper co-authored by Professor Zhewei Wei's team from Gaoling School of Artificial Intelligence, Renmin University of China, Professor Zengfeng Huang from Fudan University, and Dr. Feifei Li from Alibaba Group has been accepted by the prestigious VLDB 2024 conference. Professor Zhewei Wei is the corresponding author of the paper, with his advised MPhil student Hanyan Yin and Ph.D students Dongxie Wen and Jiajun Li serving as the first, second, and third authors, respectively. The VLDB (International Conference on Very Large Data Bases) conference is one of the top three international academic conferences in the field of data management and databases, and is recommended as an A-level international conference by the China Computer Federation (CCF). VLDB 2024 will be held in Guangzhou from August 26 to 30, 2024.
Paper Title: Optimal Matrix Sketching over Sliding Windows
Authors: Hanyan Yin, Dongxie Wen, Jiajun Li, Zhewei Wei, Xiao Zhang, Zengfeng Huang, Feifei Li
Corresponding Author: Zhewei Wei
Paper Abstract: Matrix sketching, aimed at approximating a matrix A consisting of vector streams of length N with a smaller sketching matrix B has garnered increasing attention in fields such as large-scale data analytics and machine learning. A well-known deterministic matrix sketching method is the Frequent Directions algorithm, which achieves the optimal space bound and provides a covariance error guarantee of . The matrix sketching problem becomes particularly interesting in the context of sliding windows, where the goal is to approximate the matrix , formed by input vectors over the most recent N time units. However, despite recent efforts, whether achieving the optimal space bound on sliding windows is possible has remained an open question.
In this paper, we introduce the DS-FD algorithm, which achieves the optimal space bound for matrix sketching over row-normalized, sequence-based sliding windows. We also present matching upper and lower space bounds for time-based and unnormalized sliding windows, demonstrating the generality and optimality of DS-FD across various sliding window models. This conclusively answers the open question regarding the optimal space bound for matrix sketching over sliding windows. Furthermore, we conduct extensive experiments with both synthetic and real-world datasets, validating our theoretical claims and thus confirming the correctness and effectiveness of our algorithm, both theoretically and empirically.
Arxiv Link: https://arxiv.org/abs/2405.07792