Recently, two papers of professor Zhewei Wei were accepted by ACM Transactions on Database Systems (TODS). TODS is a top-tierinternational academic journalin the areas of data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databasesand operations and algorithms relating to these areas. TODS received the A* ranking(i.e., the highest possible ranking) in the CORE rankings of computer science journals and conferences.The China Computer Federation (CCF) recommends TODS as a Category A (i.e., the top level) international academic journal. The number of papers accepted by TODS each year is no more than 30 worldwide.
Title: PersistentSummaries
Authors: Tianjing Zeng, Zhewei Wei, Ge Luo, Ke Yi, Xiaoyong Du, Ji-Rong Wen
Corresponding Author: Zhewei Wei
A persistent data structure, also known as a multi-version data structure in the database literature, is a data structure that preserves all its previous versions as it is updated over time. Every update (inserting, deleting, or changing a data record) to the data structure creates a new version, while all the versions are kept in the data structure so that any previous version can still be queried.
Persistent data structures aim at recording all versions accurately, which results in a space requirement that is at least linear to the number of updates. In many of today’s big data applications, in particular for high-speed streaming data, the volume and velocity of the data are so high and are hard to be stored completely. Therefore, streaming algorithms have received a lot of attention in the research community, which uses only sublinear space by sacrificing slightly on accuracy.
All streaming algorithms work by maintaining a small data structure in memory, which is usually called a sketch, summary, or synopsis. The summary is updated upon the arrival of every element in the stream, thus it is ephemeral, meaning that it can only answer queries about the current status of the stream. This paperaims at designing persistent summaries, thereby giving streaming algorithms the ability to answer queries about the stream at any prior time.
Title: Influence Maximization Revisited: Efficient Sampling with Bound Tightened.
Authors: Qintian Guo, Sibo Wang, Zhewei Wei, Wenqing Lin, Jing Tang
Given a social network, the Influence Maximization (IM) problem asks for k nodes such that the expected number of nodes influenced by the k nodes under Independent Cascade (IC) modelis maximized. A key phase in IM algorithms is the random Reverse Reachable (RR) set generation, and this phase significantly affects the efficiency and scalability of the state-of-the-art IM algorithms.
This paper presents a study on this key phase and proposes an efficient random RR set generation algorithm under IC model. With the new algorithm, the paper shows that the expected running time of existing IM algorithms under IC model can be improved. Moreover, existing approximate IM algorithms suffer from scalability issues in high influence networks where the size of random RR sets is usually quite large. The paper tackles this challenging issue by reducing the average size of random RR sets without sacrificing the approximation guarantee. The proposed solution is orders of magnitude faster than the state-of-the-art algorithms demonstrated by the experiments.