Dr. Jianhua Yin is currently an Associate Professor in the School of Computer Science and Technology, Shandong University. He received his Bachelor's degree from Xidian University in 2012 and his Ph.D. degree from Tsinghua University in 2017. He was a visiting scholar in the Data Mining Research Group of the Department of Computer Science, University of Illinois at Urbana-Champaign, under the supervision of Prof. Jiawei Han. His research interests lie primarily in text clustering and Bayesian inference. He has published several papers in top venues such as ACM TOIS, IEEE TKDE, ACM SIGKDD, ACM MM, ACM SIGIR, IEEE ICDE, and ACM CIKM. He has also served as a reviewer for top journals and conferences including IEEE TKDE, ACM TKDD, ACM SIGKDD, ACM MM, ACM WSDM, and IEEE ICDM. He has been invited to serve as a guest editor of the journal Data Science and Engineering (DSE).
Text clustering is an important technique in data mining and machine learning. It is widely used in event discovery and tracking, document summarization, search result clustering, and other tasks. Although text clustering has been studied extensively, many challenging problems remain open: (1) How should the number of clusters be set? Can it be discovered automatically from the data? (2) How can the sparsity of short texts be handled? (3) How can outlier documents in a dataset be detected automatically? (4) How can concept drift be handled in stream text clustering?
In this talk, Dr. Jianhua Yin will share his work on text clustering and the stories behind these papers from his time as a Ph.D. student at Tsinghua University, in the hope of inspiring younger students who are interested in scientific research.