Time: Tuesday, May 8 (half-day, afternoon, 1:30pm to 5:00pm)
We will study machine learning and associated numerical techniques to automatically learn ranking functions for entities represented as feature vectors as well as nodes in a social network. In the feature-vector scenario, an entity, e.g., a document x, is mapped to a feature vector in a d-dimensional space, and we have to search for a weight vector. This case corresponds to Information Retrieval in the vector space model. Training data consists of a partial order of preference among entities. We will study recent Bayesian and maximum-margin approaches to solving this problem, including recent efficient linear-time approximate algorithms. We will consider various ranking performance measures and how some of them create complications for learning algorithms. In the graph node ranking scenario, we will briefly review Pagerank, generalize it to arbitrary Markov conductance matrices, and consider the problem of learning conductance parameters from partial orders between nodes. In another class of formulation, the graph does not establish Pagerank or prestige-flow relationships between nodes, but encourages a certain smoothness between the scores of neighboring nodes. Some of these techniques have been used by Web search companies with very large query logs. We will review some of the issues and experience with applying the theory to practical systems. If time permits, we will look at general connections between score stability as against rank stability, and the connection between the stability of a score/rank-learning algorithm and its power to generalize to unforeseen test data.
Basic vector algebra and calculus, basic knowledge of Web and search (prerequisite books and survey papers will be posted in advance).
Presenter: Soumen Chakrabarti (Indian Institute of Technology, Bombay)
Soumen Chakrabarti received his B.Tech in Computer Science from the Indian Institute of Technology Kharagpur in 1991 and his M.S. and Ph.D. in Computer Science from the University of California, Berkeley in 1992 and 1996. At Berkeley he worked on compilers and runtime systems for running scalable parallel scientific software on message passing multiprocessors.
He was a Research Staff Member at IBM Almaden Research Center from 1996 to 1999, where he worked on the CLEVER Web search project and led the Focused Crawling project.
In 1999 he joined the Department of Computer Science and Engineering at the Indian Institute of Technology, Bombay, where he has been an Associate professor since 2003. In Spring 2004 he was Visiting Associate professor at Carnegie-Mellon University.
He has published in the WWW, SIGIR, SIGKDD, SIGMOD, VLDB, ICDE, SODA, STOC, SPAA and other conferences as well as Scientific American, IEEE Computer, VLDB and other journals. He holds eight US patents on Web-related inventions. He has served as technical advisor to search-related companies and vice-chair or program committee member for WWW, SIGIR, SIGKDD, VLDB, ICDE, SODA and other conferences, and guest editor or editorial board member for DMKD and TKDE journals. He is also author of a book on Web Mining.
His current research interests include integrating, searching, and mining text and graph data models, exploiting types and relations in search, and Web graph and popularity analysis.