Keynote Speakers


Andrei Broder

Fellow & VP Emerging Search, Yahoo! Research

Stephen Robertson

Microsoft Research Cambridge and City University London

Marco Gori

Dipartimento di Ingegneria dell'Informazione, Università di Siena




Andrei Broder

Fellow & VP Emerging Search, Yahoo! Research

The Next Generation Web Search and the Demise of the Classic IR model

The classic IR model assumes a human engaged in an activity that generates an information need. This need is verbalized and then expressed as a query to a search engine over a defined corpus. In the past decade, Web search engines have evolved from a first generation based on classic IR algorithms scaled to web size and thus supporting only informational queries, to a second generation supporting navigational queries using web-specific information (primarily link analysis), to a third generation enabling transactional and other "semantic" queries based on a variety of technologies aimed at directly satisfying the unexpressed "user intent", thus moving further and further away from the classic model.
What is coming next? In this talk, we identify two trends, both representing short-circuits of the model. The first is the trend towards context-driven Information Supply (IS): the goal of Web IR will widen to include the supply of relevant information from multiple sources without requiring the user to make an explicit query. The information supply concept greatly precedes information retrieval; what is new in the web framework is the ability to supply relevant information specific to a given activity and a given user, while the activity is being performed. Thus the entire verbalization and query-formation phase is eliminated. The second trend is social search, driven by the fact that the Web has evolved into simultaneously a huge repository of knowledge and a vast social environment. As such, it is often more effective to ask the members of a given web milieu than to construct elaborate queries. This short-circuits only the query formulation, but it allows information-finding activities, such as opinion elicitation and discovery of social norms, that are not expressible at all as queries against a fixed corpus.

Biography
Andrei Broder is a Yahoo! Research Fellow and Vice President of Emerging Search Technology. Previously he was an IBM Distinguished Engineer and the CTO of the Institute for Search and Text Analysis at IBM Research. From 1999 until early 2002 he was Vice President for Research and Chief Scientist at the AltaVista Company. He graduated summa cum laude from the Technion and did his Ph.D. in Computer Science at Stanford University under Don Knuth. Broder is co-winner of the Best Paper award at WWW6 (for his work on duplicate elimination of web pages) and at WWW9 (for his work on mapping the web). He has published more than seventy papers and has been awarded twenty patents. He is an IEEE Fellow and has served as chair of the IEEE Technical Committee on Mathematical Foundations of Computing.



Stephen Robertson

Microsoft Research Cambridge and City University London

The last half-century: a perspective on experimentation in information retrieval

The experimental evaluation of information retrieval systems has a venerable history. Long before the current notion of a search engine, in fact before search by computer was even feasible, people in the library and information science community were beginning to tackle the evaluation issue. Sometimes it feels as though evaluation methodology has become fixed (stable or frozen, according to your viewpoint). However, this is far from the case. Interest in methodological questions is as great now as it ever was, and new ideas are continuing to develop. This talk will be a personal take on the field.

Biography
Stephen Robertson is a researcher at the Microsoft Research Laboratory in Cambridge, UK. He retains a part-time professorship in the Department of Information Science, which is part of the School of Informatics at City University, London. He was full-time at City University from 1978 to 1998, and head of department from 1988 to 1996. He also started the Centre for Interactive Systems Research in the department. His main research interests are in theories and models for information retrieval (specifically probabilistic models), the design and evaluation of IR systems, evaluation methods, and optimization. Back in 1976, he was the author (with Karen Sparck Jones) of a probabilistic theory of relevance weighting, which has been moderately influential. An extension of that model (with Stephen Walker, 1994) led to the BM25 function for term weighting and document scoring, now used by many other research groups. Prof. Robertson is a Fellow of Girton College, Cambridge; he was awarded the Tony Kent Strix Award in 1998 and the Gerard Salton Award in 2000.



Marco Gori

Dipartimento di Ingegneria dell'Informazione, Università di Siena

Learning in hyperlinked environments

A remarkable number of important problems in different domains (e.g., web mining, pattern recognition, biology, ...) are naturally modeled by functions defined on graphical domains, rather than on traditional vector spaces. Following recent developments in statistical relational learning, in this talk I introduce Diffusion Learning Machines (DLMs), whose computation is closely related to Web ranking schemes based on link analysis. Using arguments from function approximation theory, I argue that DLMs can, in fact, compute any conceivable ranking function on the Web. Learning is based on a human supervision scheme that takes into account both the content and the links of the pages. I give very promising experimental results on artificial tasks and on the learning of functions used in link analysis, like PageRank, HITS, and TrustRank. Interestingly, the proposed learning mechanism is proven to be effective even when the rank depends jointly on the page content and on the links. Finally, I argue that the propagation of the relationships expressed by the links dramatically reduces the sample complexity with respect to traditional learning machines operating on vector spaces, thus making application to real-world problems on the Web, like spam detection and page classification, feasible.
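As a point of reference for the link-analysis functions the abstract mentions as learning targets (this is not part of the talk itself), the following is a minimal sketch of the classic PageRank computation by power iteration. The function name, damping factor, tolerance, and toy graph are illustrative assumptions, not taken from the speaker's work.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """Minimal PageRank via power iteration on a dense adjacency matrix.

    adj[i, j] = 1 if page i links to page j. Returns a rank vector summing to 1.
    (Illustrative sketch only; parameter choices are conventional defaults.)
    """
    n = adj.shape[0]
    out_degree = adj.sum(axis=1)
    # Row-normalize to get transition probabilities; pages with no out-links
    # are treated as linking uniformly to every page (dangling-node fix).
    transition = np.where(out_degree[:, None] > 0,
                          adj / np.maximum(out_degree[:, None], 1),
                          1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1 - damping) / n + damping * (transition.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            break
        rank = new_rank
    return rank

# Toy example graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
print(pagerank(A))
```

In the setting sketched in the abstract, a function of this kind (a score per node computed by propagation over the link graph) is the sort of target that a relational learning machine would be trained to reproduce from supervised examples.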

Biography
Marco Gori received the Ph.D. degree in 1990 from Università di Bologna, Italy. From October 1988 to June 1989 he was a visiting student at the School of Computer Science, McGill University, Montreal. In 1992, he became an Associate Professor of Computer Science at Università di Firenze and, in November 1995, he joined the Università di Siena, where he is currently a full professor of computer science. His main interests are in machine learning, with applications to pattern recognition, Web mining, and game playing. He is especially interested in the formulation of relational machine learning schemes in the continuum setting. He is the leader of the WebCrow project for the automatic solving of crosswords, which recently outperformed human competitors in an official competition held at the ECAI-06 conference. He is co-author of the book "Web Dragons: Inside the Myths of Search Engine Technology," Morgan Kaufmann (Elsevier), 2006. Dr. Gori serves (or has served) as an Associate Editor of a number of technical journals related to his areas of expertise, including IEEE Transactions on Neural Networks, Pattern Recognition, Neural Networks, Neurocomputing, Pattern Analysis and Applications, the International Journal of Document Analysis and Recognition, and the International Journal on Pattern Recognition and Artificial Intelligence. He has received best paper awards and has been a keynote speaker at a number of international conferences. He is the Chairman of the Italian Chapter of the IEEE Computational Intelligence Society, and a Fellow of ECCAI and of the IEEE.