Notes on the Future of Web Search Workshop

Posted: May 26th, 2006

A few Moleskine notes from last week’s Future of Web Search workshop.

Among the many diverse talks, a few caught my interest:

Andrei Broder (Yahoo! Research, USA): From Query-Based Information Retrieval to Context-Driven Information Supply. Andrei shared his vision of context-driven information supply, that is, providing relevant information without requiring the user to make an explicit query (see below).

Andrew Tomkins (Yahoo! Research, USA): Blogs, Friendship and Geography. An analysis of the LiveJournal population based on location and proximity, with a mention of “geo-search”.

Nick Craswell (Microsoft Research, UK): Image Search Live. When searching for images, people go deeper into the results than they do for web pages. The team also reads blogs to gather opinions on its implementation of image search!

Data uncertainty in IR (information retrieval) and databases
I exchanged a few words with Gerhard Weikum from the Max-Planck Institute for Informatics on inferences in search queries and the additional uncertainty they introduce into the data (e.g. should a search for a “professor” also retrieve a “lecturer”?). As I understand it, handling the mismatch between what is delivered and what is expected is a rather old IR issue. The accepted trade-off seems to be that users’ willingness to cope with these mismatches is proportional to the value of the information sought. He pointed me to the work of Jennifer Widom at Stanford on the Trio project, a system for integrated management of data, uncertainty, and lineage. Her group has published a couple of papers relevant to my interests:
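To make the “professor”/“lecturer” case concrete, here is a minimal Python sketch, under my own assumptions, of how query inference adds uncertainty: terms inferred from a hypothetical relation table carry a confidence below 1.0, and each result reports how much of its score rests on what the user actually typed. The table, the weights and the scoring are illustrative only; this is not how Trio or any particular search engine works.

```python
from dataclasses import dataclass

# Hypothetical relation table (an assumption for illustration): each inferred
# term carries a confidence in (0, 1) that it matches what the user meant.
RELATED_TERMS = {
    "professor": {"lecturer": 0.6, "faculty": 0.8},
}


@dataclass(frozen=True)
class QueryTerm:
    term: str
    confidence: float  # 1.0 for typed terms, < 1.0 for inferred ones
    inferred: bool


def expand_query(terms):
    """Add related terms to the query, keeping the typed terms at confidence 1.0."""
    expanded = {t: QueryTerm(t, 1.0, False) for t in terms}
    for t in terms:
        for related, conf in RELATED_TERMS.get(t, {}).items():
            # A typed term is never downgraded; an inferred term keeps its best confidence.
            if related not in expanded or expanded[related].confidence < conf:
                expanded[related] = QueryTerm(related, conf, True)
    return list(expanded.values())


def rank(docs, query_terms):
    """Return (doc_id, score, certainty) tuples sorted by descending score.

    certainty is the share of the score contributed by terms the user typed,
    so 0.0 means the match rests entirely on inference.
    """
    results = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        matched = [q for q in query_terms if q.term in words]
        if not matched:
            continue
        score = sum(q.confidence for q in matched)
        typed = sum(q.confidence for q in matched if not q.inferred)
        results.append((doc_id, round(score, 2), round(typed / score, 2)))
    return sorted(results, key=lambda r: r[1], reverse=True)


docs = {
    "a": "Professor of computer science",
    "b": "Lecturer in databases",
    "c": "Faculty news",
}
print(rank(docs, expand_query(["professor"])))
# [('a', 1.0, 1.0), ('c', 0.8, 0.0), ('b', 0.6, 0.0)]
```

Keeping the certainty next to the score is one way of exposing the mismatch to the user rather than hiding it inside a single ranking number.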

A. Das Sarma, S.U. Nabar, and J. Widom. Representing Uncertain Data: Uniqueness, Equivalence, Minimization, and Approximation. Technical Report, December 2005.

A. Das Sarma, O. Benjelloun, A. Halevy, and J. Widom. Working Models for Uncertain Data. Proceedings of the Twenty-Second International Conference on Data Engineering, Atlanta, Georgia, April 2006.

User context
The problem of the quality of the retrieved data is twofold. First, the algorithm must be able to define relevance from inferred data. Second, it must communicate this relevance to the user in terms of his own expectations. I raised this question with Andrei Broder from Yahoo! Research, who works on (user) context-driven information supply. He said that it is a job for user experience people such as Marc Davis, while I, of course, think that the user’s perspective must already be taken into account in the design of the algorithms. Part of Andrei’s aim is to increase the accuracy of web search results. He defined three ways to improve the results:
- Learn a user profile from the interactions
- Define a session profile for each user session
- Use predefined word relations (i.e. semantic relations)
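The first two items can be illustrated with a small Python sketch. It is my own simplification, not Andrei’s method: both the long-term user profile and the session profile are plain term-frequency tables built from past queries, and a document’s base relevance score is boosted by its overlap with each. The function names and the weights `alpha` and `beta` are hypothetical.

```python
from collections import Counter


def profile_from(queries):
    """Build a normalised term-frequency profile from a list of query strings."""
    counts = Counter(w for q in queries for w in q.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def personalized_score(doc_text, base_score, user_profile, session_profile,
                       alpha=0.3, beta=0.5):
    """Boost a base relevance score by the document's overlap with both profiles.

    alpha and beta are hypothetical weights; here the current session counts
    more than the long-term history.
    """
    words = set(doc_text.lower().split())
    user_affinity = sum(user_profile.get(w, 0.0) for w in words)
    session_affinity = sum(session_profile.get(w, 0.0) for w in words)
    return base_score + alpha * user_affinity + beta * session_affinity


# Long-term history says "cars", the current session says "animals".
user = profile_from(["jaguar car price", "car insurance"])
session = profile_from(["jaguar habitat"])
for doc in ["Jaguar XK car review", "Jaguar habitat in the rainforest"]:
    print(doc, round(personalized_score(doc, 1.0, user, session), 2))
```

With equal base scores, the session pushes the habitat page (1.56) above the car review (1.43). The same mechanism pushes the wrong page up whenever the profiles misread what the user is after.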

To define profiles, one needs to understand the user context and find the right balance between explicit and implicit queries. My critique of Andrei’s vision is that he wants to infer and push information based on patchy and questionably accurate contextual data (previous experiences, queries, current location, …) that can misrepresent the user’s intent. As search engines get closer to the user (as in ubicomp), it becomes critical to go beyond database attributes to represent a context. Andrei rightly made fun of the Microsoft Office virtual assistant in the form of a paperclip and showed other pathetic examples of context mismatches. However, I wonder how much we have learned from that annoying feature.

Miscellaneous
When you grow a problem by a factor of 10, you have a new problem.

Somebody mentioned the dynamics of the query, that is, the path a user takes to refine a query. There is certainly something similar when dealing with location information.

During the talks, the wireless microphones lost their connection with the base station. The audio engineer told me it was due to the speakers’ positions and gestures and that there was not much to be done about it. So much for the “cloud of connectivity”… Even mature technologies sometimes fail to deliver in controlled environments. My thesis has a future!