Workshop on the Future of Web Search

Posted: May 18th, 2006

Today and tomorrow I am taking part in a workshop on the Future of Web Search, held at UPF in Barcelona.

My good friend Mauro Cherubini will present his work on Mobile Search on Ubiquitous Collaborative Annotations of Space.

For credits in the doctoral school, I will need to write a paper on the very specific topic of Efficient Top-k Queries for XML Information Retrieval, which will be part of Gerhard Weikum’s keynote speech.

Abstract

Non-schematic XML data that comes from many different sources and inevitably exhibits heterogeneous structures and annotations (i.e., XML tags) cannot be adequately searched using database query languages like XPath or XQuery. Often, queries return either too many or too few results. Rather, the ranked-retrieval paradigm is called for, with relaxable search conditions, various forms of similarity predicates on tags and contents, and quantitative relevance scoring.

The talk discusses recent advances and open research issues for ranked retrieval of XML data, and exemplifies them by the TopX search engine, a prototype system developed at the Max-Planck Institute for Informatics. TopX supports a probabilistic-IR scoring model for full-text content conditions and tag-term combinations, path conditions for all XPath axes as exact or relaxable constraints, and ontology-based relaxation of terms and tag names as similarity conditions for ranked retrieval. For speeding up top-k queries, various techniques are employed: probabilistic models as efficient score predictors for a variant of the threshold algorithm, judicious scheduling of sequential accesses for scanning index lists and random accesses to compute full scores, incremental merging of index lists for on-demand, self-tuning query expansion, and a suite of specifically designed, precomputed indexes to evaluate structural path conditions.
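To make the "threshold algorithm" mentioned in the abstract a bit more concrete, here is a minimal sketch of the classic textbook version (Fagin-style TA), not the TopX variant with probabilistic score predictors and access scheduling. It assumes each query condition has an index list of (document, score) pairs sorted by descending score, that scores are non-negative, and that a document's full score is simply the sum of its per-list scores. The function name threshold_topk and the toy data are my own, purely for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_topk(
    index_lists: Dict[str, List[Tuple[str, float]]], k: int
) -> List[Tuple[str, float]]:
    """Classic threshold algorithm over score-sorted index lists.

    index_lists maps a query condition to a list of (doc_id, score)
    pairs sorted by descending score (an assumption of this sketch).
    """
    # Random accesses are simulated with one dict per index list.
    lookup = {cond: dict(postings) for cond, postings in index_lists.items()}
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap of (full_score, doc_id)
    max_depth = max((len(p) for p in index_lists.values()), default=0)

    for depth in range(max_depth):
        threshold = 0.0
        # One round of sequential accesses: read position `depth` in each list.
        for postings in index_lists.values():
            if depth >= len(postings):
                continue  # exhausted list: unseen docs score 0 here
            doc_id, score = postings[depth]
            threshold += score
            if doc_id in seen:
                continue
            seen.add(doc_id)
            # Random accesses to compute the document's full score.
            full = sum(table.get(doc_id, 0.0) for table in lookup.values())
            if len(top) < k:
                heapq.heappush(top, (full, doc_id))
            elif full > top[0][0]:
                heapq.heapreplace(top, (full, doc_id))
        # No unseen document can score above the threshold, so stop
        # once the k-th best score has reached it.
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(((d, s) for s, d in top), key=lambda pair: -pair[1])


# Toy example with two hypothetical conditions.
lists = {
    "title": [("d1", 0.75), ("d2", 0.5), ("d3", 0.125)],
    "body": [("d2", 0.75), ("d3", 0.5), ("d1", 0.25)],
}
print(threshold_topk(lists, k=2))
```

In this toy run the scan stops after the second position of each list, without reading the tails. As I understand the abstract, the TopX techniques are about making this basic loop cheaper, for instance by deciding more carefully when a random access is worth its cost.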