Stream-processing of XPath queries: Selectivity and Cost Prediction

17 janvier 2011

Muath Alramma (LACL)

XML is one of the most important standards for manipulating data on the Internet. However, querying large volume of XML data represents a bottleneck for several computationally intensive applications. A possible solution consists in computing synopses data structures from XML data sets, i.e. compressed representations providing a “succinct” description of the original data sets. Computing such synopses for large and complex data sets is a time/space challenge. To overcome this problem, we propose to pre-process the data set in streaming mode with resources approximately proportional to data set depth and query selectivity. In this work, we give a brief introduction on XML, stream-processing and selectivity estimation for XML queries. We propose a new application of the path tree synopsis data structure. The path tree provides a succinct description of the original data set with low computational overhead and high accuracy for processing tasks like selectivity estimation and query answer approximation. We also present an on-line stream-querying system able to estimate the cost for a given query before answering it accurately. The core algorithm is adapted from Gou & Chirkova 2007, we apply it to path tree traversal, cost estimation, query processing and even optimizations.