January 17, 2011

Muath Alramma (LACL)

XML is one of the most important standards for manipulating data on the Internet. However, querying large volume of XML data represents a bottleneck for several computationally intensive applications. A possible solution consists in computing synopses data structures from XML data sets, i.e. compressed representations providing a “succinct” description of the original data sets. Computing such synopses for large and complex data sets is a time/space challenge. To overcome this problem, we propose to pre-process the data set in streaming mode with resources approximately proportional to data set depth and query selectivity. In this work, we give a brief introduction on XML, stream-processing and selectivity estimation for XML queries. We propose a new application of the path tree synopsis data structure. The path tree provides a succinct description of the original data set with low computational overhead and high accuracy for processing tasks like selectivity estimation and query answer approximation. We also present an on-line stream-querying system able to estimate the cost for a given query before answering it accurately. The core algorithm is adapted from Gou & Chirkova 2007, we apply it to path tree traversal, cost estimation, query processing and even optimizations.