XQSBench: XQuery Selectivity Estimation Benchmark








Query List



The Extensible Markup Language (XML) has rapidly become a standard for large-scale data exchange and integration over the Internet. The semi-structured nature of XML allows data to be represented far more flexibly than in the traditional relational paradigm. However, the tree-based data model underlying XML poses many challenges, especially for efficient query evaluation. The World Wide Web Consortium (W3C), which published the first XQuery working draft in 2001, has standardized XQuery as the XML query language. XQuery is based on a hierarchical and ordered document model that supports a wide variety of constructs and use cases. Because the language addresses a wide range of requirements, it incorporates a rich set of features.


Selectivity estimation is an important component of modern query processors. Accurately estimating the selectivity of a given query has several advantages:

1. It is a crucial input to effective query optimization. The cost of an execution plan for a given query is determined by the size of the final result plus the sum of the sizes of the intermediate results used to compute it.

2. It allows the system to give users early feedback about the expected outcome of their queries and the associated computational effort. Users can use this information to refine the queries so that they return a smaller, more relevant set of results.

3. It indicates how resources for query execution can be allocated more effectively.
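The cost model in point 1 can be illustrated with a minimal sketch. The function name `plan_cost` and all cardinality figures below are hypothetical and chosen only for illustration; a real optimizer would obtain these sizes from its selectivity estimator.

```python
def plan_cost(intermediate_sizes, final_size):
    """Cost of a plan: size of the final result plus the sum of the
    sizes of the intermediate results used to compute it."""
    return final_size + sum(intermediate_sizes)


# Hypothetical estimated cardinalities for two plans answering the same query.
# Both plans produce the same final result (50 items) but differ in the
# sizes of their intermediate results.
plan_a = plan_cost(intermediate_sizes=[1000, 200], final_size=50)
plan_b = plan_cost(intermediate_sizes=[300, 200], final_size=50)

# The optimizer would prefer the plan with the lower estimated cost.
cheapest = min([("plan_a", plan_a), ("plan_b", plan_b)], key=lambda p: p[1])
print(cheapest)
```

Under this model, an inaccurate estimate for any intermediate result can change which plan looks cheapest, which is why estimation accuracy matters for optimization.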


The selectivity estimation problem is more complicated in the XML domain than in the relational domain. The main reason is that XML queries involve structural conditions in addition to value-based conditions.
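To make the distinction concrete, the following sketch computes exact result sizes on a toy document. The fragment is an assumption: it is only loosely modelled on the people section of an XMark-style auction document and is not taken from "auction.xml" itself. It uses Python's standard `xml.etree.ElementTree` module rather than an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Toy fragment (hypothetical data, loosely modelled on an XMark-style document).
DOC = """
<site>
  <people>
    <person id="p0"><name>Ann</name><profile income="40000"/></person>
    <person id="p1"><name>Bob</name></person>
    <person id="p2"><name>Cy</name><profile income="90000"/></person>
  </people>
</site>
"""

root = ET.fromstring(DOC.strip())

# Path step only: every person element under people.
people = root.findall("./people/person")

# Structural condition: the person must have a profile child.
structural = [p for p in people if p.find("profile") is not None]

# Structural plus value-based condition: profile income above 50000.
both = [p for p in structural
        if float(p.find("profile").get("income")) > 50000]

print(len(people), len(structural), len(both))
```

An estimator has to predict each of these counts without evaluating the query: the second depends on how elements nest in the document, the third additionally on the distribution of attribute values.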


The XML research community has proposed several benchmarks [1,2,3,4]. These benchmarks mainly exercise different aspects of XML databases, such as storage, indexing, and query evaluation performance. Although they are very useful for those purposes, none of them is suited to evaluating XQuery cardinality estimation.


Unlike the usual XML database benchmarks, XQSBench aims to establish a basis for evaluating the accuracy and capabilities of XQuery selectivity estimation approaches. A good XQuery selectivity estimation system should support as many as possible of the estimation aspects covered by this benchmark, and must scale reasonably well for combinations of these aspects.


The benchmark consists of six groups of queries. Each group addresses one of the XQuery selectivity estimation challenges. The queries are based on "auction.xml", the well-known sample XML document of the XMark benchmark. The structure of this document is described in detail in [3]. We also reuse some of the XMark queries, but from a different perspective that serves our focus on selectivity estimation.





[1] Timo Böhme and Erhard Rahm. XMach-1: A Benchmark for XML Data Management. In Datenbanksysteme in Büro, Technik und Wissenschaft (BTW), 9. GI-Fachtagung, pages 264-273, London, UK, 2001.


[2] Matthias Nicola, Irina Kogan, and Berni Schiefer. An XML transaction processing benchmark. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 937-948, New York, NY, USA, 2007.


[3] Albrecht Schmidt, Florian Waas, Martin L. Kersten, Michael J. Carey, Ioana Manolescu, and Ralph Busse. XMark: A Benchmark for XML Data Management. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pages 974-985, Hong Kong, China, September 2002.


[4] Benjamin B. Yao, M. Tamer Özsu, and John Keenleyside. XBench: A Family of Benchmarks for XML DBMSs. In Proceedings of the VLDB 2002 Workshop EEXTT and CAiSE 2002 Workshop DTWeb on Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web-Revised Papers, pages 162-164, London, UK, 2003.