Monday 13 February 2012

XQuery parser performance

This post was updated on 15th Feb with the BaseX 7.1.1 results

A comparison of XQuery engine performance when running the XQuery parser from the xquerydoc project. The test parses the XQuery program string "2+3":

import module namespace p="XQueryV30" at "XQueryV30.xq";
p:parse-XQuery("2+3")

Engines

  • Zorba XQuery Engine, Version: 2.1.0
  • BaseX 7.1.1 Beta  [Standalone]
  • Saxon-HE 9.4.0.2J
  • MXQuery 0.6.0

java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Client VM (build 20.1-b02, mixed mode, sharing)


All tests were run on Ubuntu 11.04 on a ThinkPad T42.

Zorba


time zorba -f -q test_xqparser.xq 
real 0m24.562s 
user 0m21.489s 
sys  0m0.240s

BaseX

Results for version 7.1.1 (BaseX711-20120215.234615)

time basex test_xqparser.xq
real 0m1.601s
user 0m1.260s
sys  0m0.088s

Results for version 7.1.0

time basex test_xqparser.xq
real 96m29.589s
user 54m35.961s
sys  0m19.533s

Saxon


time saxon-xq test_xqparser.xq 
real 0m2.673s
user 0m2.372s
sys  0m0.140s

MXQuery


java -Xms1024m -Xmx1024m  -jar mxquery.jar -f test_xqparser.xq
MXQuery 0.6.0
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

All scripts are run from the xqparserperf/src directory.
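
The three runs that completed above could be repeated in one pass with a small wrapper script. This is only a sketch: the command lines are the ones quoted in this post, but the helper name run_timed and the QUERY variable are made up for illustration and are not part of xqparserperf. It relies on `date +%s`, so it only reports whole seconds; for sub-second figures like those above, the shell's `time` is still the better tool.

```shell
#!/bin/sh
# Sketch: run each engine on the same query file and report elapsed time.
# run_timed and QUERY are hypothetical names, not part of xqparserperf.

QUERY="${QUERY:-test_xqparser.xq}"

# run_timed LABEL CMD [ARGS...]
# Prints whole-second wall-clock time for one run of CMD,
# or a "skipped" line if CMD is not found on the PATH.
run_timed() {
    label="$1"; shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$label: skipped (not installed)"
        return 0
    fi
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit status $status)"
}

# Command lines as used in this post.
run_timed zorba zorba -f -q "$QUERY"
run_timed basex basex "$QUERY"
run_timed saxon saxon-xq "$QUERY"
```

Engines that are not installed are reported as skipped rather than aborting the run, and the exit status is printed so that a crash is not mistaken for a fast result.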

Github: xqparserperf 


8 comments:

  1. Andy, you might like to test with eXist-db using the Java Admin Client running in embedded mode. My measurements, probably on a less powerful machine (VM), are 508 seconds for compilation of the query and 283 ms for execution.

  2. on MarkLogic 5.0 I get

    4063 Expressions

    PT0.01154S

    1. Well, MarkLogic times are not comparable, as the other results seem to contain parsing, compilation, etc. The total costs would be much more interesting here.

    2. sure those are just execution times ... which are comparable if we compare execution times ... apologies if it seemed like I was making that unclear (I said this via tweet).

      startup times will always be based on 'what is starting up' ... I am more interested in comparing compilation and execution times, still working on 'cold' compilation timings for ML.

      in this kind of ad hoc comparison ... all the vagaries that come along with benchmarks apply.

  3. Recently, I contributed some optimizations to REx that improved performance by at least 50% on MarkLogic. The key improvement was using fn:subsequence instead of predicate range expressions. I would re-run the EBNF through REx to get new statistics. Thanks to Gunther for his attention.

    1. Interesting approach: improve queries to support implementations. Why not the other way round? ;)

    2. well, you may have a point ... in this case perhaps predicate range expressions should perform as well as fn:subsequence I guess is what you are implying.

      processors rewrite expressions all the time and perhaps this is what should be done in this case; though having looked at the change I think that subsequence is the right instruction to use, but tastes can vary.

      otherwise it's a valid point

    3. The optimizations were in support of using REx to parse XQuery 3.0 into a parse tree. ML does not support XQuery 3.0, but using REx we could simulate it by rewriting the XQuery 3.0 back down to 1.0. I often use the XQuery parsing to do dynamic scaffolding or unit tests. You should try xray http://www.github.com/robwhitby/xray. This shows some very interesting work in using XQuery parsing to dynamically run unit tests.
