The code to compile XQuery into Java has now advanced sufficiently that I can run all the XMark benchmark queries, producing correct results, and reproducing the Saxon-SA join optimizations. First results are shown below: there's clearly room for further work in some areas, but overall it's an encouraging milestone. All timings in milliseconds, to run the query on a 12Mb data file
Query Saxon-B Saxon-SA compiled
q1 6.6 6.8 1.8
q2 6.0 5.8 4.4
q3 22.0 19.4 14.8
q4 21.8 21.2 17.0
q5 6.8 4.6 3.8
q6 4.4 2.6 4.6
q7 32.6 20.2 34.5
q8 12217.7 56.3 21.6
q9 16700.7 81.5 47.5
q10 1735.7 163.9 265.3
q11 12528.0 4256.3 6049.0
q12 3234.7 1418.7 2697.0
q13 3.2 2.6 2.6
q14 152.2 106.7 115.8
q15 4.8 3.2 3.6
q16 8.4 7.8 4.0
q17 11.6 10.4 6.6
q18 11.2 9.0 8.2
q19 65.5 58.7 125.2
q20 34.9 27.6 24.6
These figures are very preliminary. With some of them there's very little scope for further improvement (the compiled code for count(/a/b/c/) is about as good as it's going to get, and the only way to make further improvements is for the compiler to penetrate deeper than the object model and serialization interfaces that the compiled code currently invokes. But there's no reason why any of the figures for compiled code should be slower than the interpreted figures. The results for Q8 and Q9 are the most encouraging, because these rely very heavily on optimization. In fact it's generally true that the more complex the query, the bigger the potential gains.
There are a couple of outliers here. Q19 is the only query in XMark that does sorting, and that's clearly an area that needs attention. Q6 and Q7 are both dominated by a count(). Q10, 11, and 12 are probably dominated by conversion of untyped values to typed values in filter predicates (the XMark benchmark is schemaless, which makes it very frustrating to work with at times, because there would be so many ways of speeding it up if source changes to the queries were allowed!)
I'll come back with progress reports as these numbers improve. I don't expect to achieve the same kind of speed-up as with the Knight's Tour (a factor of 7) because with these queries, far more of the time is spent in reading the input data, and there's a limit to how fast that can go. A factor of 2 for the more complex queries remains a reasonable goal.
It's worth noting that all these figures are very competitive with those reported by other vendors. Compare with the 11Mb database results at http://monetdb.cwi.nl/XQuery/Overview/Benchmark/XMark/ - the Q9 figure, for example, is 1000 times better than two of the competitive systems cited.
|
|
||||||||
First compiled XMark results
Comments
Re: First compiled XMark results
I spent some time today looking at the anomalous results for Q12. I found one inefficiency in the compiled code: it wasn't using lazy evaluation for an expression that had been moved out of a loop, which meant that the expression was sometimes being evaluated even though the loop ended up being executed zero times. But this turned out not to account for much of the disrepancy in the numbers. In fact the discrepancy turns out to be a measurement error: the column of figures for Saxon-SA are in fact Saxon-SA running against a version of the tests modified to use a schema; so the interpreted and compiled code weren't actually executing the same query. Corrected measurements are:
Query Saxon-B Saxon-SA compiled q1 6.6 6.8 1.8 q2 6.0 5.9 4.8 q3 22.0 21.6 15.3 q4 21.8 24.6 17.6 q5 6.8 6.5 4.2 q6 4.4 4.1 4.3 q7 32.6 33.7 34.1 q8 12217.7 54.7 22.3 q9 16700.7 66.1 52.7 q10 1735.7 177.0 278.6 q11 12528.0 4140.9 5671.5 q12 3234.7 3141.5 2633.7 q13 3.2 3.2 3.1 q14 152.2 122.3 112.4 q15 4.8 4.6 4.2 q16 8.4 8.4 4.3 q17 11.6 12.2 6.8 q18 11.2 11.9 8.2 q19 65.5 65.7 118.5 q20 34.9 36.4 24.6 This still leaves Q10, Q11, and Q19 needing investigation. Although the aim is to achieve a doubling of performance overall, the first step is to ensure that no query runs slower when compiled than it does under the interpreter Re: First compiled XMark results
Q19 (the sorting query) is now down to 43.86ms. The main contributor to the improvement was to ensure that the same SequenceOutputter and TinyTree data strucures are reused for all the newly constructed parentless elements in the sequence being sorted.
Re: First compiled XMark results
For Q11, I discovered that the static type analysis was making an incorrect inference that ($p/profile/@income > (5000 * $i)) was necessarily a numeric comparison. (5000+$i is always a number, but 5000*$i might be a duration.) Correcting this bug increases the time taken by the interpretive code from 4018ms to 7456ms, but decreases the compiled time from 5347ms to 5295. Don't ask me why.
|
Search
Recent Comments
Recent Articles
Month Archive
|
|||||||