The last remaining query in the XMark benchmark where the compiled code was running significantly slower than the interpreted code was Q10. This query is along the lines:
for $i in distinct-values(
/site/people/person/profile/interest/@category)
let $p := for $t in /site/people/person
where $t/profile/interest/@category = $i
return <personne>
<statistiques>
<sexe>{ $t/profile/gender }</sexe>,
<age>{ $t/profile/age }</age>,
<education>{ $t/profile/education}</education>,
<revenu>{ $t/profile/@income } </revenu>
</statistiques>
<otherStuff>................</otherStuff>
</personne>
return <categorie>
<id>{ $i }</id>
{ $p }
</categorie>
It turned out that the interpreted code was writing all the constructed elements directly to the result stream, whereas the compiled code was constructing an intermediate tree representing the contents of variable $p, and then copying this to the result stream, which meant that the execution time was nearly doubled. The reason for this is that the interpreted code represents $p in a Closure object, essentially a value subject to lazy evaluation, and then makes a run-time decision whether to evaluate this in "pull" or "push" mode depending on how the value is actually used. The compiled code was always evaluating such variables in "pull" mode, leading to the unnecessary copy.
I considered various solutions. One could reproduce the interpretive behaviour by allowing a compiled closure to be evaluated in either mode, still making a run-time decision; or one could make a decision based on compile-time analysis that the reference to $p occurs in a context where push evaluation is advantageous. Instead I decided to do something more radical, and rewrite the tree to expand the variable reference $p inline. This is something I've been meaning to do for a long time, but I was always a bit concerned about context dependencies: what if the reference to $p occurs somewhere where the context is different from the place where $p is declared? However, on checking the code, it seems that all the analysis to resolve this has already been done. Saxon is already making a compile-time decision to use a "non-memo-closure" for $p (one which evaluates the expression without remembering the result), and the conditions for that seem to be exactly the same as the conditions that allow inlining. Moreover, there's already a tree-rewriting operation that does variable inlining, which is used for a more restricted purpose. So I discovered to my great surprise that the code to inline $p was only an extra four lines!
The effect on the performance of the interpreted code is an improvement from 167 to 161ms - nothing dramatic, as one would expect, because all we've done is to eliminate the small overhead of remembering the variable definition and evaluating it on first use. For the compiled code, however, inlining eliminates the tree copy operation, and improves the performance from 301ms to 104ms.
All the more complex queries are now showing a speed up (for compiled versus interpreted execution) of between 25% and 50%. That's good enough for a first release, although I hope to improve on it. I would hope that real user workloads will see higher figures: as the work on the Knight's Tour showed, much higher speed-up is possible in cases where there are lots of variables, function calls, and computation within the query.
But before the code can be released there are still a lot of things missing. Mainly things that are missing from the test suites, like external Java functions, user-defined collations, and so on; and also a lot more work on the functions and expressions which are tested in XQTS only using constant expressions that can be evaluated at compile time. Plus a review of the hundreds of TODO statements scattered around the code.
|
|
||||||||
Compiling Q10
Comments
Re: Compiling Q10
by
Dimitre Novatchev
on Fri 10 Nov 2006 02:40 GMT | Permanent Link
Probably this is a naive question: Is it too difficult to generate C#code?
Re: Re: Compiling Q10
I'm trying to keep the Java code generation at a sufficient level of abstraction that it shouldn't be too hard to generate C# instead. I haven't absolutely decided to go that way though. Alternatives are Java -> bytecode -> IL via IKVMC, or direct generation of bytecode/IL.
Re: Compiling Q10
by
Christopher Sahnwaldt
on Mon 13 Nov 2006 21:37 GMT | Profile | Permanent Link
Congratulations on the speedup. I'd be interested to hear how you generate the code.
We're doing some code generation as well. We're using XSLT to generate Java from a small XML 'language'. Works much better than out initial approach of using Java to walk through a DOM and generate the code. Then we use com.sun.tools.javac.Main.compile() to compile the generated java files. It works, but it's a bit clumsy, since javac can only handle java files in the file system, not Strings or something else in memory. Maybe the tools API in JDK 6 will bring some improvements. Re: Re: Compiling Q10
Yes, I've also found that doing the Java compile via the API is pretty kludgey. I'm leaning towards simply writing the Java out to filestore, and leaving the user to add the Java compile stage - it works much better that way when there's a raft of queries to compile en bloc. The kludginess of the compiler API seems to be the main reason that going straight to bytecode would be beneficial.
Re: Re: Re: Compiling Q10
by
Christopher Sahnwaldt
on Mon 13 Nov 2006 22:40 GMT | Profile | Permanent Link
Thanks for the reply. Just to make sure I understand - do you mean the new javax.tools API http://java.sun.com/javase/6/docs/api/javax/tools/package-summary.html or Main.compile()? I just looked over the new API. I seems that it allows to somehow compile Java files from strings, but I'm not looking forward to implementing JavaFileManager. It would take me a while to thoroughly understand all these interfaces, enums, etc. So in the end it's almost as bad as Main.compile(). :-)
Re: Compiling Q10
I believe that the compilation approach could allow the creation of compiled XSLT "function libraries" -- modules to be easily distributed, deployed and re-used by different transformations.
Do you think this would be possible to achieve in a future Saxon version? |
Search
Recent Comments
Recent Articles
Month Archive
|
|||||||