Login
User name:
Password:
Remember me 
Powered by BlogHarbor
Powered by BlogHarbor
Re: Re: Ten Reasons why Saxon XQuery is Fast
by Michael Kay
Thanks for the comments. I prepared this article to a rather tight deadline so I didn't have time to do as much analysis and confirmation of these numbers as I would have liked (and there wouldn't have been space to publish it anyway), so my explanations are a bit conjectural. * I would have expected a bigger difference on memory usage between TinyTree and LinkedTree, and I need to investigate this one further. * On the build time, I don't find the figures surprising; no doubt the difference is partly Java object allocation, and partly overheads within Saxon itself (I've done much more optimisation on the TinyTree), and I would hesitate to guess in what ratio. * The difference in navigation time is probably particularly marked for this example, a search of the descendant axis where few nodes match the nodetest. The basic loop for the Tinytree just says {currentNodeNumber++; stop if the depth isn't greater than the starting depth; check if the node matches the nodetest (without even instantiating a node object unless it does match)}. That's essentially an integer increment and two integer comparisons (and one method call) for each descendant skipped over. For the LinkedTree the equivalent loop looks to see if there are any children, then tries the next sibling, then tries the next sibling of each ancestor in turn: it's much more complex logic and will probably involve at least five method calls in a typical case, in addition to testing the nodetest. As for schema information: the Saxon schema processor doesn't actually generate a full PSVI. All it does (apart from validation!) is to generate the type annotation for each node, essentially a reference to a simple type or complex type. At run-time this information is essentially used only for atomization and type checking in the obvious way. At query/stylesheet compile time, Saxon attempts to do static type inference for every expression, and the main use of schema information here is with path expressions: given $x/e, if the schema type of $x is known, then the schema type of $x/e can be inferred. This is done for the child, descendant, and attribute axes only. There is also some "pre-validation" of output structures, for example detecting that
is going to generate invalid output against the XHTML schema: again this only uses knowledge of which elements can appear as children/attributes of which others. I suspect one could do quite a lot of this with a Relax-NG schema. There isn't a very heavy reliance on the Element Declarations Consistent constraint - EDC doesn't stop you having two types derived by extension from E that both include an element F but with different types, so Saxon has to allow for this possibility, which means that EDC doesn't simplify the logic. And UPA makes no difference to the rules because Saxon is never (at present) analyzing the permitted sequences of child elements, only which elements are allowed as children and which aren't.
Post comment:
  Receive comment notifications for this article
Subject: 
Comment: 
Comment verification:

Please enter the text you see inside the graphic to post your comment:
This blog does not allow anonymous comments. Please provide your username and password along with your comment.
Login information:
Username: 
Password: 
If you would like to post contact information on your comment, please enter your information into the optional fields below:
Contact information:
URL:  example: http://yourdomain.com