I'm back to work on the Java code-generation for XQuery. The challenge is to improve the test coverage. With 14000 tests in the W3C test suite, one would think the coverage would be pretty good; but the problem is that many of the tests use literal constants, which means the whole query can be evaluated statically by the XQuery analyzer, before the code generator gets a look in. Early evaluation of expressions at compile time is known rather quaintly in the compiler literature as "constant folding". We should be able to improve the coverage of the code generator testing simply by not doing constant folding.
Of course one way to do this would be to set some diagnostic flags within the Saxon optimization code. But that kind of thing is hard to maintain, and it's unsatisfactory to do testing with a modified version of the product. It occurred to me that it would be better to modify the queries.
Here's an example. There's a query fn-codepoints-to-string-3 that does this:
fn:codepoints-to-string(49)
The Java code that Saxon generates for this is (minus the boilerplate):
final static StringValue constant0 =
StringValue.makeStringValue("1");
public void process(final XPathContext context)
throws XPathException {
SequenceReceiver out = context.getReceiver();
out.append(constant0, 0, 0);
}
In other words, all the hard work has already been done.
We can fix this by modifying the query to read:
fn:codepoints-to-string(saxon:noop(49))
Where saxon:noop is a simple extension function:
public static ValueRepresentation noop(ValueRepresentation arg) {
return arg;
}
The optimizer of course doesn't understand extension functions and leaves them well alone. So this query should compile to something much more interesting. (Some calls to standard functions are simply implemented by calls to the run-time library. Others, including this one, are likely to generate inline code).
It turns out that it's very easy to do this transformation of the query. That's because all the queries (or the vast majority of them) have been published not only in "human-readable" XQuery format, but also in an XML format, XQueryX. This is what the query looks like in XQueryX:
<?xml version="1.0"?>
<xqx:module xmlns:xqx="http://www.w3.org/2005/XQueryX"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/XQueryX
http://www.w3.org/2005/XQueryX/xqueryx.xsd">
<xqx:mainModule>
<xqx:prolog>
<xqx:varDecl>
<xqx:varName>input-context1</xqx:varName>
<xqx:external/>
</xqx:varDecl>
</xqx:prolog>
<xqx:queryBody>
<xqx:functionCallExpr>
<xqx:functionName xqx:prefix="fn">codepoints-to-string</xqx:functionName>
<xqx:arguments>
<xqx:integerConstantExpr>
<xqx:value>49</xqx:value>
</xqx:integerConstantExpr>
</xqx:arguments>
</xqx:functionCallExpr>
</xqx:queryBody>
</xqx:mainModule>
</xqx:module>
The XQueryX specification comes with an XSLT stylesheet that converts this parse tree into the human-readable XQuery syntax. All we need to do is to add an overlay to this stylesheet that adds one or two rules, for example:
<xsl:import href="xqueryx.xsl"/>
<xsl:template match="xqx:integerConstantExpr
| xqx:decimalConstantExpr
| xqx:doubleConstantExpr">
<xsl:text>saxon:noop(</xsl:text>
<xsl:value-of select="xqx:value"/>
<xsl:text>)</xsl:text>
</xsl:template>
and the modified stylesheet produces the query that we want. Add to this Saxon's ability using the collection() function (together with xsl:result-document) to transform all the files held in a directory structure, and the conversion of the whole test suite becomes extremely easy.
Two observations:
(1) I've often been very critical of XQueryX, feeling that there really wasn't a strong enough need for it to justify making it a W3C standard. Well, I'm going to have to eat my words at least partially. This particular exercise would not have been nearly as easy if the tests hadn't been published in XQueryX format, and that wouldn't have happened if it hadn't been a W3C spec.
(2) It's interesting to note how hard it would be to do this in XQuery. The main XQueryX stylesheet certainly benefits immensely from XSLT's top-down apply-templates processing model, but in theory it could have been written in XQuery. The modification layer, however, that changes the behaviour of the transformation to do something slightly different, would be quite impossible to add without modifying the source code of the original query. This is an observation I've made in a number of larger applications: once you want to write code that can be reused for more than one task, XSLT has quite a few features that make it a stronger candidate for the job than XQuery.
Having done this conversion, I've now realized that the Java code-generator doesn't yet handle calls to extension functions. Well, I tested the coverage of the code-generator in a way that I didn't quite anticipate!
|
|
||||||||
How not to fold constants
Comments
Re: How not to fold constants
by
mr
on Wed 31 Jan 2007 11:39 GMT | Profile | Permanent Link
Well, I am just wondering if Saxon support recent changes that has been made in XQuery Update Facility.
Lots of work ahead you. Re: Re: How not to fold constants
I'm planning to wait until XQuery Update is a little more stable before attempting an implementation. At one stage I was sceptical about whether XQuery update made much sense for an in-memory processor, but I'm coming to the conclusion that it does make sense, because some transformations (such as "delete all the @note attributes") are so hard to do any other way (unless you count XSLT of course...)
Re: How not to fold constants
I am studying XQueryX and I have come up with an issue that does not seem to be addressed on the w3.org website for the XQueryX specification. It has to do with ordering stepExpr tags so that they are not sibling nodes. The problem I see with having sibling nodes is that they are supposed to represent a parent-child relationship in the XPath of Xquery. If a parser does not take document linearity into account, then the step expressions, because they are sibling nodes, may become jumbled and lose their order when trying to recreate an XPath. Though most parsers do take document linearity into considerations when parsing an XML document, sibling node order is not a specification that is outlined in the XML specification. If I am missing something please let me know. Thanks for your time and your consideration.
|
Search
Recent Comments
Recent Articles
Month Archive
|
|||||||