Coincidentally, the day after my successful efforts to get the XStream benchmark running using the Saxon-SA streaming mode for large documents (see previous post), a Saxon-SA customer sent me a problem: they needed to transform a 450 MB file and wanted to know whether this facility would crack it. The answer, sadly, was no: the optimization only works if the transformation can be broken up into a sequence of independent transformations on subtrees of the document. In this particular use case the subtrees to be transformed carried context with them (for example, they were grouped hierarchically), so the method doesn't work.
As always, though, some new customer requirements proved a useful stimulus to improving the product. I haven't cracked this use case yet, but I have made some useful improvements.
Firstly, you can now specify a filter on the expression defining the subtrees to be transformed, for example:
<xsl:copy-of select="doc('huge.xml')//item[@price gt 50.00]"/>
The only restrictions on the filter are that it mustn't be positional, and it can't look outside the subtree being copied.
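To make the two restrictions concrete, here is a rough sketch of which predicates would and would not qualify. The item element and price attribute come from the example above; the category child, the currency attribute on the parent, and the selected wrapper are hypothetical names invented for illustration. Whatever switch enables streaming mode (see the previous post) is deliberately not shown.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="main">
    <selected>
      <!-- Qualifies: the predicate tests only an attribute of the item itself -->
      <xsl:copy-of select="doc('huge.xml')//item[@price gt 50.00]"/>
    </selected>
  </xsl:template>

  <!-- Should also qualify: the predicate looks down into the item's own subtree
         doc('huge.xml')//item[category = 'books']

       Does not qualify: the filter is positional
         doc('huge.xml')//item[position() le 10]

       Does not qualify: the predicate looks at the parent, outside the subtree being copied
         doc('huge.xml')//item[../@currency = 'USD'] -->

</xsl:stylesheet>
```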
Secondly, union expressions now work (this is a bug fix).
Finally, the construct is now decoupled from the push-pull multithreading implementation. If all you want to do is to write selected parts of the large document to the serializer, or to a temporary tree held in a variable, then there is no push-pull conflict, and the whole thing can operate in push mode, filtering and all. On the other hand, if your stylesheet iterates over the sequence returned by the xsl:copy-of, then the multithreaded implementation is still used. This slows things down by about a factor of two, but that's often worth it for the memory savings.
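As a rough illustration of the two cases, the sketch below contrasts a template that only copies the selected subtrees to the output with one that iterates over them. The function name f:expensive-items, its namespace URI, the template names, and the wrapper elements are all hypothetical, and the switch that turns streaming mode on is again omitted, so treat this as the shape of the stylesheet rather than a tested configuration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="f">

  <!-- Case 1: the copied subtrees go straight to the result tree,
       so there is no push-pull conflict -->
  <xsl:template name="copy-only">
    <expensive-items>
      <xsl:copy-of select="doc('huge.xml')//item[@price gt 50.00]"/>
    </expensive-items>
  </xsl:template>

  <!-- Case 2: the xsl:copy-of is wrapped in a function (hypothetical name)
       so that the stylesheet can iterate over the sequence it returns -->
  <xsl:function name="f:expensive-items" as="element(item)*">
    <xsl:copy-of select="doc('huge.xml')//item[@price gt 50.00]"/>
  </xsl:function>

  <xsl:template name="iterate">
    <summary>
      <!-- Iterating over the copied items is the situation in which
           the multithreaded implementation is still needed -->
      <xsl:for-each select="f:expensive-items()">
        <line price="{@price}"/>
      </xsl:for-each>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

Invoking the stylesheet with copy-only as the initial template corresponds to the first case; invoking it with iterate corresponds to the second.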
Streaming Mode on large documents
Comments
Re: Streaming Mode on large documents
by MarkCline on Thu 23 Apr 2009 03:15 BST
Michael,
Dumb question, but how do I use the streaming capabilities from within Java? I really just want to take a source document and a supplied XSL and chunk the transformation if at all possible. I just can't seem to figure it out...
Re: Re: Streaming Mode on large documents
Could I suggest you use the saxon-help list or forum on SourceForge for support questions? I don't think the blog is a very good vehicle for that - it's too hard for me to track that questions have been properly answered.