Re: Wrapping the .NET DOM
by
Michael Kay
Wolfgang Hoschek sent me the following comments:
Here are some notes that pertain not to .NET but to an unqualified generalization:
> it's far better, when you can, to build a native Saxon tree directly from raw XML - the same is true for the Java product.
I think the performance characteristics aren't as simple as suggested.
There are at least 4 phases that contribute to overall execution time:
1. xml parsing / tree building
2. query compilation
3. query execution
4. serialization
Depending on the app, some of these phases may be much more expensive than others. Some phases may be completely absent or amortized over many executions. For example, running millions of trivial (sequential) XPath queries per second over records from large XML streams exhibits quite a different profile from dynamic generation of HTML for publishing via XSLT, which in turn is different from running occasional complex queries over very large documents (databases).
Queries/transforms that construct new trees are very different from searches, etc. Memory consumption may or may not be an issue.
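As a rough illustration of the four phases listed above, the following sketch times each one separately. It is only a sketch under stated assumptions: it uses the JDK's built-in JAXP DOM parser, XPath 1.0 engine and identity transformer, not Saxon, TinyTree or XOM, and the names `PhaseTimer` and `timePhases` are hypothetical. The `runs` parameter shows how parsing and compilation can be amortized over many executions of the same query.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: measures the four phases with plain JAXP (not Saxon).
public class PhaseTimer {

    /** Returns elapsed nanoseconds for {parse, compile, execute, serialize}. */
    public static long[] timePhases(String xml, String xpath, int runs) throws Exception {
        long[] elapsed = new long[4];

        // Phase 1: XML parsing / tree building (here: a DOM tree).
        long start = System.nanoTime();
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        elapsed[0] = System.nanoTime() - start;

        // Phase 2: query compilation, done once.
        start = System.nanoTime();
        XPathExpression expr = XPathFactory.newInstance().newXPath().compile(xpath);
        elapsed[1] = System.nanoTime() - start;

        // Phase 3: query execution, repeated to show amortization of phases 1 and 2.
        start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            expr.evaluate(doc, XPathConstants.NUMBER);
        }
        elapsed[2] = System.nanoTime() - start;

        // Phase 4: serialization via the identity transformer.
        start = System.nanoTime();
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        elapsed[3] = System.nanoTime() - start;

        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><item/><item/><item/></records>";
        long[] t = timePhases(xml, "count(//item)", 1000);
        String[] names = {"parse", "compile", "execute", "serialize"};
        for (int i = 0; i < t.length; i++) {
            System.out.println(names[i] + ": " + t[i] + " ns");
        }
    }
}
```

A single unwarmed measurement like this is only indicative. Meaningful comparisons between tree models would need JIT warm-up, repeated runs and realistic documents.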
Having said that, here are some surprising yet reproducible observations based on tuning the XOM NodeWrapper:
1) XML parsing / tree building is significantly faster with the XOM model than with the Saxon TinyTree model (e.g. with SAX/xerces-2.8.0)
2) serialization is faster with XOM than with TinyTree
I don't have reproducible numbers on the following observations, but at least anecdotal evidence suggests that:
3) query execution can go either way, depending on the type of query.
The outcome varies case by case. Constructing a new tree is the worst-case scenario for the XOM wrapper, but often that's not needed at all. Not everyone's publishing HTML.
4) DOM is far from the best random access tree model, and the Saxon DOM NodeWrapper doesn't look like it's been optimized at all, compared to the amount of work that has gone into TinyTree (e.g. the generic AxisIterators used by DOM can be a large time sink). I wouldn't be surprised if some work on the DOM NodeWrapper yielded significantly better results for the query execution phase.