Powered by BlogHarbor
Re: Wrapping the .NET DOM
by Michael Kay
Wolfgang Hoschek sent me the following comments:

Here are some notes that don't pertain to .NET but rather to an unqualified generalization:

> it's far better, when you can, to build a native Saxon tree directly from raw XML - the same is true for the Java product.

I think the performance characteristics aren't as simple as suggested. There are at least four phases that contribute to overall execution time:

1. XML parsing / tree building
2. query compilation
3. query execution
4. serialization

Depending on the application, some of these phases may be much more expensive than others. Some phases may be completely absent, or amortized over many executions. For example, running millions of trivial (sequential) XPath queries per second over records from large XML streams exhibits a quite different profile from dynamically generating HTML for publishing via XSLT, which in turn differs from running occasional complex queries over very large documents (databases). Queries and transforms that construct new trees behave very differently from searches, and so on. Memory consumption may or may not be an issue.

Having said that, here are some surprising yet reproducible observations based on tuning the XOM NodeWrapper:

1) XML parsing / tree building is significantly faster with the XOM model than with the Saxon TinyTree model (e.g. with SAX/Xerces 2.8.0).
2) Serialization is faster with XOM than with TinyTree.

I don't have reproducible numbers for the following observations, but at least anecdotal evidence suggests that:

3) Query execution can go either way, depending on the type of query; the outcome has to be judged case by case. Constructing a new tree is the worst-case scenario for the XOM wrapper, but often that's not needed at all. Not everyone is publishing HTML.
4) DOM is far from the best random-access tree model, and the Saxon DOM NodeWrapper doesn't look as if it has been optimized at all, compared with the amount of work that has gone into TinyTree (e.g. the generic AxisIterators used by DOM can be a large time sink). I wouldn't be surprised if some work on the DOM NodeWrapper yielded significantly better results for the query execution phase.
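The point that the four phases need to be measured separately can be illustrated with a small timing harness. This is a minimal sketch under stated assumptions, not Saxon or XOM code: it uses the JDK's built-in JAXP parser, XPath engine and serializer over a W3C DOM tree, and the class `PhaseTimer`, its `run` method and the sample record document are hypothetical names introduced only for illustration. The XPath expression is compiled once and then evaluated many times, so the compilation cost is amortized, much as in the streaming-records scenario described above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: times the four phases separately using the JDK's default
// JAXP implementations (NOT Saxon, TinyTree or XOM), purely to illustrate the method.
final class PhaseTimer {

    // Elapsed nanoseconds per phase, plus results kept for sanity checks.
    static final class Timings {
        final long parseNanos, compileNanos, executeNanos, serializeNanos;
        final int matchCount;
        final String output;

        Timings(long parse, long compile, long execute, long serialize, int matches, String output) {
            this.parseNanos = parse;
            this.compileNanos = compile;
            this.executeNanos = execute;
            this.serializeNanos = serialize;
            this.matchCount = matches;
            this.output = output;
        }
    }

    static Timings run(String xml, String xpathExpr, int executions) {
        try {
            // Phase 1: XML parsing / tree building (here a W3C DOM tree)
            long t0 = System.nanoTime();
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            long t1 = System.nanoTime();

            // Phase 2: query compilation, done once and reused below
            XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(xpathExpr);
            long t2 = System.nanoTime();

            // Phase 3: query execution, repeated so the one-off compile cost is amortized
            int matches = 0;
            for (int i = 0; i < executions; i++) {
                NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
                matches = nodes.getLength();
            }
            long t3 = System.nanoTime();

            // Phase 4: serialization of the tree back to markup
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            long t4 = System.nanoTime();

            return new Timings(t1 - t0, t2 - t1, t3 - t2, t4 - t3, matches, out.toString());
        } catch (Exception e) {
            // Wrap JAXP's checked exceptions so callers stay simple
            throw new IllegalStateException("phase timing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<records><r id='1'/><r id='2'/><r id='3'/></records>";
        Timings t = run(xml, "/records/r[@id > 1]", 1_000);
        System.out.printf("parse=%dus compile=%dus execute=%dus serialize=%dus matches=%d%n",
                t.parseNanos / 1000, t.compileNanos / 1000, t.executeNanos / 1000,
                t.serializeNanos / 1000, t.matchCount);
    }
}
```

To compare tree models, you would replace the parse step and the XPath engine with the implementations under test. A single run like this is dominated by class loading and JIT warm-up, so a meaningful comparison would need repeated runs, for example under a benchmarking harness such as JMH.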