<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0"
  xmlns:ent="http://www.purl.org/NET/ENT/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <title>Saxon diaries</title>
  <link>http://saxonica.blogharbor.com/blog</link>
  <description></description>
  <language>en-us</language>
  <lastBuildDate>Tue, 09 Feb 2010 09:14:41 +0000</lastBuildDate>
  <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
  <generator>Blogware</generator>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Unicode, regular expressions, and Java</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2010/1/13/4427544.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2010/1/13/4427544.html</guid>
    <pubDate>Wed, 13 Jan 2010 17:00:12 +0000</pubDate>
    <description>Many moons ago, when I first introduced regular expression support to Saxon&#39;s XSLT processor, I picked up a piece of software written by James Clark to translate regular expressions as defined in the XML Schema specification to regular expressions as understood by Java. Like any software written by James, it was extremely robust, handled all the quirks of the underlying specifications with unfailing accuracy, was tightly coded and fast, and was totally undocumented.&lt;br&gt;
&lt;br&gt;
One particular task it handled was the fact that the Schema/XPath regex dialect counts a character above 65535 as one character, whereas the Java regex library, up to and including JDK 1.4, treated it as two (the two halves of a UTF-16 surrogate pair).&lt;br&gt;
&lt;br&gt;
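To make the distinction concrete, here is a minimal Java sketch. It is an illustration only, not code from Saxon or from James&#39;s translator, and the class and method names are invented. It uses U+10400, a character above 65535, and assumes JDK 1.5 or later.&lt;br&gt;
&lt;br&gt;
```java
public class SurrogateDemo {
    // XPath's string-length() counts characters (code points);
    // Java's String.length() counts UTF-16 code units.
    static int xpathLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+10400 (DESERET CAPITAL LETTER LONG I) is above 65535, so Java
        // stores it as a surrogate pair: two chars, but one character.
        String s = new String(Character.toChars(0x10400));
        System.out.println(s.length());       // prints 2
        System.out.println(xpathLength(s));   // prints 1
        // From JDK 1.5 the regex engine works on code points, so a single
        // "." consumes the whole surrogate pair...
        System.out.println(s.matches("."));   // prints true
        // ...and ".." fails, where a char-at-a-time engine would match one half each
        System.out.println(s.matches(".."));  // prints false
    }
}
```
&lt;br&gt;
&lt;br&gt;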
Over the years I&#39;ve modified the code a bit. When JDK 1.5 came along and handled high-end characters correctly, I forked the code and produced one version for JDK 1.4 and another for JDK 1.5. A third version targeted the .NET regex dialect. In Saxon 9.2 I finally got rid of the JDK 1.4 version, and I was also able to get rid of the .NET version by switching from the .NET regular expression library to the library in OpenJDK, which had finally become reliable enough.&lt;br&gt;
&lt;br&gt;
Another task performed by James&#39;s code was to map character classes such as \p{Lu} (matching any upper-case letter) from what XPath said they should mean to what Java thought they meant. This code has been untouched until now, but I&#39;ve decided to take a fresh look at it and see whether it is really needed. Apart from the problem of high-end characters, it seems that what it was really doing was coping with differences between Unicode versions. It&#39;s hard now to unearth the history of which specification mandated which Unicode version, but the current situation seems to be that JDK 1.5 and 1.6 support Unicode 4.0, while the Schema (and hence XPath) specifications originally specified Unicode 3.1 but now allow you to support any later Unicode version you like. So 4.0 support would be fine.&lt;br&gt;
&lt;br&gt;
I&#39;ve generated XML documents showing the mapping of characters to classes by three different methods: direct Java coding using the JDK regex engine; XSLT code using Saxon 9.2 which uses the Clark translation of regular expressions to the JDK engine; and analysis of the data files published by the Unicode consortium.&lt;br&gt;
&lt;br&gt;
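The first of these methods might be sketched as follows. This is a hypothetical reconstruction rather than the code actually used (the names are invented): it asks the JDK regex engine, one code point at a time, whether the code point matches a category such as \p{Lu}, and prints the members as hex ranges rather than as XML. Surrogate code points are tested as single unpaired chars.&lt;br&gt;
&lt;br&gt;
```java
import java.util.regex.Pattern;

public class CategoryRanges {
    // Returns the code points that the JDK regex engine assigns to a
    // Unicode general category (e.g. "Lu"), as space-separated hex ranges.
    static String ranges(String category) {
        Pattern p = Pattern.compile("\\p{" + category + "}");
        StringBuilder out = new StringBuilder();
        int start = -1;   // first code point of the current run, or -1 if not in a run
        for (int cp = 0; cp != Character.MAX_CODE_POINT + 1; cp++) {
            // A surrogate code point becomes a single unpaired char here
            boolean member = p.matcher(new String(Character.toChars(cp))).matches();
            if (member) {
                if (start == -1) {
                    start = cp;
                }
            } else if (start != -1) {
                appendRange(out, start, cp - 1);
                start = -1;
            }
        }
        if (start != -1) {
            appendRange(out, start, Character.MAX_CODE_POINT);
        }
        return out.toString().trim();
    }

    private static void appendRange(StringBuilder out, int from, int to) {
        out.append(String.format("%04X-%04X ", from, to));
    }

    public static void main(String[] args) {
        String category = args.length == 0 ? "Lu" : args[0];
        System.out.println(category + ": " + ranges(category));
    }
}
```
&lt;br&gt;
With the argument Cs it should report the single range D800-DFFF; for other categories the output depends on the Unicode version the JDK implements.&lt;br&gt;
&lt;br&gt;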
Between Saxon and the JDK there is a very close match. The only difference is that the JDK category C excludes subcategory Cn (unassigned characters), whereas Saxon includes this subcategory in its parent class C.&lt;br&gt;
&lt;br&gt;
Between the data coming from Unicode and the JDK there is a less close match. This was because I worked with Unicode 5.2 data files, which include many more characters than the JDK understands. But I&#39;ve repeated the comparison with Unicode 4.0.0 files, and this gives a very close match, after allowing for expected discrepancies such as the omission of surrogates (which are non-characters in XPath) from one of the lists.&lt;br&gt;
&lt;br&gt;
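The third method can be sketched in the same spirit. This is again a hypothetical example (invented names, Java 11 or later). It relies on the layout of UnicodeData.txt: the first three semicolon-separated fields of each line are the code point in hex, the character name, and the General_Category; large blocks such as the surrogates appear as a pair of lines whose names end in &quot;, First&gt;&quot; and &quot;, Last&gt;&quot;; and unassigned (Cn) code points are not listed at all, so Cn has to be derived as the complement of everything else.&lt;br&gt;
&lt;br&gt;
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UnicodeDataCount {
    // Counts the code points that UnicodeData.txt assigns to one General_Category,
    // expanding the First/Last pairs that describe large blocks.
    static int countCategory(String[] lines, String category) {
        int count = 0;
        int rangeStart = -1;   // code point of the most recent ", First>" line
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            String[] fields = line.split(";");
            int cp = Integer.parseInt(fields[0], 16);
            String name = fields[1];
            if (name.endsWith(", First>")) {
                rangeStart = cp;   // the matching ", Last>" line closes the block
                continue;
            }
            if (fields[2].equals(category)) {
                count += name.endsWith(", Last>") ? cp - rangeStart + 1 : 1;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a local copy of UnicodeData.txt (e.g. the 4.0.0 or 5.2 release)
        String[] lines = Files.readAllLines(Path.of(args[0])).toArray(new String[0]);
        for (String category : new String[] {"Lu", "Ll", "Nd", "Cs", "Co"}) {
            System.out.println(category + " " + countCategory(lines, category));
        }
    }
}
```
&lt;br&gt;
Running it against both the 4.0.0 and the 5.2 files, and comparing the counts with those from the JDK, should show quickly how much of a discrepancy is down to the Unicode version alone.&lt;br&gt;
&lt;br&gt;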
So, it&#39;s looking as if I can cut out a lot more of James&#39; translation code.&lt;br&gt;
&lt;br&gt;
There does seem to be one snag that I need to look at further: in one of my attempts to collect this data, I used the complement classes such as \P{Cn}, replacing characters that matched this term with an empty string. The result was a string containing unmatched surrogates, which immediately crashes Saxon. This could be a bug in the JDK handling of surrogate pairs, or it could be something else that I don&#39;t yet understand. One of the difficulties with regex handling has always been that you&#39;re very exposed to bugs in the regex library for the final ounce of conformance, and if you hit them, there&#39;s sometimes not much you can do about them.</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Importing a stylesheet module repeatedly</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2010/1/6/4421467.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2010/1/6/4421467.html</guid>
    <pubDate>Wed, 06 Jan 2010 08:07:03 +0000</pubDate>
    <description>I&#39;m looking at a case submitted by a user where XSLT compilation is very slow. It turns out to be caused by the same module being repeatedly imported with many different import precedences (in the case of one module, with 174 different import precedences). The user has a solution (only import it once, or change some of the imports to includes); but I&#39;m wondering what changes to make to prevent the problem recurring.&lt;br&gt;&lt;br&gt;In this particular case, the stylesheet uses many functions and not many template rules. This is significant because the two cases are different: with functions, a named function that is masked by another with the same name and arity but higher precedence is dead wood: it can never be invoked, so all the costs of compiling it and optimising it are unnecessary. With template rules however, a template can always be reached using xsl:apply-imports or xsl:next-match, so it can never be discarded.&lt;br&gt;&lt;br&gt;The specification says that including or importing the same module twice has exactly the same effect as if you included or imported two different modules that happened to have the same content. And that&#39;s exactly how Saxon behaves: it doesn&#39;t remember which modules have previously been read, so the second import/include causes the same document to be read from disk, parsed into a tree, and then to go through all the stages of XSLT and XPath static analysis. Clearly this involves a lot of wasted work.&lt;br&gt;&lt;br&gt;The first question is, how much does this matter? How many users does it affect, and how badly does it affect them? It&#39;s impossible to answer the question statistically, so the usual test I apply is that if the code is correct and a reasonable user might write it in this way, then the software has a responsibility to try and execute it with reasonable efficiency. 
The principle (another one I remember from David Wheeler - he was an appalling lecturer, but he seems to have drilled some firm ideas into my brain) is &quot;optimize the code that reasonable users actually write&quot;. And it does seem that a reasonable user, seeing that module A has a dependency on module B, might well add a redundant import declaration, and would not expect this to have an adverse effect on performance.&lt;br&gt;&lt;br&gt;The next question is how far we should go in eliminating the unnecessary processing that is currently being done. Let&#39;s look at where the inefficiencies arise:&lt;br&gt;&lt;br&gt;&lt;div style=&quot;margin-left: 40px;&quot;&gt;* We are parsing the same module repeatedly (about 10% of the cost), then parsing its XPath expressions repeatedly (another 10%), and doing other miscellaneous static checking (say another 5%).&lt;br&gt;&lt;br&gt;* In the case of functions, we do a serial search to bind function calls to stylesheet functions, which is taking a long time because there are multiple instances of the same function. I have already implemented a hash lookup to fix this particular problem, which has halved the compilation time.&lt;br&gt;&lt;br&gt;* Again in the case of functions, we are type-checking and optimizing functions that can never be called because they are masked by another of higher import precedence. This is another 10% or so of the overhead, and I have already made changes to eliminate it.&lt;br&gt;&lt;br&gt;* In the case of template rules, the &quot;masked&quot; templates do need to be type-checked and optimized because they can be called, using apply-imports or next-match. 
However, we could potentially take advantage of the fact that two template rules with different import precedence can share the same executable code: if they come from different incarnations of the same module, then they will have identical static contexts, and could therefore be compiled once and share the same compiled code.&lt;br&gt;&lt;br&gt;&lt;/div&gt;We can do some of these optimizations (like doing less work to process masked functions) without any change to the strategy of reading and parsing modules repeatedly. But other optimizations do mean that we need to recognize when two modules derive from the same source. In particular, we need to distinguish a &quot;masked&quot; function that comes from the same source module from a &quot;masked&quot; function deriving from a different source module: in the latter case, the spec requires all static errors to be detected even though the code is dead, while in the former case we know that we only need to do the analysis once to report all static errors.&lt;br&gt;&lt;br&gt;The toughest aspect of this is that there is an impact on data structures. This is particularly the case with template rules: the idea of two template rules sharing the same executable code even though they have different import precedence, and then taking advantage of this to only type-check and optimize the code once. Changing data structures is always the hardest kind of change. But it&#39;s probably worth doing.&lt;br&gt;</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Pipedreaming: Could XPath have been better?</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2010/1/5/4420740.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2010/1/5/4420740.html</guid>
    <pubDate>Tue, 05 Jan 2010 11:30:06 +0000</pubDate>
    <description>I normally resist the kind of wishful thinking that tries to improve languages like XML or XPath without worrying about backwards compatibility. In practice you can never ignore the legacy: compatibility means deliberately repeating other people&#39;s mistakes, as David Wheeler used to say when I was an undergraduate. But it&#39;s New Year, so let&#39;s be absurdly optimistic, and assume that anything can be done. (And what set me on to this was actually something quite practical: Anthony Coates is looking at the XML support in Scala. Scala has a kind of XPath-like expression that adapts the XPath syntax into the Scala framework. So in such an environment, there is indeed an opportunity to rethink things.)&lt;br&gt;&lt;br&gt;Here are some of the changes I think I would make:&lt;br&gt;&lt;br&gt;&lt;div style=&quot;margin-left: 40px;&quot;&gt;* Avoid the overloading of [] to act as both a filter and a subscript operator. Perhaps use [] for subscripting and ? for filtering, or perhaps use ! for subscripting and [] for filtering. The current overloading, especially because it is decided dynamically rather than statically, causes some very odd effects in edge cases. For the present, I&#39;ll avoid [] entirely, and use ? for the filter operator and ! for subscript. We&#39;ll postpone decisions on operator precedence until later.&lt;br&gt;&lt;br&gt;* Remove the special rules for subscripting when following a reverse axis. If X delivers items A, B, C, then X!1 delivers A, regardless of the nature of the expression X.&lt;br&gt;&lt;br&gt;* The subscript operator would then be a simple binary operator: both operands would be evaluated in the same context. No special magic about N being a shorthand for position()=N. This removes the ability to use last() as a pseudo-subscript. 
Most languages seem to get by without such a feature, but I have to admit it is useful; I&#39;d suggest either (a) a convention that negative subscripts number from the end (so X!-1 selects the last item), or (b) a separate operator, say ¡, to number backwards (it&#39;s high time we broke free from the shackles of ASCII...). Then X¡1 selects the last item in the sequence.&lt;br&gt;&lt;br&gt;* Replace / with \ as the path operator, to avoid confusion with numeric division; and make it a pure mapping operator, with no implicit sorting into document order or deduplication. Remove all remaining restrictions on what can appear on the left-hand and right-hand sides. Use an explicit unary | operator when sorting and deduplication are required (so |EXP has the same meaning as ()|EXP, that is, take the nodes in EXP, deduplicate, and sort into document order).&lt;br&gt;&lt;br&gt;* Lose the leading &quot;/&quot; in path expressions, as well as the lone &quot;/&quot; to refer to the root node. Instead use root() at the start of the path to get the root node of the tree containing the context node.&lt;br&gt;&lt;br&gt;* Drop the abbreviation allowing E as a short-hand for child::E. Controversial, this one - the short-hand is very convenient. But it causes a lot of problems in making the grammar unambiguous and extensible. Replace it with a new abbreviation, on the same lines as &quot;@&quot; for the attribute axis: let&#39;s say ^. So a path expression might look like root()\^A\^B\@C. Not as pretty as what we are used to, but much more systematic, orthogonal, and extensible.&lt;br&gt;&lt;br&gt;* This then suggests ^^ as an abbreviation for the descendant axis, replacing the current highly illogical // pseudo-operator with its weird syntactic expansion.&lt;br&gt;&lt;br&gt;* Drop the implicit existential semantics for the &quot;=&quot; family of operators, giving them instead the same meaning that &quot;eq&quot; and friends have in XPath 2.0. 
Again, this removes a convenience in the interests of being more rigorous and orthogonal. It would be nice to offer something that&#39;s as general as the expression &quot;some $x in X satisfies $x = 3&quot; but less verbose; I would suggest prefixing any boolean operator or function name with &quot;~&quot; to indicate that it is to operate over sequences and behave existentially, so we have X ~= 3 to mean &quot;some X equals 3&quot;, and ~contains(X, (&#39;a&#39;, &#39;b&#39;)) to mean &quot;some X contains &#39;a&#39; or &#39;b&#39;&quot;. &lt;br&gt;&lt;br&gt;* Unify axes and functions. Conceptually, child::X applies the function child() to the context node and then filters the result with the predicate &quot;is an X&quot;. There is no reason why &quot;child&quot; (the axis) should not be any function, rather than forcing it to be one of 13 magic functions built in to the system. There is also no reason why X (the nodetest) should not be generalised. Assuming a syntax .¿T to test whether the context node satisfies the nodetest T,&amp;nbsp; X::T becomes a shorthand for X(.)?(.¿T), and this semantic definition paves the way to allowing X to be any single-argument function, and for generalizing nodetests to be any pattern. The overall effect is to make the semantics of XPath as a functional language much more explicit.&lt;br&gt;&lt;br&gt;* Unify node tests and types. Both are essentially ways of classifying nodes (or other items). XPath 2.0 already goes some way towards making them interchangeable through the concept of &quot;kind tests&quot;, but it could go further.&lt;br&gt;&lt;br&gt;&lt;/div&gt;What is all this trying to achieve? 
The bottom line, I guess, is&lt;br&gt;&lt;br&gt;(a) making the semantics of the language cleaner and more explicitly functional; and&lt;br&gt;&lt;br&gt;(b) removing quirkiness and non-orthogonality even where these quirks provide ways of expressing commonly used constructs more concisely.&lt;br&gt;&lt;br&gt;Of course, it&#39;s all an academic exercise. But perhaps it points the way to a better way of describing the current language by mapping the syntax onto a more regular core.&lt;br&gt;</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Patents: an Open Letter to my MP</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/12/14/4404215.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/12/14/4404215.html</guid>
    <pubDate>Mon, 14 Dec 2009 09:06:51 +0000</pubDate>
    <description>The Chancellor&#39;s recent pre-budget statement announces a measure (curiously named the &quot;patent box&quot;) whose effect is to offer a reduced rate of corporation tax for profits deriving from patents. The stated intent is to encourage and reward innovation. As far as the software business is concerned, the effect is likely to be exactly the opposite...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Streaming templates - towards a more coherent design</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/12/10/4401285.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/12/10/4401285.html</guid>
    <pubDate>Thu, 10 Dec 2009 10:14:16 +0000</pubDate>
    <description>As previously reported, I&#39;ve been making good progress in the last few weeks in implementing streaming templates: that is, the ability to write an XSLT stylesheet using template rules, in which the templates are activated in response to events notified by a SAX-style push parser, without ever building a tree in memory. However, there&#39;s something a little unsatisfactory about the state of play. Every time I write a new test case, there&#39;s a 50% chance I have to change the code to make it work. At this stage of the game, that&#39;s too high...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Intel joins the club</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/12/9/4400175.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/12/9/4400175.html</guid>
    <pubDate>Wed, 09 Dec 2009 00:30:21 +0000</pubDate>
    <description>I see from Twitter that Intel have now announced availability of a beta release of their XSLT 2.0 processor. Excellent news!</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>On the streamability of //section/head</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/12/1/4394031.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/12/1/4394031.html</guid>
    <pubDate>Tue, 01 Dec 2009 00:35:39 +0000</pubDate>
    <description>At first sight, nothing seems more eminently streamable than the expression &amp;lt;xsl:value-of select=&quot;//section/head&quot;/&amp;gt;. The sections are in document order, the headings are in document order, the order of the output is the same as the order of the input, so where is the problem? The fact is, however, as I hinted in my previous post, there is a lot more to this expression than meets the eye.</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Moving forward with streaming templates</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/11/26/4390549.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/11/26/4390549.html</guid>
    <pubDate>Thu, 26 Nov 2009 10:16:25 +0000</pubDate>
    <description>Saxon 9.2 introduced a limited capability to do &quot;push processing&quot; (using template rules) in streaming mode, that is, while parsing the source document, without ever building a tree representation of the source document in memory. This (added to the other streaming capabilities in Saxon) provides a useful capability for people who have to handle very large documents, but it only supports a small subset of the language, and achieving anything useful can sometimes require rather contorted programming. I&#39;ve been revisiting the facility with the aim of supporting a much larger subset of the XSLT language in streaming mode, hopefully more like 80% rather than 20%, and thus making the streaming facilities far more accessible to the average developer.</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>A stylesheet conversion</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/8/20/4294785.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/8/20/4294785.html</guid>
    <pubDate>Thu, 20 Aug 2009 22:13:55 +0100</pubDate>
    <description>I&#39;m doing what looks like a fairly simple project to upgrade an XSLT 1.0 stylesheet. It&#39;s XSLT 1.0 because it has to run in the browser, but fortunately it doesn&#39;t really need any 2.0 features. The old version of the stylesheet worked with XML files in format A, described by a DTD. The new version of the stylesheet has to produce the same HTML output, but this time from XML files in format B, described by an XSD schema...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Beyond Saxon 9.2</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/8/14/4288490.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/8/14/4288490.html</guid>
    <pubDate>Fri, 14 Aug 2009 20:16:10 +0100</pubDate>
    <description>So, Saxon 9.2 is finally out. I haven&#39;t had much chance to sit back and think yet - it&#39;s been a busy and stimulating week at the Balisage conference. Regulars will know that I don&#39;t do planning, either of dates or facilities: I prefer to keep the programme completely flexible, so that I can always find room to put new things in if the opportunity presents itself. But it&#39;s worth thinking a little bit about what&#39;s on the to-do list.</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Analyzing dependencies in a class library: a use case for XSLT streaming</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/6/26/4235816.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/6/26/4235816.html</guid>
    <pubDate>Fri, 26 Jun 2009 18:01:10 +0100</pubDate>
    <description>IKVM (in version 0.40) now has much more complete coverage of the Java class library, but an unfortunate consequence of this is that the libraries have become rather large. To cut down the size of the library, we need to understand the dependencies between its parts. Read on to find out how I&#39;m doing this using the streaming facilities of the new Saxon 9.2 release...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Saxon repackaging</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/6/19/4227718.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/6/19/4227718.html</guid>
    <pubDate>Fri, 19 Jun 2009 22:44:31 +0100</pubDate>
    <description>Saxon9.2-EE is out on beta release, so the release process has finally started. It was beginning to feel as if the light at the end of the tunnel was an optical illusion. I decided not to change the licensing. GPL is just too distasteful, and other licenses that prevent use with commercial products just seem too burdensome. So here&#39;s what I&#39;ve decided to do...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>The GNU Public License</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/6/7/4213973.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/6/7/4213973.html</guid>
    <pubDate>Sun, 07 Jun 2009 17:47:55 +0100</pubDate>
    <description>I&#39;ve been thinking carefully about whether there&#39;s a way to reduce the number of people &quot;freeloading&quot; on the back of Saxon - that is, selling commercial software that relies heavily on Saxon functionality without contributing anything in return. One potential way of doing that is to switch to some form of dual licensing: make application vendors choose between paying for a commercial license and using the product under the GPL (GNU General Public License), which prevents them from creating &quot;derivative works&quot; unless those works are also licensed under the GPL (that is, made available as open source products). This is essentially the way MySQL works, and it seems to work well for them...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>All nodes untyped</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/5/14/4185505.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/5/14/4185505.html</guid>
    <pubDate>Thu, 14 May 2009 09:34:08 +0100</pubDate>
    <description>Users sometimes imagine that just by running an application under Saxon-SA instead of Saxon-B, they will automatically get a performance boost. Sadly, this isn&#39;t the case. Sometimes Saxon-SA&#39;s more powerful optimizer will give a dramatic benefit, sometimes it will give none at all. In fact, sometimes if you move a workload to Saxon-SA without change, you see a performance regression. This is caused by the fact that Saxon-B can assume all nodes are untyped, whereas Saxon-SA can&#39;t make this assumption....</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Repackaging Saxon</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/5/6/4176679.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/5/6/4176679.html</guid>
    <pubDate>Wed, 06 May 2009 13:32:31 +0100</pubDate>
    <description>I&#39;ve been thinking for a while about some repackaging for Saxon, and I&#39;ve now got to the point where I&#39;m working out the fine detail and creating appropriate build files. The main idea is that instead of two products, Saxon-B and Saxon-SA, there will in future be three: Home Edition (HE), Professional Edition (PE), and Enterprise Edition (EE).</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Streaming templates</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/2/23/4102743.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/2/23/4102743.html</guid>
    <pubDate>Mon, 23 Feb 2009 23:48:41 +0000</pubDate>
    <description>The streaming facilities in Saxon-SA have been proving very popular with those users who have very large documents to process. However, at present the only thing you can do in streaming mode is to split the document into a flat sequence of subtrees, and then process each subtree independently. That meets many needs, but not all. There are many simple tasks that can intrinsically be streamed despite the fact that they don&#39;t fit this model: for example, renaming all the elements in a document, or deleting all the NOTE elements. So I&#39;ve started implementing &quot;streaming templates&quot;, where the document is processed hierarchically in classic XSLT style by applying template rules to each node at every level....</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Some threading tests</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/2/11/4089230.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/2/11/4089230.html</guid>
    <pubDate>Wed, 11 Feb 2009 18:20:43 +0000</pubDate>
    <description>Continuing the investigation of Tatu Saloranta&#39;s XSLTMark measurements, I was puzzled by the fact that on certain tests, he was seeing a very different ratio between Saxon and XSLTC performance from the figures I was seeing... I thought one theory worth testing was that XSLTC might be making better use of multiple processors than Saxon (which is essentially single-threaded)...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Another five-finger performance exercise</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2009/2/7/4084363.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2009/2/7/4084363.html</guid>
    <pubDate>Sat, 07 Feb 2009 13:16:52 +0000</pubDate>
    <description>Tatu Saloranta has started doing some work looking at the performance of XML parsers in an XSLT environment, and in that context he started running the old DataPower XSLTMark tests (no longer available, sadly), which are all 1.0 stylesheets, and the figures comparing Saxon and Xalan seemed to be worth investigating...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Bugs that don&#39;t crawl out of the woodwork</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/12/22/4031927.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/12/22/4031927.html</guid>
    <pubDate>Mon, 22 Dec 2008 15:30:05 +0000</pubDate>
    <description>I&#39;ve said this before, but sometimes I&#39;m truly amazed by some of the bugs I find: how on earth can they remain undetected for so long?</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Ten Reasons why Saxon XQuery is Fast</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/12/13/4019383.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/12/13/4019383.html</guid>
    <pubDate>Sat, 13 Dec 2008 15:42:16 +0000</pubDate>
    <description>A paper under this title is published this month in a special issue of the IEEE Data Engineering Bulletin. Available online at:&lt;br&gt;&lt;br&gt;&lt;a href=&quot;http://sites.computer.org/debull/A08dec/saxonica.pdf&quot;&gt;http://sites.computer.org/debull/A08dec/saxonica.pdf&lt;/a&gt;&lt;br&gt;&lt;br&gt;Of course, nearly all of what it says applies equally to XSLT. But for some reason, the academic community continues to be much more interested in XQuery.&lt;br&gt;</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Schema-Awareness and XMark performance</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/11/30/4001792.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/11/30/4001792.html</guid>
    <pubDate>Sun, 30 Nov 2008 21:27:41 +0000</pubDate>
    <description>When people ask what performance benefits they can expect from using schema-aware transformations and queries, I&#39;ve often replied in a way that avoids setting expectations too high. Some queries can benefit significantly; others actually slow down because the extra cost of validating the input is not recovered by improvements in query execution speed. I&#39;ve often stressed that the main benefit of schema-awareness is in the speed and ease of debugging and testing the query, not primarily in performance. But I&#39;ve been taking another look at it, and I think I can probably start to be a bit more upbeat...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>TEI Conference</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/11/11/3972719.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/11/11/3972719.html</guid>
    <pubDate>Tue, 11 Nov 2008 15:14:25 +0000</pubDate>
    <description>I spent a couple of days last week at the annual members meeting of the TEI (Text Encoding Initiative). It was good to meet so many Saxon users: a few familiar faces, rather more familiar names, and quite a few who introduced themselves as avid fans of the product, the book, or both...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>XML Schema: allowing new lexical forms</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/10/23/3943494.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/10/23/3943494.html</guid>
    <pubDate>Thu, 23 Oct 2008 12:39:49 +0100</pubDate>
    <description>In a suggestion to the XML Schema Working Group, Axel Dahmen proposed defining a facet that allows new lexical forms to be introduced for existing data types. This would allow you, for example, to write a decimal value as &quot;1,23&quot;, a date as &quot;12DEC2008&quot;, or a boolean as &quot;yes&quot;. The Schema WG has already put out two &quot;last call&quot; drafts of the XSD 1.1 spec, so it&#39;s not really welcoming ideas for new features at the moment, but I thought I would take a look at this one independently...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Software Patents</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/10/2/3911326.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/10/2/3911326.html</guid>
    <pubDate>Thu, 02 Oct 2008 12:20:25 +0100</pubDate>
    <description>I have absolutely no doubt that software patents are a bad thing. It simply isn&#39;t in the public interest to grant them. The only possible reason for doing so is to reward and encourage innovation; there is no evidence that they have that effect, and plenty of evidence that they do exactly the opposite...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Just-in-Time Optimization</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/9/20/3893491.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/9/20/3893491.html</guid>
    <pubDate>Sat, 20 Sep 2008 21:44:53 +0100</pubDate>
    <description>Rereading my last post on compile-time performance, the thought leaps out at me: if I don&#39;t want to rely on users setting a switch to control whether optimization happens or not, then I should be doing lazy (or just-in-time) optimization...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>More thoughts on compile-time performance</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/9/18/3890530.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/9/18/3890530.html</guid>
    <pubDate>Thu, 18 Sep 2008 19:11:59 +0100</pubDate>
    <description>Another user has sent me some samples of generated code (XSLT this time) with a request to take a look at compile-time performance issues. It takes about 3900ms to compile (5000ms under Saxon-SA), and the new switch to suppress optimization reduces this to about 1180ms. So optimization is the culprit here.

So the switch will be useful here. But I hate performance improvements that depend on the user setting a switch, because 99% of users will never discover the switch is there. So, are there opportunities to improve the compile-time performance of the optimizer? You bet there are....</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Compile-time Performance</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/9/16/3887125.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/9/16/3887125.html</guid>
    <pubDate>Tue, 16 Sep 2008 12:48:43 +0100</pubDate>
    <description>There was a support request a couple of days ago asking for a switch to turn optimization off. The user had a large query that was taking a long time to compile, and naturally felt that the cost could be reduced by omitting the optimization phase. But as always with performance, assumptions are dangerous until they have been verified by measurement...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>Tweaking the TinyTree</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/9/10/3877536.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/9/10/3877536.html</guid>
    <pubDate>Wed, 10 Sep 2008 08:26:44 +0100</pubDate>
    <description>It&#39;s unusual to find anything these days that gives Saxon a 5% performance boost across the board, but I seem to have achieved that with some tweaks to the TinyTree data structure...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
  <item>
    <dc:creator>Michael Kay</dc:creator>
    <title>There&#39;s an R in the month</title>
    <link>http://saxonica.blogharbor.com/blog/_archives/2008/9/1/3863838.html</link>
    <guid>http://saxonica.blogharbor.com/blog/_archives/2008/9/1/3863838.html</guid>
    <pubDate>Mon, 01 Sep 2008 10:57:15 +0100</pubDate>
    <description>September is upon us; summer is over; so it&#39;s time to start thinking about a new season. This blog has become rather dormant. Well, I&#39;m going to try and reform, and post more regularly, and if I haven&#39;t got much to say then I&#39;ll just witter on about what I&#39;ve been up to recently...</description>
    
    <category domain="http://saxonica.blogharbor.com/blog">Main Page</category>
    
    
    
    
  </item>
  
</channel>
</rss>
