I'm quite pleased with the way the new API offered by Saxon on .NET panned out. Dimitre Novatchev said he liked it too, which is high praise in my book. I've been concerned for a while that Saxon's API was looking pretty scruffy, and this was an opportunity to do a better job.
My first thought was to try to emulate the System.Xml.Xsl interface. But I found that this wasn't an interface at all: it was a set of concrete classes. So it didn't look feasible to make Saxon "plug-compatible", allowing it to be swapped in as a replacement for the System.Xml.Xsl processor in existing applications. That's a shame, but it was also an opportunity, because the System.Xml.Xsl interface isn't exactly pretty - in particular, I hate its heavy overloading of methods like Transform(), which always suggests to me that there's an abstraction somewhere waiting to be discovered.
My next thought was to stick close to the JAXP model, simply changing the stylistic conventions to match the trivial differences of the .NET style, like method names starting with a capital letter. This approach would make life easy for people trying to write dual-platform applications on top of Saxon. But actually, that's not my target audience for this API: those people will probably be writing in Java and can continue to use Saxon's Java APIs. Anyway, JAXP has become rather a mess over the years. It still doesn't support XPath 2.0 and XSLT 2.0, and the more I looked into it, the more I felt I could do a better job by starting from scratch.
I also wanted to take a consistent approach across XSLT, XQuery, and XPath. When I looked at existing APIs, I found that XQuery and XPath generally offered a two-stage processing model, with one object to hold compile-time context information, the other to hold the compiled object; while the JAXP interface for XSLT also has two stages, one object (misnamed "Templates") holding the compiled stylesheet and the other ("Transformer") holding the run-time context. Looking at this, I realised all these interfaces would be cleaner if I offered a three-step model: compile, load, go.
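For readers who haven't used it, the two-stage JAXP pattern described above looks roughly like this in Java. This is a minimal sketch using only the standard javax.xml.transform classes; the class name JaxpTwoStage, the tiny stylesheet, and the "greeting" parameter are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class JaxpTwoStage {
    // Illustrative stylesheet: writes the greeting parameter followed by the text of <name>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='greeting'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='$greeting'/><xsl:text>, </xsl:text><xsl:value-of select='/name'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String xml, String greeting) {
        try {
            // Stage 1: compile. Templates is the compiled, reusable stylesheet.
            Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
            // Stage 2: the Transformer holds the run-time context (here, one parameter).
            Transformer transformer = compiled.newTransformer();
            transformer.setParameter("greeting", greeting);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<name>world</name>", "Hi"));
    }
}
```

Note that the compile-time settings have no object of their own here: they live on the TransformerFactory, which is what the three-step model sets out to fix.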
So the first object is the compiler for the language. At first I called it the static context, but then I realised that "compiler" was a much better name, because users of APIs find it hard to get to grips with amorphous concepts like a "static context": it's sometimes better to name an object according to what it does, not what information it holds. The second object is the compiled code; and the third object is the "loaded and runnable" code (I had difficulty naming this, and chose different names for the three languages) - this is where the dynamic context for evaluation is established.
Having made this split, all sorts of things fall into place. For example, there's a long-standing problem with JAXP that if you run multiple XSLT compilations in parallel from the same TransformerFactory, they all have to share the same ErrorListener. With an XsltCompiler object to hold this information, the problem goes away. Similarly on the XPath side, having an explicit object to hold the "loaded and runnable" expression (I decided to call it an XPathSelector) provides a natural place to set variables, define the context node, etc.
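To make the compile, load, go split concrete, here is a rough sketch in Java layered over JAXP. It is not the Saxon .NET API: the names SketchCompiler, SketchExecutable and SketchRunner are hypothetical, and giving each compiler its own TransformerFactory is simply an assumption made to keep error reporting per-compiler.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ThreeStepSketch {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='who'/>"
      + "<xsl:template match='/'>Hello <xsl:value-of select='$who'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Step 1 (hypothetical): the compiler owns compile-time settings and its own error list. */
    static final class SketchCompiler {
        final List<String> errors = new ArrayList<>();

        SketchExecutable compile(String stylesheet) {
            // Assumption of this sketch: one factory per compiler, so listeners are never shared.
            TransformerFactory factory = TransformerFactory.newInstance();
            factory.setErrorListener(new ErrorListener() {
                public void warning(TransformerException e) { errors.add(e.getMessage()); }
                public void error(TransformerException e) { errors.add(e.getMessage()); }
                public void fatalError(TransformerException e) { errors.add(e.getMessage()); }
            });
            try {
                return new SketchExecutable(
                    factory.newTemplates(new StreamSource(new StringReader(stylesheet))));
            } catch (TransformerException e) {
                throw new IllegalArgumentException("compilation failed", e);
            }
        }
    }

    /** Step 2 (hypothetical): the compiled code, reusable across many runs. */
    static final class SketchExecutable {
        private final Templates templates;
        SketchExecutable(Templates templates) { this.templates = templates; }

        SketchRunner load() {
            try {
                return new SketchRunner(templates.newTransformer());
            } catch (TransformerException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Step 3 (hypothetical): loaded and runnable; the dynamic context is set here. */
    static final class SketchRunner {
        private final Transformer transformer;
        private String sourceXml = "<empty/>";
        SketchRunner(Transformer transformer) { this.transformer = transformer; }

        SketchRunner setParameter(String name, Object value) {
            transformer.setParameter(name, value);
            return this;
        }
        SketchRunner setSource(String xml) { this.sourceXml = xml; return this; }

        String run() {
            StringWriter out = new StringWriter();
            try {
                transformer.transform(
                    new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            } catch (TransformerException e) {
                throw new IllegalStateException(e);
            }
            return out.toString();
        }
    }

    static String demo() {
        SketchCompiler compiler = new SketchCompiler();      // compile-time context
        SketchExecutable executable = compiler.compile(XSL); // compile
        return executable.load()                             // load
            .setParameter("who", "Saxon")
            .setSource("<doc/>")
            .run();                                          // go
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Two compilers created this way never see each other's error reports, and the runner gives parameters and the source document an obvious home.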
This left the problem of defining the Source and Result objects (in JAXP terms). The JAXP solution here has been much criticized because it seems to be cheating: it uses an interface as a way of avoiding method overloading, but the interface doesn't actually describe an abstract service, it's really just a kind of union type. I'm not sure I found the perfect answer here. To replace the JAXP Result object, I defined a Destination interface, which is actually a true interface in that you can write your own implementation and Saxon will accept it. The only slight oddity is that to write an implementation, you need to go beyond the classes offered in the API and use some internal Saxon machinery. For the Source, I decided not to implement an equivalent abstraction, but rather to use method overloading: except that it would be a single method SetSource() that was overloaded, to prevent the multi-dimensional overloading and the resulting explosion of method signatures that occurs in System.Xml.Xsl.
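The shape of that trade-off can be sketched in a few lines of Java. Everything here is hypothetical and is not Saxon's actual Destination interface: the Destination methods, CollectingDestination, and Runner are invented for illustration, and upper-casing the input stands in for a real transformation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class SourceAndDestinationSketch {

    /** Hypothetical output abstraction: a real service that anyone can implement. */
    interface Destination {
        void write(String chunk);
        void close();
    }

    /** A user-written implementation that simply collects the output in memory. */
    static final class CollectingDestination implements Destination {
        final StringBuilder text = new StringBuilder();
        boolean closed;
        public void write(String chunk) { text.append(chunk); }
        public void close() { closed = true; }
    }

    /** Hypothetical runnable object: input kinds vary only on setSource(). */
    static final class Runner {
        private String input = "";

        // Overload 1: the input is already a string.
        void setSource(String text) { input = text; }

        // Overload 2: the input is read from a character stream.
        void setSource(Reader reader) {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            try {
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    sb.append(buffer, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            input = sb.toString();
        }

        // One signature only: every output kind goes through the Destination interface.
        void run(Destination destination) {
            destination.write(input.toUpperCase()); // stand-in for the real processing
            destination.close();
        }
    }

    static String demo() {
        Runner runner = new Runner();
        runner.setSource(new StringReader("abc"));
        CollectingDestination destination = new CollectingDestination();
        runner.run(destination);
        return destination.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With N kinds of input and M kinds of output, this gives N overloads of one method plus a single run() signature, instead of N x M variants of Transform().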
So, what next? I'm quite tempted to retrofit the same API design to the Java product, except that the Java product already suffers from a proliferation of APIs and adding another will in some ways only make matters worse. There's also the problem of maintaining tests and documentation for all these APIs. XQJ, the standard API for XQuery in Java, might emerge soon from its long hibernation in the JCP process, and of course I will have to implement that, though I hope that what finally emerges is an improvement on the 2004 drafts, which were fairly horrible. Perhaps I should wait and see what happens with JAXP 2.0.
Another general observation: most of the time, the API I have offered in Saxon has grown out of the implementation. The current Java XQuery interface is an example of that. This almost invariably leads to bad API design. Good APIs are designed from a user perspective, by someone thinking about the design of the user-written application; they are not produced by taking the implementation and deciding which of its classes and methods to expose. Saxon is not unique in showing signs of this problem. When I sat down to design the .NET API, I was determined to ignore constraints imposed by the implementation (when I found such constraints, I removed them), and I think the results speak for themselves.
Michael Kay
APIs for XML processing
Comments
Re: APIs for XML processing
by
Sergey D
on Thu 02 Mar 2006 20:18 GMT | Profile | Permanent Link
One way to express the abstraction you mentioned above is the IXmlTransform interface: http://mvp-xml.sourceforge.net/api/2.0/T_Mvp_Xml_Common_Xsl_IXmlTransform.html
Re: Re: APIs for XML processing
by
Michael Kay
on Fri 03 Mar 2006 10:42 GMT | Profile | Permanent Link
Thanks for the reference - though I had to dig out an old copy of IE to read it, as it doesn't seem to display in Firefox. This seems to be very close to the JAXP design, with the classes XmlInput and XmlOutput corresponding to JAXP's Source and Result. I'm not sure it solves the underlying problem of extensibility: again it seems to be just a union type. The bigger problem with the MVP interface is that there is no "Transformer" object: the IXmlTransform is a compiled stylesheet, not a loaded stylesheet, so there is nowhere to set up aspects of the dynamic context, or initialization parameters such as initial mode and initial named template, other than in the method call to Transform() itself, which then becomes unwieldy as you add more parameters. One way round that, of course, would be to replace the "parameters" parameter to Transform() with a "dynamicContext" parameter.
Re: APIs for XML processing
by
M. David Peterson
on Fri 03 Mar 2006 06:42 GMT | Profile | Permanent Link
>> "and I think the results speak for themselves."
I couldn't agree more! I feel lucky to have been able to watch the above unfold, and the .NET community as a whole is an EXTREMELY lucky recipient of such a fantastic design.
Re: APIs for XML processing
by
Paul Owen
on Thu 01 Jun 2006 11:59 BST | Permanent Link
>> I'm quite pleased with the way the new API offered by Saxon on .NET panned out.
That class documentation is a great reference - I've been looking for something like this for a while. Personally, I learn best from looking at examples, but so far I've only found a couple of very simple transform examples in .NET, such as the following (which I think is now out of date): http://www.biglist.com/lists/xsl-list/archives/200503/msg01235.html Does anyone know of a more comprehensive list of examples for each method/class/interface?
Re: Re: APIs for XML processing
try this site: http://www.saxonica.com/documentation/index/intro.html