I've now got the basics of XQuery update working, with no major difficulties. Things still to be tackled after Christmas:
(2) the (so-called) transform expression (which in its latest incarnation is actually a copy-modify-return expression). I don't expect any major difficulties here.
(2) building nodes that are constructed during the query (using element constructors, etc) to use a data model implementation other than the TinyTree (because the TinyTree isn't going to be made updatable). Of course this will only happen if the query is using updates, and ideally it should only happen if the tree in question is being subjected to updates, which in principle is something I could determine using the path-analysis code that's used to underpin document projection. (But what is actually the effect of updating such a node, for example "insert <a/> into <b/>"? Yes, it creates the tree <b><a/></b>, but is the updated result available in any way to the calling application? Perhaps I should optimize this out as a no-op?)
(3) now that the linked tree model is updatable, some changes are needed to ensure that it handles node identity correctly. I haven't worked out all the details on this yet. In theory, if an attribute is replaced by another attribute, the new attribute must have a different identity from the old; at the moment in the linked tree the identity of an attribute is determined by the combination of the identity of the containing element and the name of the attribute, so that will need to change.
(4) Revalidation: that is, checking after a set of updates that the tree is still valid, allocating type annotations, and expanding any defaulted elements or attributes. The difficulty with this is that unlike the validate{} expression in XQuery, it has to be done in-situ, that is, without changing node identities. The Saxon schema validator is designed to work as a push (SAX-like) pipeline, and it's not obvious how to change it to work on a tree in-situ. It probably requires some kind of mechanism for retaining node identities as they pass through the pipeline, so that the receiver at the end of the pipe can apply updates to the original nodes.
(5) API and command line design. I still have to work out the best way of handling rewrite of updated documents back to disk. From the command line the default should probably be to update the source document in-situ while saving the original with a .bak extension. Clearly there are cases where the source can't be updated, for example if it's read via HTTP - what should happen then? Answers on a postcard please...
(6) Nearly forgot a few more loose ends:
(6a) I haven't yet implemented the rules for testing the compatibility of updates on the pending update list.
(6b) In turn that might create some issues concerning the possibility of transient inconsistencies in the tree: for example, the case where there's a delete on an existing attribute and an insert of a new attribute with the same name.
(6c) There are probably more complexities with namespace consistency than I'm yet handling. In fact, I suspect there are more complexities in this area than the spec is yet handling.
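
To illustrate (6a) and (6b), here is a minimal Python sketch of the kind of check involved. It is not Saxon code. The `Update` record, the primitive names, and the rule set (at most one rename, one replace-node and one replace-value per target node) are assumptions made for illustration, and node identities are reduced to plain integers.

```python
from collections import Counter
from dataclasses import dataclass

# Primitives assumed to be allowed at most once per target node.
EXCLUSIVE_KINDS = {"rename", "replaceNode", "replaceValue"}


@dataclass(frozen=True)
class Update:
    """One entry on a hypothetical pending update list."""
    kind: str        # e.g. "insertAttribute", "delete", "rename"
    target: int      # identity of the target node (a plain int here)
    name: str = ""   # attribute or element name, where relevant


def find_conflicts(pul):
    """Return the (kind, target) pairs that occur more than once."""
    counts = Counter((u.kind, u.target) for u in pul
                     if u.kind in EXCLUSIVE_KINDS)
    return sorted(key for key, n in counts.items() if n > 1)


def transient_duplicates(pul, existing_attrs):
    """Find attributes inserted on an element while an existing attribute
    of the same name on that element is also being deleted.

    existing_attrs maps an attribute node id to (element id, name).
    """
    deleted = {existing_attrs[u.target] for u in pul
               if u.kind == "delete" and u.target in existing_attrs}
    return sorted((u.target, u.name) for u in pul
                  if u.kind == "insertAttribute"
                  and (u.target, u.name) in deleted)


pul = [
    Update("rename", target=7, name="x"),
    Update("rename", target=7, name="y"),            # second rename of node 7
    Update("delete", target=12),                     # node 12 is @id on element 3
    Update("insertAttribute", target=3, name="id"),  # same name as the deleted one
]
existing_attrs = {12: (3, "id")}

conflicts = find_conflicts(pul)
overlaps = transient_duplicates(pul, existing_attrs)
print(conflicts)  # [('rename', 7)]
print(overlaps)   # [(3, 'id')]
```

The second function only detects the delete-plus-insert case. Whether the tree may briefly hold two attributes of the same name depends on the order in which the primitives are applied, which this sketch does not model.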
Progress on XQuery Update
Comments
Re: Progress on XQuery Update
2) FWIU, the whole point about XQuery Update (which makes it different from the proposed XQueryP) is that none of the updates actually do anything until either upd:applyUpdates is done (in which case all trees are rebuilt and revalidated) or the topmost expression in the query completes. XQuery update is essentially a monadic extension to XQuery: what is manipulated are ordinary immutable objects which are interpreted *outside the XQuery engine* as commands to make changes.
Transform expressions also apply updates, but they apply them to produce an entirely novel tree. This is like the Command pattern, but the commands are applied only after the engine completes. So you don't need a mutable tree, you need a novel data structure that represents mutations.
5) The obvious answer is: if the document was retrieved by HTTP GET, it should be updated by HTTP PUT, as that is the semantics of PUT. If the PUT doesn't work, then you have to decide what to do: throw an error or write a local file or what not.
Re: Re: Progress on XQuery Update
Thanks for the feedback John. Re (2) you seem to be suggesting that the PUL should actually be exposed to the calling application as part of the API. I wasn't actually intending to do that, I was thinking in terms of the lowest-level method call being "run-and-apply-updates" that (conceptually) runs the query to obtain a PUL and then applies the updates. But either way, it's still not clear what happens to "insert <a/> into <b/>". After all, the upd:applyUpdates() primitive doesn't actually return the nodes that were updated. My current thinking is that my run-and-apply-updates should return a set containing the root nodes of all trees that have been updated, including temporary trees. That leaves the user the problem of what to do with them.
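
A minimal Python sketch of that run-and-apply-updates idea, returning the roots of every updated tree, temporary ones included. Everything in it is hypothetical: `Node`, `InsertInto` and `run_and_apply_updates` are invented for illustration and are not part of any Saxon API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)  # eq=False: nodes compare by identity
class Node:
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def root(self):
        """Walk up the parent chain to the root of this node's tree."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def serialize(self):
        if not self.children:
            return f"<{self.name}/>"
        inner = "".join(c.serialize() for c in self.children)
        return f"<{self.name}>{inner}</{self.name}>"


@dataclass(frozen=True)
class InsertInto:
    """A pending update: a description of a change, not the change itself."""
    child: Node
    target: Node

    def apply(self):
        self.child.parent = self.target
        self.target.children.append(self.child)


def run_and_apply_updates(query):
    """Run the query to obtain a pending update list, apply it, and
    return the distinct roots of all trees that were updated."""
    pul = query()  # evaluating the query changes nothing by itself
    roots = []
    for update in pul:
        update.apply()
        r = update.target.root()
        if not any(r is seen for seen in roots):
            roots.append(r)
    return roots


def query():
    # Stands in for: insert <a/> into <b/>
    return [InsertInto(child=Node("a"), target=Node("b"))]


updated = run_and_apply_updates(query)
print([r.serialize() for r in updated])  # ['<b><a/></b>']
```

Here the temporary tree rooted at `<b/>` would otherwise be unreachable after the query finishes. Returning its root is the only way the caller gets to see the result of the insert.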
Re (5) supporting HTTP PUT seems reasonable in principle but it probably needs quite a bit of API complexity (collecting credentials etc) to make it work in practice, wouldn't you think? Might be better to let the user application take care of this.
Re: Progress on XQuery Update
In terms of what to do with writing the output back over the input, I think it would help if you can support version control semantics where possible, e.g. WebDAV for HTTP, or perhaps support for common version control systems for filesystems. For databases and content management systems, presumably they have their own versioning mechanisms. You do need to be able to do a genuine non-versioned overwrite for some production systems that have had sufficient testing, but otherwise I think versioning (beyond just a single ".bak" file) is the way to go. I still remember how handy it was that VMS did file versioning by default when you saved a new copy of a file.
Cheers, Tony.
Re: Re: Progress on XQuery Update
Yes - sounds like an area for plug-ins along the lines of the OutputURIResolver currently used for xsl:result-document.
I miss versioned filestore too, it was the standard for all ordinary text files on ICL VME. Sometimes operating systems seem to be moving backwards.
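
A minimal Python sketch of how such a write-back plug-in point might look, with writers keyed on URI scheme. The names `WRITERS`, `write_back` and `write_file_with_backup` are hypothetical. The only behaviour taken from the discussion above is the `.bak` default for local files and the fact that sources read over HTTP need some other strategy, which here is simply an error until a writer is registered.

```python
import os
import pathlib
import shutil
import tempfile
from urllib.parse import urlparse
from urllib.request import url2pathname


def write_file_with_backup(uri, content):
    """Overwrite a local file in place, keeping the original as <name>.bak."""
    path = url2pathname(urlparse(uri).path)
    if os.path.exists(path):
        shutil.copy2(path, path + ".bak")  # save the original first
    # Write to a temporary file in the same directory, then swap it in,
    # so a failure part-way through cannot leave a half-written document.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            out.write(content)
        os.replace(tmp, path)
    except Exception:
        os.unlink(tmp)
        raise
    return path


# Registry of writers by URI scheme; an application could add "http" here.
WRITERS = {"file": write_file_with_backup}


def write_back(uri, content):
    """Dispatch to the writer registered for the URI's scheme."""
    scheme = urlparse(uri).scheme
    writer = WRITERS.get(scheme)
    if writer is None:
        raise ValueError(f"no writer registered for scheme {scheme!r}: {uri}")
    return writer(uri, content)


# Usage: update a throwaway document and keep the original as doc.xml.bak.
workdir = tempfile.mkdtemp()
doc = os.path.join(workdir, "doc.xml")
with open(doc, "w", encoding="utf-8") as f:
    f.write("<b/>")

write_back(pathlib.Path(doc).as_uri(), "<b><a/></b>")
```

A versioning writer (WebDAV, a version control system, numbered backups) would slot into the same registry without changing the caller.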