An interesting little spat on the XQuery internal mailing list over the last couple of days on the question of whether or not XQuery Update guarantees to preserve node identity. The spec says that it does. Someone remarked during a telcon that there were difficulties meeting this requirement in a SQL/XML environment. After the meeting I sent an email claiming that the assertion in the spec that updates preserve node identity is untestable and therefore ought to be removed.
It's an interesting point. Clearly a lot of the point of XQuery update is that it is supposed to update documents in situ, and this makes it different from approaches like XSLT that create a modified copy. But the question is, how can an outside observer tell the difference? Clearly you can't detect the difference within the language itself, because the way snapshot semantics works means you can never compare a node before updating with the same node (or even a different node) after updating. So you have to fall back on doing something in the next layer up, in the client application, to test whether the node identities are the same before and after. In order to do that, we need to know something about the client application.
Some clients might submit lexical XML (from filestore, or over the wire) to an XQuery update engine, and get lexical XML back. Clearly such clients can't tell whether node identity was retained or not - it's lost in transit between the client and the XQuery engine.
What about a client that submits a DOM to be updated? Well, DOM nodes aren't quite the same thing as XDM nodes: for example, a DOM can have adjacent text nodes. It's quite likely that the identity of those nodes will be lost by the time they are converted to XDM and back. Did the XQuery update engine lose the node identity, or was it the conversion from DOM to XDM that lost it? You can't tell: hence my belief that the claim in the spec that XQuery Update retains node identity is untestable.
Closer to home, what about the Saxon implementation? Saxon's implementation of XDM is essentially the NodeInfo interface, which in turn has a number of implementations including the tiny tree, the linked tree, and so on. An interesting feature of the NodeInfo interface (and I think this is also true of DOM, though it's not spelled out) is that node identity does not imply object identity: the same node can be represented by different Java objects at different times (or even at once), and you need to use the isSameNodeInfo() method (rather than ==) to determine if two variables refer to the same node.
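The difference between Java object identity and node identity can be sketched with a toy model. This is not Saxon code: ToyElement, AttrHandle and isSameNode() are hypothetical names invented for illustration, standing in for a tree implementation, its transient attribute objects, and isSameNodeInfo() respectively.

```java
import java.util.ArrayList;
import java.util.List;

public class NodeIdentitySketch {

    /** Hypothetical element that stores its attributes as parallel lists. */
    static final class ToyElement {
        final List<String> attrNames = new ArrayList<>();
        final List<String> attrValues = new ArrayList<>();

        void addAttribute(String name, String value) {
            attrNames.add(name);
            attrValues.add(value);
        }

        /** Returns a fresh handle on every call; nothing is cached. */
        AttrHandle attribute(int index) {
            return new AttrHandle(this, index);
        }
    }

    /** Hypothetical transient view of one attribute of one element. */
    static final class AttrHandle {
        private final ToyElement parent;
        private final int index;

        AttrHandle(ToyElement parent, int index) {
            this.parent = parent;
            this.index = index;
        }

        String name() {
            return parent.attrNames.get(index);
        }

        /** Node identity: same parent element and same attribute position. */
        boolean isSameNode(AttrHandle other) {
            return this.parent == other.parent && this.index == other.index;
        }
    }

    public static void main(String[] args) {
        ToyElement e = new ToyElement();
        e.addAttribute("id", "a1");

        AttrHandle first = e.attribute(0);
        AttrHandle second = e.attribute(0);

        System.out.println(first == second);          // false: two Java objects
        System.out.println(first.isSameNode(second)); // true: one node
    }
}
```

In this sketch == compares the Java objects, so only the explicit identity method gives the answer a caller actually wants.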
For example, it's possible that when you iterate over the attributes of an element, the Java NodeInfo objects that represent the attributes will be "flyweight" objects created transiently and discarded as soon as you have finished with them. This means that if you do three iterations over the same attributes in parallel, you can have three Java objects representing the same node. What happens to these Java objects when you run XQuery Update to do an in-situ update of the tree? Suppose the update renames an attribute. If you ask one of these objects for the name of the attribute node that it represents, will you get the new name or the old?
Under my current design, the answer is undefined: you might get either, depending on the implementation. In some ways that's rather unsatisfactory, but anything else would involve a major rethink of the design philosophy of the NodeInfo interface. The updating API will return you a NodeInfo representing the new updated tree, and that's what you are expected to use: any references to nodes obtained before the update are unsafe and their content is undefined.
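To see why a reference obtained before the update might report either name, here is a minimal sketch with two hypothetical handle designs. Neither is taken from Saxon; ToyElement, LiveHandle and CachingHandle are invented names. One handle reads through to the tree on every call, the other copies the name when it is created.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleHandleSketch {

    /** Hypothetical element holding only attribute names, updated in situ. */
    static final class ToyElement {
        final List<String> attrNames = new ArrayList<>();

        void renameAttribute(int index, String newName) {
            attrNames.set(index, newName);
        }
    }

    interface AttrHandle {
        String name();
    }

    /** Reads the name from the tree each time it is asked. */
    static final class LiveHandle implements AttrHandle {
        private final ToyElement parent;
        private final int index;

        LiveHandle(ToyElement parent, int index) {
            this.parent = parent;
            this.index = index;
        }

        @Override
        public String name() {
            return parent.attrNames.get(index);
        }
    }

    /** Copies the name once, when the handle is created. */
    static final class CachingHandle implements AttrHandle {
        private final String cachedName;

        CachingHandle(ToyElement parent, int index) {
            this.cachedName = parent.attrNames.get(index);
        }

        @Override
        public String name() {
            return cachedName;
        }
    }

    public static void main(String[] args) {
        ToyElement e = new ToyElement();
        e.attrNames.add("colour");

        // Both handles are obtained before the update.
        AttrHandle live = new LiveHandle(e, 0);
        AttrHandle cached = new CachingHandle(e, 0);

        e.renameAttribute(0, "color");

        System.out.println(live.name());   // color
        System.out.println(cached.name()); // colour
    }
}
```

Both designs are reasonable on their own terms, which is why promising a single answer for old references would constrain every implementation of the interface.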
And to get back to the subject of this article, I think that applies to node identity too. If you held on to a variable that refers to an attribute node before the update, and then ask whether it's the same node (isSameNodeInfo()) as one in the updated tree, the answer is undefined.
In practice, with the linked tree, the identity of an attribute node is currently based on the identity of the parent element plus the index number of the attribute. That means that if you delete an attribute, all the identities change. I have to decide (a) whether this is conformant with the spec, and (b) whether it creates any usability problems for users. On (a), I think I've convinced myself that there is no conformance issue: although the spec says that identity is preserved, I think this is unenforceable.
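The effect of basing attribute identity on the parent element plus an index can be illustrated with a toy model. The classes below are hypothetical and are not the linked tree's actual code; they only assume the identity rule described above.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexIdentitySketch {

    /** Hypothetical element whose attributes are held in an ordered list. */
    static final class ToyElement {
        final List<String> attrNames = new ArrayList<>();

        AttrHandle attribute(String name) {
            int i = attrNames.indexOf(name);
            if (i < 0) {
                throw new IllegalArgumentException("no attribute " + name);
            }
            return new AttrHandle(this, i);
        }

        /** In-situ delete: later attributes shift down by one position. */
        void deleteAttribute(String name) {
            attrNames.remove(name);
        }
    }

    /** Hypothetical handle whose identity is (parent element, index). */
    static final class AttrHandle {
        private final ToyElement parent;
        private final int index;

        AttrHandle(ToyElement parent, int index) {
            this.parent = parent;
            this.index = index;
        }

        String name() {
            return parent.attrNames.get(index);
        }

        boolean isSameNode(AttrHandle other) {
            return this.parent == other.parent && this.index == other.index;
        }
    }

    public static void main(String[] args) {
        ToyElement e = new ToyElement();
        e.attrNames.add("a");
        e.attrNames.add("b");
        e.attrNames.add("c");

        AttrHandle oldB = e.attribute("b"); // index 1 before the update
        e.deleteAttribute("a");             // "b" moves to index 0, "c" to index 1

        System.out.println(oldB.isSameNode(e.attribute("b"))); // false
        System.out.println(oldB.isSameNode(e.attribute("c"))); // true
        System.out.println(oldB.name());                       // c
    }
}
```

In this model the handle obtained before the delete ends up denoting a different attribute afterwards, which is the sense in which all the identities change.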
XQuery Update and Node Identity
Comments
Re: XQuery Update and Node Identity
XQilla takes a different approach - the user can really only see the results of an update if they are written back to the filesystem (when the document is loaded using doc()), or when the document is supplied as the context item or by an external variable - in which case they hold a reference to it both during and after the update query.
Re: XQuery Update and Node Identity
by Emmanuel on Thu 28 Feb 2008 09:13 GMT
Isn't the concept of identity too loosely defined in the XQuery spec?
Cheers, Emmanuel
Re: Re: XQuery Update and Node Identity
Defining identity well is something that causes philosophers to tear their hair out. Pragmatically, I think it's easy enough to say that a node acquires an identity when it's created and retains it for the duration of a query, and even that it retains identity through updates; the problem comes when the node leaves the comfortable closed world described by the query spec and acquires its own life in the outside world. The XQuery spec doesn't know anything about that world, so it can't say anything useful about it. But the only way of testing whether XQuery Update has preserved node identity is to conduct the tests in that outside world, using interfaces over which the XQuery spec has no control.
The WG, incidentally, decided to keep the clause that says updates preserve identity. What isn't clear to me now is whether that is supposed to impose a requirement on the environment/API to provide a way of determining that identity has been preserved. In some environments, for example client/server models, that's pretty tough.
Re: XQuery Update and Node Identity
by Jens Teubner on Mon 03 Mar 2008 16:38 GMT
Very interesting to read this. I've had the exact same discussion a few months back with a collaborator and brought forward the exact same arguments that you posted here.
The answer I got was "Yes, you are right. But things become different when you think beyond the XQuery Update Facility." Some implementations already provide 'scripting extensions', which make node identity after updates observable. I guess the XQuery Update Facility just wants to be ready for whenever such extensions find their way into W3C recommendations.
Re: Re: XQuery Update and Node Identity
>I guess the XQuery Update Facility just wants to be ready for whenever such extensions find their way into W3C recommendations.
Yes, that's the argument the WG has used for retaining this clause. However, it's still misphrased: the test in a scripting environment will not be that nodes retain their identity (which is still meaningless and untestable), but that variables in the scripting language always refer to the node in its latest (updated) state.