2014-05-23 19:01 GMT+02:00 Paul Gearon <gea...@gmail.com>:

> I still argue for using keywords here. The existing API uses them, and
> they're a natural fit.

The fact that they have an established meaning in the API (denoting literal xml names plus their prefix in a given serialization) is exactly one of my reasons for not wanting to change those semantics. Having a separate tier for representing raw, serialized xml is fine. It's what the library currently does. Adding new behavior, like proper xml namespacing, warrants adding a new tier.

> The one real problem is elements would need a special hash/equality check
> to deal with namespace contexts (presuming that fn:deep-equal maps to
> Object.equals).

I had been thinking along those lines before. Check out the dev thread: I argued for that at first, but at some point I realized that it makes no sense to stick strictly to the raw representation and compute all other info on the fly. The key observation is that a tree of raw, prefixed xml doesn't make any sense without xmlns declarations, whereas the declarations become redundant as soon as the tree has been resolved. To your point from below:

> I didn't follow the discussion for putting URIs into keywords, as I could
> not see why we would want this (am I missing something obvious?)

We need the URIs for xml processing, and the XmlNamespace metadata can get lost or not be there in the first place. Also, the URI counts for equality, see below. I totally agree that it makes no sense to put them in keywords.

> The keywords would need to be translated according to the current
> context. However, that approach still works for fragments that can be
> evaluated in different contexts,

The problem is fragments that are taken out of their (xmlns-declaring) root element and/or that have no XmlNamespace metadata. Apart from the actual prefix assignment (which can be done in the emitter), QNames are completely context free in that regard. See the key observation above.

> while storing URIs directly needs everything to be rebuilt for another
> context.

Are you talking about prefix assignments?
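
To make the key observation concrete, here is a minimal sketch. It is not data.xml's API: `resolve-name` and the plain prefix maps are hypothetical stand-ins for whatever the XmlNamespace metadata ends up providing, and the URIs are made up. The same raw keyword resolves to different names depending on the xmlns context, while a resolved QName needs no context at all:

```clojure
(import 'javax.xml.namespace.QName)

;; Hypothetical helper, for illustration only: resolve a prefixed element
;; keyword such as :a/bar against a map of prefix -> namespace URI.
;; Note: an undeclared prefix silently maps to "no namespace" here;
;; a real resolver should reject it instead.
(defn resolve-name [prefix->uri kw]
  (let [prefix (or (namespace kw) "")      ; "" selects the default namespace
        uri    (get prefix->uri prefix "")]
    (QName. uri (name kw) prefix)))

;; Two made-up xmlns contexts that bind the prefix "a" differently.
(def ctx-1 {"a" "http://one.example/"})
(def ctx-2 {"a" "http://two.example/"})

(def bar-1 (resolve-name ctx-1 :a/bar))
(def bar-2 (resolve-name ctx-2 :a/bar))

(.getNamespaceURI bar-1) ; "http://one.example/"
(.getNamespaceURI bar-2) ; "http://two.example/"
(= bar-1 bar-2)          ; false: same raw keyword, different resolved names
```

Once the tree holds `bar-1` rather than `:a/bar`, moving the element into another fragment cannot change what it means.
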
See my comment about diffing metadata below. I also elaborated on this point in the design page.

> Most possible QNames can be directly expressed as a keyword (for instance,
> the QName 㑦:㒪 can be represented as the read-able keyword :㑦/㒪). The
> keyword function is just a workaround for exotic values. While I know they
> can exist (e.g. containing \u000A), I've yet to see any QNames in the wild
> that cannot be represented as a read-able keyword.

Seen xhtml? What about the QName {http://www.w3.org/1999/xhtml}body? Notice that :http://www.w3.org/1999/xhtml/body would be read like (keyword "http:" "/www.w3.org/1999/xhtml/body"). That's another point that has already been made on the dev thread.

> In case I'm not clear, say I have these two docs:
>
> <a:foo xmlns="http://ex.com/" xmlns:a="http://a.com/">
>   <a:bar xmlns:b="http://b.org" b:baz="blah"/>
> </a:foo>
>
> <a:foo xmlns:a="http://something.else.com/">
>   <a:bar xmlns:b="http://b.org" b:baz="blah"/>
> </a:foo>
>
> If I compare the a:bar element from both documents with func-deep-equal
> then they should compare equal, despite the fact that the a:bar qname
> resolves differently in each context.

Are you saying that deep-equals compares the actual serialization (with prefixes), or that the default equality should do that? If so, please read the infoset specification: http://www.w3.org/TR/xml-infoset/#infoitem.element

The relevant quote for this case:

> *[prefix]* The namespace prefix part of the element-type name. If the name
> is unprefixed, this property has no value. Note that namespace-aware
> applications should use the namespace name rather than the prefix to
> identify elements.

> I still don't see why the reverse mapping is needed. Is it because I'm
> storing the QName in a keyword and can look up the current namespace for
> the URI, while you are storing the fully qualified name?

First, terminology: in xml the namespace _is_ the URI. The thing that you write before the : in the serialization is a prefix.
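
As an illustration of that infoset note: `javax.xml.namespace.QName` from the JDK already behaves this way, since its `equals` and `hashCode` look only at the namespace URI and the local part. The following is just a sketch of the equality semantics I'm arguing for, not a statement about how data.xml will implement them; the example.com URI and the D/E prefixes are made up.

```clojure
(import 'javax.xml.namespace.QName)

;; Same namespace URI and local part, different prefixes.
(def d-foo (QName. "http://example.com/ns" "foo" "D"))
(def e-foo (QName. "http://example.com/ns" "foo" "E"))

;; Same prefix and local part, different namespace URIs.
(def a-bar-1 (QName. "http://a.com/" "bar" "a"))
(def a-bar-2 (QName. "http://something.else.com/" "bar" "a"))

(= d-foo e-foo)                         ; true: the prefix takes no part in equality
(= a-bar-1 a-bar-2)                     ; false: different namespaces, shared prefix or not
(= (.hashCode d-foo) (.hashCode e-foo)) ; true: hashing ignores the prefix as well
```
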
The prefix is only an artifact of serialization, completely meaningless except when you actually read or write xml. I want the user to be able to "write" xml without javax.xml, just by transforming the tree back to its context-dependent, keyworded prefix representation. So we need a way to find the (or a) current prefix for a namespace.

> Sorry, I'm not following what you were getting at with this. In this
> example D and E both get mapped to the same namespace, meaning that
> <D:foo/> and <E:foo/> can be made to resolve the same way. But in a
> different context they could be different values.

Which is the reason we need to lift elements out of their context as soon as possible. We don't want an element to change its namespace just because we transplant it into another xml fragment. Chouser went to great lengths about this point before he realized that this was exactly my goal as well.

> If both the explicit declarations of namespaces on elements and current
> context are stored with the element (one in the :namespaces field, the
> other in metadata), then this allows resolution to be handled correctly,
> while also maintaining where each namespace needs to be emitted.

My plan is to store only the metadata. The set of namespaces is implicitly given by the QNames contained within the fragment, and early introduction of necessary xmlns declarations can be achieved by diffing the metadata. See my design document. Note: I'm talking about the new representation here. The current one will continue to work unchanged.

> I guess I was uncomfortable with XmlNamespaceImpl because of the fancy
> structures with mutation. I was attracted to using a stack, since that's
> what's going on in the parser/emitter.

Don't be fooled by the transients. XmlNamespaceImpl is an immutable, persistent data structure.

> If you have time (or inclination) for comparison, you can look at mine at
> https://github.com/quoll/data.xml on the new_namespaces branch.
> I haven't yet written the code for equality (fn:deep-equal), nor for
> resolving the URIs for QNames, but it's parsing and emitting, and I
> think it's correct. Unfortunately, it's all still in one file, as per
> the master branch.

I've taken a short look, but stopped reading when I realized that you keep the thing in a dynamic var, in an atom that you mutate from the emitter or parser. That might not say anything about the data structure itself, but it has "wrong approach" written all over it.

Also, I'd prefer it if we could focus the discussion on the proposed specification for now. As soon as we agree there, we can start bikeshedding the data structures.

I hope to implement the emitter, as well as the tree walkers, soon. Then we may finally have a non-hypothetical design to talk about.

kind regards

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en