Re: [RFC] Roundtripping namespaced xml documents for data.xml

Herwig Hochleitner Sun, 25 May 2014 05:20:08 -0700

2014-05-23 19:01 GMT+02:00 Paul Gearon <gea...@gmail.com>:

>
> I still argue for using keywords here. The existing API uses them, and
> they're a natural fit.
>


The fact that they have established meaning (for denoting literal xml names
+ their prefix in a given serialization) in the API is exactly one of my
reasons for not wanting to change those semantics. Having a separate tier
for representing raw, serialized xml is fine. It's what the library
currently does. Adding new behavior, like proper xml namespacing, warrants
adding a new tier.


> The one real problem is elements would need a special hash/equality check
> to deal with namespace contexts (presuming that fn:deep-equal maps to
> Object.equals).
>

I had been thinking along those lines before. Check out the dev thread: I
argued for that at first, but at some point I realized that it makes no
sense to stick strictly to the raw representation and compute everything
else on the fly. The key observation is that a tree of raw, prefixed xml
makes no sense without its xmlns declarations, whereas those declarations
become redundant as soon as the tree has been resolved.

To your point from below:


> I didn't follow the discussion for putting URIs into keywords, as I could
> not see why we would want this (am I missing something obvious?)
>

We need the URIs for xml processing, and the XmlNamespace metadata can get
lost or be missing in the first place. The URI also counts for equality,
see below.
I totally agree that it makes no sense to put them in keywords.


>  The keywords would need to be translated according to the current
> context. However, that approach still works for fragments that can be
> evaluated in different contexts,
>

The problem is fragments that are taken out of their (xmlns-declaring)
root element and/or that have no XmlNamespace metadata. Apart from actual
prefix assignment (which can be done in the emitter), QNames are completely
context free in that regard. See the key observation above.


> while storing URIs directly needs everything to be rebuilt for another
> context.
>

Are you talking about prefix assignments? See my comment about diffing
metadata below. I also elaborated on this point on the design page.

> Most possible QNames can be directly expressed as a keyword (for instance,
> the QName 㑦:㒪 can be represented as the read-able keyword :㑦/㒪). The
> keyword function is just a workaround for exotic values. While I know they
> can exist (e.g. containing \u000A), I've yet to see any QNames in the wild
> that cannot be represented as a read-able keyword.
>

Seen xhtml? What about the QName {http://www.w3.org/1999/xhtml}body? Notice
that :http://www.w3.org/1999/xhtml/body would be read like (keyword "http:"
"/www.w3.org/1999/xhtml/body"). Another point that's already been made on
the dev thread.


> In case I'm not clear, say I have these two docs:
>
> <a:foo xmlns="http://ex.com/" xmlns:a="http://a.com/">
>   <a:bar xmlns:b="http://b.org" b:baz="blah"/>
> </a:foo>
>
> <a:foo xmlns:a="http://something.else.com/">
>   <a:bar xmlns:b="http://b.org" b:baz="blah"/>
> </a:foo>
>
> If I compare the a:bar element from both documents with func-deep-equal
> then they should compare equal, despite the fact that the a:bar qname
> resolves differently in each context.
>

Are you saying that deep-equals compares the actual serialization (with
prefixes), or that the default equality should do that?
If so, please read the infoset specification:
http://www.w3.org/TR/xml-infoset/#infoitem.element

The relevant quote for this case:

> *[prefix]* The namespace prefix part of the element-type name. If the name
> is unprefixed, this property has no value. Note that namespace-aware
> applications should use the namespace name rather than the prefix to
> identify elements.
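
To make the infoset point concrete, here is a small sketch in Java, chosen
because javax.xml.namespace.QName already follows this rule: its equals
method compares namespace URI and local part and ignores the prefix. The
class name QNameEqualitySketch and the helper sameName are made up for
illustration; the URIs are the ones from the two example documents above.

```java
import javax.xml.namespace.QName;

// Illustrative only: QNameEqualitySketch and sameName are hypothetical names.
final class QNameEqualitySketch {

    // QName.equals looks at namespace URI and local part; the prefix is ignored.
    static boolean sameName(QName a, QName b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Same prefix "a" and local part, but different namespaces: not equal.
        QName first = new QName("http://a.com/", "bar", "a");
        QName second = new QName("http://something.else.com/", "bar", "a");
        System.out.println(sameName(first, second)); // false

        // Different prefix, same namespace and local part: equal.
        QName third = new QName("http://a.com/", "bar", "x");
        System.out.println(sameName(first, third)); // true
    }
}
```

Under this rule the two a:bar elements in the example name different
elements, even though their serializations look identical.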



> I still don't see why the reverse mapping is needed. Is it because I'm
> storing the QName in a keyword and can look up the current namespace for
> the URI, while you are storing the fully qualified name?
>

First, terminology: In xml the namespace _is_ the uri. The thing that you
write before the : in the serialization is a prefix. It is only an artifact
of serialization, completely meaningless except when you actually read or
write xml. So I want the user to be able to "write" xml without javax.xml, just
by transforming the tree back to its context-dependent keyworded
prefix-representation. So we need a way to find the (a) current prefix for
a namespace.
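
A minimal sketch of such a reverse lookup, in Java for illustration only.
PrefixScopeSketch, uriFor and prefixFor are hypothetical names, not part of
data.xml or of the proposed XmlNamespace structure, and the URI is a
placeholder. It assumes the context is a plain prefix-to-URI map and that
any bound prefix will do when several map to the same namespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: a namespace context modelled as a prefix -> URI map.
final class PrefixScopeSketch {
    private final Map<String, String> prefixToUri;

    PrefixScopeSketch(Map<String, String> prefixToUri) {
        // Defensive copy; insertion order is kept so lookups are deterministic.
        this.prefixToUri = new LinkedHashMap<>(prefixToUri);
    }

    // Forward lookup, as needed when reading: prefix -> namespace URI.
    Optional<String> uriFor(String prefix) {
        return Optional.ofNullable(prefixToUri.get(prefix));
    }

    // Reverse lookup, as needed when writing: namespace URI -> first bound prefix.
    Optional<String> prefixFor(String uri) {
        return prefixToUri.entrySet().stream()
                .filter(e -> e.getValue().equals(uri))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("D", "http://ex.com/ns"); // placeholder URI
        bindings.put("E", "http://ex.com/ns"); // second prefix, same namespace
        PrefixScopeSketch scope = new PrefixScopeSketch(bindings);

        System.out.println(scope.prefixFor("http://ex.com/ns"));        // Optional[D]
        System.out.println(scope.prefixFor("http://unbound.example/")); // Optional.empty
    }
}
```

The linear scan keeps the sketch short; a structure that maintains both
directions would avoid it.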

> Sorry, I'm not following what you were getting at with this. In this
> example D and E both get mapped to the same namespace, meaning that
> <D:foo/> and <E:foo/> can be made to resolve the same way. But in a
> different context they could be different values.
>

Which is the reason we need to lift elements out of their context as soon
as possible. We don't want an element to change its namespace, just because
we transplant it into another xml fragment. Chouser went to great lengths
about this point, before he realized that this was exactly my goal as well.

> If both the explicit declarations of namespaces on elements and current
> context are stored with the element (one in the :namespaces field, the
> other in metadata), then this allows resolution to be handled correctly,
> while also maintaining where each namespace needs to be emitted.
>

My plan is to only store the metadata. The set of namespaces is implicitly
given by QNames contained within the fragment and early introduction of
necessary xmlns declarations can be achieved by diffing the metadata. See my
design document.
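
One possible reading of "diffing the metadata", sketched in Java under
stated assumptions, since the design document is not reproduced in this
message: treat each element's namespace context as a prefix-to-URI map and
emit an xmlns declaration only for bindings that the parent context lacks
or binds differently. XmlnsDiffSketch and declarationsToEmit are
hypothetical names; the bindings are taken from the example document quoted
earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: namespace contexts modelled as prefix -> URI maps.
final class XmlnsDiffSketch {

    // Returns the child's bindings that are absent from, or differ in, the parent.
    static Map<String, String> declarationsToEmit(Map<String, String> parent,
                                                  Map<String, String> child) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : child.entrySet()) {
            // parent.get returns null for an unbound prefix, so equals is false.
            if (!e.getValue().equals(parent.get(e.getKey()))) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> foo = new LinkedHashMap<>();
        foo.put("", "http://ex.com/");   // default namespace
        foo.put("a", "http://a.com/");

        Map<String, String> bar = new LinkedHashMap<>(foo);
        bar.put("b", "http://b.org");    // newly introduced on the child

        System.out.println(declarationsToEmit(foo, bar)); // {b=http://b.org}
    }
}
```

An emitter following this reading would write xmlns:b on a:bar only, since
the other bindings are already in scope from a:foo.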

Note: I'm talking about the new representation here. The current one will
continue to work unchanged.

> I guess I was uncomfortable with XmlNamespaceImpl because of the fancy
> structures with mutation. I was attracted to using a stack, since that's
> what's going on in the parser/emitter.
>

Don't be fooled by the transients. XmlNamespaceImpl is an immutable,
persistent data structure.

> If you have time (or inclination) for comparison, you can look at mine at
> https://github.com/quoll/data.xml on the new_namespaces branch. I haven't
> yet written the code for equality (fn:deep-equal), nor for resolving the
> URIs for QNames, but it's parsing and emitting, and I think it's correct.
> Unfortunately, it's all still in one file, as per the master branch.
>

I've taken a short look, but stopped reading when I realized that you keep
the thing in a dynamic var, in an atom that you mutate from the emitter or
parser. That might not say anything about the data structure itself, but it
has "wrong approach" written all over it.

Also I'd prefer if we could focus the discussion on the proposed
specification for now. As soon as we agree there, we can start bikeshedding
the data structures.

I hope to implement the emitter, as well as the tree walkers, soon. Then we
may finally have a non-hypothetical design to talk about.

kind regards

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
