Re: [RFC] Roundtripping namespaced xml documents for data.xml

Paul Gearon Fri, 23 May 2014 10:01:41 -0700

Hi Herwig,

I spent some time going through the design and the email thread
(particularly Chouser's and Christophe's responses), plus a bit more
time on my own implementation, where it was clear that I'd missed some
things.

I've yet to go through your code in fine detail, so I'll still have some
gaps in my knowledge.

On Thu, May 22, 2014 at 2:44 PM, Herwig Hochleitner
<hhochleit...@gmail.com> wrote:

> 2014-05-21 21:06 GMT+02:00 Paul Gearon <gea...@gmail.com>:
>
>
>> Are QNames strictly necessary? Keywords seem to do the trick, and they
>> work in nicely with what already exists.
>>
>> I know that there are some QName forms that are not readable as a
>> keyword, but the XML parsing code will always call (keyword ...) and that
>> holds any kind of QName,
>>
>
> I've argued this at some length on the dev thread. IMO QNames are not
> necessary, but we want another datatype than keywords.
> I think the main argument for using keywords would be xml literals in code
> and there readability (i.e. not having to use (keyword ..)) counts. A
> reader tag is far better suited for this.
> In the course of that argument, I also came up with a way to represent
> resolved names as keywords in literals. Please check out the design page
> for this.
>

I still argue for using keywords here. The existing API uses them, and
they're a natural fit.

The one real problem is that elements would need a special hash/equality
check to deal with namespace contexts (presuming that fn:deep-equal maps to
Object.equals). The keywords would need to be translated according to the
current context. However, that approach still works for fragments that can
be evaluated in different contexts, while storing URIs directly needs
everything to be rebuilt for another context.

Most possible QNames can be directly expressed as a keyword (for instance,
the QName 㑦:㒪 can be represented as the read-able keyword :㑦/㒪). The
keyword function is just a workaround for exotic values. While I know they
can exist (e.g. containing \u000A), I've yet to see any QNames in the wild
that cannot be represented as a read-able keyword.
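
As a minimal sketch of that point (plain Clojure only, nothing from
data.xml is assumed, and the var names are just for illustration): a
prefixed name reads directly as a namespaced keyword, and the keyword
function covers the names that cannot be typed as a literal.

```clojure
;; A prefixed QName such as a:bar maps onto a namespaced keyword.
(def tag :a/bar)

(namespace tag) ; the prefix part, "a"
(name tag)      ; the local part, "bar"

;; The two-argument keyword function builds the same value from strings,
;; which is what a parser can do with the prefix and local name it reads.
(def built (keyword "a" "bar"))

;; An exotic name (here a local part containing a newline) cannot be
;; written as a literal, but the keyword function still accepts it.
(def exotic (keyword "a" "b\nc"))
```

So the literal form covers the common case, and (keyword ...) is only
needed for the exotic one.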

In case I'm not clear, say I have these two docs:

<a:foo xmlns="http://ex.com/" xmlns:a="http://a.com/">
  <a:bar xmlns:b="http://b.org" b:baz="blah"/>
</a:foo>

<a:foo xmlns:a="http://something.else.com/">
  <a:bar xmlns:b="http://b.org" b:baz="blah"/>
</a:foo>

If I compare the a:bar element from both documents with fn:deep-equal
then they should compare equal, despite the fact that the a:bar qname
resolves differently in each context.

The representation I've used was only a small extension to the existing one:
#clojure.data.xml.Element{:tag :a/bar, :attrs {:b/baz "blah"}, :namespaces
{:b "http://b.org"}, :content ()}

I agree with the use of meta to handle the namespaces, since it's not
included in equality testing. Namespaces are declared on, and scoped to the
element, so it makes sense to add them as a map there (this is what I've
done). In the first case, the meta-data for the a:bar element is:
{:b "http://b.org", :a "http://a.com/", :xmlns "http://ex.com/"}
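
To make that concrete, here is a rough sketch. It uses plain maps rather
than the Element record, and resolve-qname is a hypothetical helper of
mine, not something in data.xml. It shows that the context carried as
metadata does not affect equality, and that the same keyword resolves
differently depending on the context it is looked up in.

```clojure
;; Hypothetical helper: expand a keyword to [uri local-name] using a
;; prefix->uri map like the metadata above. It handles element names
;; only: an unprefixed name falls back to the default namespace, which
;; is stored under :xmlns (unprefixed attributes would need no namespace).
(defn resolve-qname
  [context kw]
  (let [prefix (namespace kw)
        uri    (get context (if prefix (keyword prefix) :xmlns))]
    [uri (name kw)]))

;; Contexts in scope at a:bar in the first and second document.
(def ctx-1 {:b "http://b.org" :a "http://a.com/" :xmlns "http://ex.com/"})
(def ctx-2 {:b "http://b.org" :a "http://something.else.com/"})

;; The same element data, carrying a different context as metadata.
(def bar-1 (with-meta {:tag :a/bar :attrs {:b/baz "blah"}} ctx-1))
(def bar-2 (with-meta {:tag :a/bar :attrs {:b/baz "blah"}} ctx-2))

(= bar-1 bar-2) ; true, because metadata takes no part in equality

(resolve-qname (meta bar-1) (:tag bar-1)) ; ["http://a.com/" "bar"]
(resolve-qname (meta bar-2) (:tag bar-2)) ; ["http://something.else.com/" "bar"]
```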

I didn't follow the discussion for putting URIs into keywords, as I could
not see why we would want this (am I missing something obvious?)

>> Are the reverse mappings (uri->prefix) definitely necessary? My first look
>> at this made me think that they were (particularly so I could call
>> XMLStreamWriter.getPrefix), but it seemed that the XmlWriter keeps enough
>> state that it isn't necessary. My final code didn't need them at all.
>>
>
> The XmlWriter does keep enough state, but I also want to support tree
> transformers that have the full information without needing to pipe through
> Xml{Reader,Parser}.
> uri->prefix could be reconstructed from prefix->uri in linear time, so
> again, the reason for the reverse mapping is performance.
>

I still don't see why the reverse mapping is needed. Is it because I'm
storing the QName in a keyword and can look up the current namespace for
the URI, while you are storing the fully qualified name?
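
For reference, the linear-time reconstruction could look something like
the sketch below. The example bindings and the uri->prefixes helper are
made up for illustration. One thing it highlights is that a plain
inversion keeps only one prefix per URI, so when two prefixes are bound
to the same URI a set-valued reverse map may be the safer shape.

```clojure
(require '[clojure.set :as set])

;; Illustrative bindings only.
(def prefix->uri {:D "DAV:" :b "http://b.org"})

;; map-invert walks the map once, so the cost is linear in the number
;; of bindings.
(def uri->prefix (set/map-invert prefix->uri))

(get uri->prefix "DAV:") ; :D

;; Hypothetical helper: if several prefixes are bound to one URI, a
;; plain inversion keeps only one of them. Collecting the prefixes into
;; a set keeps them all.
(defn uri->prefixes
  [m]
  (reduce-kv (fn [acc prefix uri]
               (update acc uri (fnil conj #{}) prefix))
             {}
             m))

(uri->prefixes {:D "DAV:" :E "DAV:"}) ; {"DAV:" #{:D :E}}
```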

>> I was mostly considering round-tripping the data, and the parser is good
>> at not repeating namespaces for child elements, so the emitter didn't need
>> to either. As a result I didn't need to filter out prefix->uri bindings
>> from parent elements when emitting namespaces, though that should be easy.
>>
>
> What I meant are redundant prefixes, e.g. binding xmlns:D="DAV:" at the
> root element, xmlns:E="DAV:" in a child element.
>

Sorry, I'm not following what you were getting at with this. In this
example D and E both get mapped to the same namespace, meaning that
<D:foo/> and <E:foo/> can be made to resolve the same way. But in a
different context they could be different values.

If both the explicit declarations of namespaces on elements and current
context are stored with the element (one in the :namespaces field, the
other in metadata), then this allows resolution to be handled correctly,
while also maintaining where each namespace needs to be emitted.

>> If uri->prefix is needed, then a simple map would need that, yes. However,
>> if I needed the reverse mapping then I'd use a pair of stacks of maps - one
>> for each direction.
>>
>> (BTW, a "stack of maps" sounds complex, but the top of the stack is just
>> the new bindings merged onto the previous top of the stack).
>>
>
> In this case, XmlNamespaceImpl is just that, modulo the stack. It is meant
> to be updated at every child element that binds xmlns prefixes, so the
> stack is implicit. I don't keep the parent XmlNamespaceImpl, because an xml
> element doesn't keep a parent pointer either.
>

I guess I was uncomfortable with XmlNamespaceImpl because of the fancy
structures with mutation. I was attracted to using a stack, since that's
what's going on in the parser/emitter.
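
The stack of maps I have in mind is roughly the following. This is a
sketch only: push-bindings and pop-bindings are illustrative names and
are not taken from either implementation.

```clojure
;; The stack is a vector of prefix->uri maps. The last entry is the full
;; context in scope for the element currently being processed.
(defn push-bindings
  "On entering an element, merge its declarations onto the current top."
  [stack new-bindings]
  (conj stack (merge (peek stack) new-bindings)))

(defn pop-bindings
  "On leaving an element, drop its context and return to the parent's."
  [stack]
  (pop stack))

(def stack
  (-> [{}]
      (push-bindings {:a "http://a.com/" :xmlns "http://ex.com/"})
      (push-bindings {:b "http://b.org"})))

(peek stack)
;; {:a "http://a.com/", :xmlns "http://ex.com/", :b "http://b.org"}

(peek (pop-bindings stack))
;; {:a "http://a.com/", :xmlns "http://ex.com/"}
```

Nothing is mutated here: each level is an ordinary persistent map, and
leaving an element just pops back to the parent's map.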

If you have time (or inclination) for comparison, you can look at mine at
https://github.com/quoll/data.xml on the new_namespaces branch. I haven't
yet written the code for equality (fn:deep-equal), nor for resolving the
URIs for QNames, but it's parsing and emitting, and I think it's correct.
Unfortunately, it's all still in one file, as per the master branch.

Regards,
Paul

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en