On Thu, Dec 11, 2008 at 8:49 PM, Robert Koberg <r...@koberg.com> wrote:
>
> Given an XML structure like:
>
> <root xmlns:dc="http://purl.org/dc/elements/1.1/">
>   <fragment xmlns:xyz="http://www.w3.org/1999/xhtml">
>     <dc:title>A HEAD Title</dc:title>
>     <xyz:title>A BODY Title</xyz:title>
>   </fragment>
>   <fragment xmlns:zyx="http://www.w3.org/1999/xhtml">
>     <dc:title>A HEAD Title</dc:title>
>     <zyx:title>A BODY Title</zyx:title>
>   </fragment>
> </root>
>
> which will be parsed by clojure.xml/parse and converted into a nested
> collection.
>
> How would you find the title elements that need to be treated the same
> (those in the XHTML namespace) and those that need to be treated
> differently (those in the Dublin Core namespace)?
>
> Now assume that you don't know what namespace prefix some XML that is
> not in your control will use.
>
> To an XML parser the fragment at position 1 is basically the same as
> the fragment at position 2. Is this the case in clojure?

Hmmm ... doesn't look like it:

1:37 user=> (clojure.xml/parse "file:///tmp/test.xml")
{:tag :root, :attrs {:xmlns:dc "http://purl.org/dc/elements/1.1/"}, :content [
  {:tag :fragment, :attrs {:xmlns:xyz "http://www.w3.org/1999/xhtml"}, :content [
    {:tag :dc:title, :attrs nil, :content ["A HEAD Title"]}
    {:tag :xyz:title, :attrs nil, :content ["A BODY Title"]}
  ]}
  {:tag :fragment, :attrs {:xmlns:zyx "http://www.w3.org/1999/xhtml"}, :content [
    {:tag :dc:title, :attrs nil, :content ["A HEAD Title"]}
    {:tag :zyx:title, :attrs nil, :content ["A BODY Title"]}
  ]}
]}

At least all the info you need to determine their equality is
maintained. I wonder if it would be worthwhile to expand namespace
aliases so that clojure.xml/parse would return something like the
following on this input:

1:37 user=> (clojure.xml/parse "file:///tmp/test.xml")
{:tag :root, :attrs nil, :content [
  {:tag :fragment, :attrs nil, :content [
    {:tag :<http://purl.org/dc/elements/1.1/>:title, :attrs nil, :content ["A HEAD Title"]}
    {:tag :<http://www.w3.org/1999/xhtml>:title, :attrs nil, :content ["A BODY Title"]}
  ]}
  {:tag :fragment, :attrs nil, :content [
    {:tag :<http://purl.org/dc/elements/1.1/>:title, :attrs nil, :content ["A HEAD Title"]}
    {:tag :<http://www.w3.org/1999/xhtml>:title, :attrs nil, :content ["A BODY Title"]}
  ]}
]}

This describes the same XML, with the added benefit that the two
fragment maps compare equal, i.e. (= first-fragment second-fragment)
is true regardless of which prefixes the document used.
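
Until something like that exists in clojure.xml itself, the expansion can
be done as a post-processing step over the parsed tree. What follows is
only a rough sketch, not anything clojure.xml provides: the names
xmlns-key?, ns-bindings, expand-tag and expand-namespaces are made up for
this example, and the expanded tag is represented as a [uri local-name]
vector (my choice, since a keyword like :<http://...>:title cannot be read
back by the reader). It assumes the xmlns declarations show up in :attrs
as in the output above, and it leaves prefixed attribute names alone.

```clojure
(require '[clojure.string :as str])

(defn xmlns-key?
  "True if attribute key k is a namespace declaration (xmlns or xmlns:prefix)."
  [k]
  (let [n (name k)]
    (or (= n "xmlns") (str/starts-with? n "xmlns:"))))

(defn ns-bindings
  "Builds a prefix->URI map from an element's :attrs.
  The default namespace (plain xmlns) is stored under the prefix \"\"."
  [attrs]
  (into {}
        (for [[k v] attrs
              :when (xmlns-key? k)
              :let [n (name k)]]
          [(if (= n "xmlns") "" (subs n (count "xmlns:"))) v])))

(defn expand-tag
  "Resolves a tag keyword such as :dc:title to [uri local-name],
  using the prefix->URI map in scope. Unbound prefixes resolve to nil."
  [scope tag]
  (let [[a b] (str/split (name tag) #":" 2)]
    (if b
      [(get scope a) b]      ; prefixed name
      [(get scope "") a])))  ; unprefixed name, default namespace if any

(defn expand-namespaces
  "Walks a clojure.xml-style tree. Each element's :tag becomes
  [uri local-name] and xmlns declarations are dropped from :attrs.
  Text nodes (strings) are returned unchanged."
  ([node] (expand-namespaces {} node))
  ([scope node]
   (if (map? node)
     (let [scope (merge scope (ns-bindings (:attrs node)))]
       (assoc node
              :tag     (expand-tag scope (:tag node))
              :attrs   (not-empty
                        (into {} (remove (comp xmlns-key? key) (:attrs node))))
              :content (when-let [c (:content node)]
                         (mapv #(expand-namespaces scope %) c))))
     node)))

;; The tree from the REPL output above, written out as a literal.
(def parsed
  {:tag :root, :attrs {:xmlns:dc "http://purl.org/dc/elements/1.1/"}
   :content
   [{:tag :fragment, :attrs {:xmlns:xyz "http://www.w3.org/1999/xhtml"}
     :content [{:tag :dc:title, :attrs nil, :content ["A HEAD Title"]}
               {:tag :xyz:title, :attrs nil, :content ["A BODY Title"]}]}
    {:tag :fragment, :attrs {:xmlns:zyx "http://www.w3.org/1999/xhtml"}
     :content [{:tag :dc:title, :attrs nil, :content ["A HEAD Title"]}
               {:tag :zyx:title, :attrs nil, :content ["A BODY Title"]}]}]})

;; The two fragments differ only in prefix, so they compare equal
;; once the prefixes are resolved.
(let [[a b] (:content (expand-namespaces parsed))]
  (println (= a b)))

;; Count the XHTML titles, whatever prefix the document used for them.
(def xhtml "http://www.w3.org/1999/xhtml")
(println (->> (expand-namespaces parsed)
              (tree-seq map? :content)
              (filter #(= [xhtml "title"] (:tag %)))
              count))
```

With the sample tree this should print true and then 2. Scope is
threaded down through the recursion, so a prefix redeclared on an inner
element shadows the outer binding only within that element.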

- J.

