bug#20339: sxml simple: sxml->xml mishandles namespaces?

Ricardo Wurmus Tue, 12 Feb 2019 16:17:30 -0800


to...@tuxteam.de writes:


> As John has noted, the namespace mappings (i.e. the prefix -> namespace
> URI binding) are kind of lexically scoped (I'd call it subtree scoped,
> but structurally it is the same). While parsing is "easy" (assuming
> well-formed XML), serializing is not unambiguous.

The “fup” handler of the parser visits every element and has a list of
the namespaces that are in scope for that element.  Its purpose is to
return the SXML representation of the element.  At that point we can
record the namespaces as attributes.  (That’s what the patch does.)
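
To illustrate the idea (this is not the patch itself), here is a small
Python sketch of a handler that records the in-scope namespaces as
attributes.  The list-based element layout, the name record_namespaces
and the shape of in_scope are all assumptions made for the example;
they do not reflect Guile's actual data structures.

```python
# Hypothetical model, not Guile's API: an SXML element is a list
#   [tag, ["@", [name, value], ...], child, ...]
# and in_scope is a list of (prefix, uri) pairs, roughly what the
# parser could hand to its finish-element handler.

def record_namespaces(tag, attrs, children, in_scope):
    """Build an element with one xmlns:PREFIX attribute per in-scope namespace."""
    ns_attrs = [["xmlns:" + prefix, uri] for prefix, uri in in_scope]
    all_attrs = [list(a) for a in attrs] + ns_attrs
    element = [tag]
    if all_attrs:
        # Only emit the attribute list when there is something to put in it.
        element.append(["@"] + all_attrs)
    element.extend(children)
    return element


elem = record_namespaces(
    "x:a", [["id", "1"]], ["text"], [("x", "http://example.org/x")]
)
```

Every element built this way carries the full set of bindings, whether
or not an ancestor already declared them.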

When baking XML from SXML we don’t need to do anything special — we only
need to convert everything to text, including the recorded namespace
attributes.  This isn’t pretty SXML (nor is it pretty XML), but it
appears to be correct as none of the namespace information is lost.
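
A rough Python sketch of that serialization step, under the same
made-up list model (element = [tag, ["@", [name, value], ...],
child, ...]).  The function name sxml_to_xml is mine, not the
library's; the point is only that namespace attributes get no special
treatment.

```python
from xml.sax.saxutils import escape, quoteattr

def sxml_to_xml(node):
    """Serialize the hypothetical list model to text, attributes included."""
    if isinstance(node, str):
        return escape(node)
    tag, rest = node[0], node[1:]
    attrs = []
    # An optional ["@", [name, value], ...] list directly follows the tag.
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == "@":
        attrs, rest = rest[0][1:], rest[1:]
    attr_text = "".join(" %s=%s" % (name, quoteattr(value)) for name, value in attrs)
    body = "".join(sxml_to_xml(child) for child in rest)
    return "<%s%s>%s</%s>" % (tag, attr_text, body, tag)


ns = ["xmlns:x", "http://example.org/x"]
doc = ["x:a", ["@", ns], ["x:b", ["@", ns], "hi"]]
xml = sxml_to_xml(doc)
```

Here the declaration for the x prefix shows up on both elements:
redundant, but nothing is lost.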

To get a better serialized representation the parser needs to do a
better job of identifying “new” namespaces.

> In a way, the library might want to be prepared to take hints from the
> application (as far as the XML is to be read by humans, there might be
> "better" and "worse" serializations).

The XML produced when this patch is applied will not be pretty.  To
generate minimal/pretty XML, knowledge of the parent elements’ namespaces
is required, and the parser’s “fup” handler does not have that knowledge.

We could try to alter the parser so that it not only passes the list of
namespaces that are currently in scope, but also a list of namespaces
that are in scope for the parent node.  This would allow us to determine
the list of *new* namespaces that absolutely must be declared for the
current node.  If there are no new namespaces we can omit the
declarations entirely and produce minimal SXML (and thus minimal XML
later when the SXML is serialized).
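
The comparison itself would be simple.  A Python sketch, assuming both
scopes arrive as lists of (prefix, uri) pairs with each prefix listed
at most once (the function name and the list shape are assumptions,
not anything the parser provides today):

```python
def new_namespaces(current, parent):
    """Return the bindings in current that the parent scope does not already provide."""
    inherited = dict(parent)
    # A binding is new if its prefix is unknown to the parent, or if the
    # prefix is rebound to a different URI for this element.
    return [(prefix, uri) for prefix, uri in current if inherited.get(prefix) != uri]


parent_scope = [("x", "http://example.org/x")]
child_scope = [("x", "http://example.org/x"), ("y", "http://example.org/y")]
to_declare = new_namespaces(child_scope, parent_scope)
```

Only the bindings in to_declare would need to be recorded on the
current node; an element whose scope matches its parent's yields an
empty list and gets no namespace attributes at all.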

--
Ricardo
