Ben Butler-Cole wrote:
>> An additional thing to keep in mind is that toprettyxml does not print
>> an XML identical to the original DOM tree: it adds newlines and tabs.
>> When parsed again these blank characters are inserted in the DOM tree as
>> character nodes. If you toprettyxml an XML document twice in a row, then
>> the second one will also add newlines and tabs around the newlines and
>> tabs added by the first. Since you call toprettyxml an infinite number
>> of times, it is expected that lots of blank characters appear.
>>
>
> Right. That's the behaviour I'm asking about, which I consider to be
> problematic. I would expect a module providing a parser and
> pretty-printer (not just for XML parsers) to be able to conservatively
> round-trip.
>
> As far as I can see (and your comments back this up) minidom doesn't
> have this property. Unless anyone knows how to get it to behave that
> way...
>

minidom (like any DOM parser, by the way) has no way of knowing whether a given blank character is a pretty-printing artefact or genuine blank content from the original XML.
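To make the accumulation concrete, here is a small sketch that parses and pretty-prints the same document a few times and counts the characters sitting in whitespace-only text nodes. The helper names `blank_chars` and `round_trip` and the sample document are made up for illustration; only `minidom.parseString` and `toprettyxml` come from the standard library.

```python
from xml.dom import minidom, Node

def blank_chars(node):
    """Total number of characters held in whitespace-only text nodes."""
    total = 0
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE and not child.data.strip(" \t\r\n"):
            total += len(child.data)
        else:
            # Non-blank text nodes have no children, so this adds 0 for them.
            total += blank_chars(child)
    return total

def round_trip(xml_text):
    """Parse the text and pretty-print it again, as the original loop does."""
    return minidom.parseString(xml_text).toprettyxml()

text = "<a><b>x</b></a>"
for i in range(3):
    text = round_trip(text)
    # The count grows on every pass: each pass indents the previous indentation.
    print(i + 1, blank_chars(minidom.parseString(text)))
```

The count goes up on every pass because the parser hands the earlier indentation back as ordinary text nodes, and toprettyxml then indents those as well.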
You could write a function that recursively strips all-blank text nodes from the element tree. Before doing so, I suggest you take a look at section 2.10 ("White Space Handling") of http://www.w3.org/TR/REC-xml/.

RB
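A rough sketch of such a stripping function follows, under a few assumptions. The names `strip_blank_text_nodes` and `stable_pretty` are hypothetical, "blank" is taken to mean only the four XML whitespace characters, and the `xml:space` handling is one reading of section 2.10 rather than anything minidom does for you.

```python
from xml.dom import minidom, Node

XML_WHITESPACE = " \t\r\n"

def strip_blank_text_nodes(node, preserve=False):
    """Recursively remove whitespace-only text nodes below `node`.

    Assumption: an element with xml:space="preserve" keeps its blank text
    nodes, and so do its descendants until one sets xml:space="default".
    """
    if node.nodeType == Node.ELEMENT_NODE:
        space = node.getAttribute("xml:space")  # "" when the attribute is absent
        if space == "preserve":
            preserve = True
        elif space == "default":
            preserve = False
    # Iterate over a copy, because removeChild mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if not preserve and not child.data.strip(XML_WHITESPACE):
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_blank_text_nodes(child, preserve)

def stable_pretty(xml_text):
    """Parse, drop blank text nodes, then pretty-print."""
    dom = minidom.parseString(xml_text)
    strip_blank_text_nodes(dom)
    return dom.toprettyxml()

once = stable_pretty("<a><b>x</b><c/></a>")
twice = stable_pretty(once)
print(once == twice)
```

This only stabilises documents without mixed content. Where an element holds both text and child elements, toprettyxml still wraps the non-blank text in extra indentation, and no stripping pass can tell that apart from the original content.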