On Wed, 18 Nov 2009 13:55:52 +0100, Thomas Lotze wrote: > I wonder what Python XML library is best for writing a program that makes > small modifications to an XML file in a minimally intrusive way. By that I > mean that information the program doesn't recognize is kept, as are > comments and whitespace, the order of attributes and even whitespace > around attributes. In short, I want to be able to change an XML file while > producing minimal textual diffs. > > Most libraries don't allow controlling the order of and the whitespace > around attributes, so what's generally left to do is store snippets of > original text along with the model objects and re-use that for writing the > edited XML if the model wasn't modified by the program. Does a library > exist that helps with this? Does any XML library at all allow structured > access to the text representation of a tag with its attributes?
Expat parsers have a CurrentByteIndex field, while SAX parsers have locators. You can use this to identify the portions of the input which need to be processed, and just copy everything else. One downside is that these only report either the beginning (Expat) or end (SAX) of the tag; you'll have to deduce the other side yourself. OTOH, "diff" is probably the wrong tool for the job. -- http://mail.python.org/mailman/listinfo/python-list