Alain <[EMAIL PROTECTED]> wrote:

> I would expect a piece of XML to be read, parsed and written back
> without corruption [...]. It isn't however the case when it comes
> to CDATA handling.

This is not corruption, exactly. For most intents and purposes, CDATA sections should behave identically to normal character data. In a real XML-based browser (such as Mozilla in application/xhtml+xml mode), this line of script would actually work fine:

> if (a < b && a > 0) {

The problem is that you're (presumably) producing output that you want to be understood by things that are not XML parsers, namely legacy-HTML web browsers. These have special exceptions-to-the-rule, like "<script> elements don't contain markup", that are not present in XML.

ElementTree is a data binding that strives to simplify the XML processing experience, and as such it folds CDATA sections down to plain characters - this is usually easier for programmers to deal with. Such a feature is considered normal in XML processing, and is the default for, e.g., DOM Level 3 implementations.

If, instead, you want to keep track of where the CDATA sections are, and output them again without change, you'll need to use an XML-handling interface that supports this feature. Typically, DOM implementations do - the default Python minidom does, as does pxdom. DOM is a more comprehensive but less friendly/Python-like interface for XML processing.

There are a few other obstacles you may meet if you are outputting XML for use by a non-XML parser (legacy browsers):

- entity references - &eacute; etc. The HTML entities are not built into XML, so to read them at all you'll need a parser that reads the external DTD subset (and a suitable !DOCTYPE). Even then they'll be converted to text, if that matters. (pxdom, optionally, can keep them as entity references regardless of whether their content is known);

- empty elements - <img/> etc. An XML serialiser won't know how to output this in a browser-compatible way. (The next release of pxdom has an option to do so.)

If you're generating output for legacy browsers, you might want to just use a 'real' HTML serialiser.
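To make the difference concrete, here is a minimal sketch using only the Python 3 standard library. It is an illustration under assumptions rather than anything from the original exchange: the names (SOURCE, et_xml and so on) are made up for the example, and the method="html" option of ElementTree's tostring() is a later addition to the library, used here only as one possible stand-in for a 'real' HTML serialiser.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Illustrative input: a script element whose content sits in a CDATA section.
SOURCE = "<script><![CDATA[if (a < b && a > 0) {}]]></script>"

# ElementTree folds the CDATA section down to plain characters...
root = ET.fromstring(SOURCE)
et_text = root.text  # 'if (a < b && a > 0) {}'

# ...so the XML serialiser has to escape the markup characters on output.
et_xml = ET.tostring(root, encoding="unicode")

# minidom keeps the CDATA section as its own node type and writes it back
# out unchanged.
doc = minidom.parseString(SOURCE)
node = doc.documentElement.firstChild
dom_is_cdata = node.nodeType == node.CDATA_SECTION_NODE
dom_xml = doc.documentElement.toxml()

# An HTML-aware serialiser knows the legacy-browser rules: script content
# is written without escaping, and empty elements get no XML-style '/>'.
et_html = ET.tostring(root, encoding="unicode", method="html")
img = ET.fromstring("<img/>")
img_xml = ET.tostring(img, encoding="unicode")
img_html = ET.tostring(img, encoding="unicode", method="html")

for label, value in [
    ("ElementTree, XML output ", et_xml),
    ("minidom, XML output     ", dom_xml),
    ("ElementTree, HTML output", et_html),
    ("<img/> as XML           ", img_xml),
    ("<img/> as HTML          ", img_html),
]:
    print(label, "->", value)
```

The ElementTree XML output is still the same document as far as an XML parser is concerned; only a legacy-HTML browser, which doesn't decode &lt; and &amp; inside <script>, will trip over it.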
--
Andrew Clover
mailto:[EMAIL PROTECTED]
http://www.doxdesk.com/