Hi all,

The Unicode code points in the 0000-001F range -- except newline, tab, and carriage return -- are not legal XML 1.0 characters.
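To make the rule concrete, here is a minimal sketch of a check against the XML 1.0 "Char" production, written for Python 3. The names `first_illegal_xml_char` and `check_xml_text` are hypothetical helpers made up for illustration; they are not part of ElementTree, and this is not a claim about how ElementTree does or should validate its input.

```python
import re

# Anything outside the XML 1.0 "Char" production is illegal:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def first_illegal_xml_char(text):
    """Return the index of the first character XML 1.0 forbids, or None."""
    match = _ILLEGAL_XML10.search(text)
    return match.start() if match else None


def check_xml_text(text):
    """Return text unchanged, or raise ValueError if it cannot appear in XML 1.0."""
    index = first_illegal_xml_char(text)
    if index is not None:
        raise ValueError(
            "illegal XML 1.0 character %r at index %d" % (text[index], index)
        )
    return text
```

Calling `check_xml_text` on every attribute value and text node before serializing would surface the problem at generation time rather than at parse time, at the cost of one regex scan per string.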
Attempts to serialize and deserialize such strings with ElementTree will fail:

    >>> from xml.etree.ElementTree import Element, tostring, fromstring
    >>> elt = Element("root", char=u"\u0000")
    >>> xml = tostring(elt)
    >>> xml
    '<root char="\x00" />'
    >>> fromstring(xml)
    [...]
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 12

Good! But I was expecting a failure *earlier*, in the "tostring" function -- I basically assumed that ElementTree would refuse to generate an XML fragment that is not well-formed.

Could anyone comment on the rationale behind the current behavior? Is it a performance issue, the search for invalid Unicode code points being too expensive?

Cheers,

SB