In <[EMAIL PROTECTED]>, sim.sim wrote:

> Below the code that tries to parse a well-formed xml, but it fails
> with error message:
> "not well-formed (invalid token): line 3, column 85"

How did you verify that it is well-formed?  `xmllint` barfs on it too.

> The "problem" within CDATA-section: it contains a part of utf-8
> encoded string which was split (widely used for memory limited
> devices).
>
> When minidom parses the xml-string, it fails because it tries to convert
> into unicode the data within CDATA-section, instead of just returning the
> value of the section "as is". The conversion contradicts the
> specification http://www.w3.org/TR/REC-xml/#sec-cdata-sect

An XML document consists of Unicode characters, and so does a CDATA
section.  CDATA is not meant for putting arbitrary bytes into a document.
It must contain valid characters as defined by the Char production
http://www.w3.org/TR/REC-xml/#NT-Char (linked from the grammar of CDATA in
your link above).  A UTF-8 sequence that was cut off in the middle does not
decode to any character, so the parser is right to reject the document.

Ciao,
	Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
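
P.S. A minimal sketch of the point about CDATA, not your actual data: the sample text, the `<msg>` element, and the helper names `is_well_formed` and `recover` are all made up for illustration. It shows that a UTF-8 string cut in the middle of a character makes the document not well-formed, and that one possible way around it (an assumption on my part, not something the XML spec prescribes) is to base64-encode the raw bytes so that only ASCII characters end up in the document.

```python
import base64
from xml.dom import minidom
from xml.parsers.expat import ExpatError

HEADER = b'<?xml version="1.0" encoding="utf-8"?>'

# Hypothetical payload: the UTF-8 bytes of "Grüße", cut after three bytes,
# i.e. in the middle of the two-byte sequence for "ü".
encoded = "Grüße".encode("utf-8")   # b'Gr\xc3\xbc\xc3\x9fe'
chunk = encoded[:3]                 # b'Gr\xc3'


def is_well_formed(document: bytes) -> bool:
    """Return True if minidom (expat) accepts the document."""
    try:
        minidom.parseString(document)
    except ExpatError:
        return False
    return True


def recover(document: bytes) -> bytes:
    """Decode the base64 text content of the root element back to bytes."""
    root = minidom.parseString(document).documentElement
    return base64.b64decode(root.firstChild.data)


# Complete UTF-8 inside CDATA: fine.
complete = HEADER + b"<msg><![CDATA[" + encoded + b"]]></msg>"

# Truncated UTF-8 inside CDATA: the bytes do not decode to characters.
broken = HEADER + b"<msg><![CDATA[" + chunk + b"]]></msg>"

# Same truncated bytes, transported as base64 text instead.
safe = HEADER + b"<msg>" + base64.b64encode(chunk) + b"</msg>"

print(is_well_formed(complete))
print(is_well_formed(broken))
print(is_well_formed(safe))
print(recover(safe) == chunk)
```

The receiving side can then concatenate the decoded chunks and decode the result as UTF-8 once all the pieces have arrived.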