On May 25, 12:45, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, sim.sim wrote:
> > Below is the code that tries to parse a well-formed XML document, but it fails
> > with the error message:
> > "not well-formed (invalid token): line 3, column 85"
>
> How did you verify that it is well-formed? `xmllint` barfs on it too.

You can try writing iMessage to a file and opening it in Mozilla Firefox (web browser).

> > The "problem" is within the CDATA section: it contains part of a UTF-8
> > encoded string which was split (widely used for memory-limited
> > devices).
> >
> > When minidom parses the XML string, it fails because it tries to convert
> > the data within the CDATA section into unicode, instead of just returning the
> > value of the section "as is". The conversion contradicts the
> > specification http://www.w3.org/TR/REC-xml/#sec-cdata-sect
>
> An XML document contains unicode characters, and so does the CDATA section.
> CDATA is not meant to put arbitrary bytes into a document. It must
> contain valid characters of this
> type http://www.w3.org/TR/REC-xml/#NT-Char (linked from the grammar of CDATA in
> your link above).
>
> Ciao,
> Marc 'BlackJack' Rintsch

My CDATA section contains only symbols in the range specified for Char:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

filter(lambda x: ord(x) not in range(0x20, 0xD7FF), iMessage)
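
For comparison, here is a minimal sketch of how a byte-level range check and the parser can disagree. It assumes Python 3 and uses a made-up sample message, since the original iMessage is not shown in this thread. The helper names (parses, bytes_in_char_range, is_valid_utf8) are hypothetical and only serve the illustration. The point is that the Char production is defined over decoded characters, so a UTF-8 sequence cut in half can pass a per-byte check and still be rejected by expat.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample document; 'é' (U+00E9) is encoded as the two bytes 0xC3 0xA9.
data = "<msg><![CDATA[caf\u00e9]]></msg>".encode("utf-8")

# Simulate a split inside the multi-byte character by dropping the
# continuation byte 0xA9, which leaves a lone lead byte 0xC3 in the CDATA.
cut = data.index(b"\xc3") + 1
truncated = data[:cut] + data[cut + 1:]


def parses(raw):
    """Return True if minidom accepts the byte string, False on ExpatError."""
    try:
        minidom.parseString(raw)
        return True
    except ExpatError:
        return False


def bytes_in_char_range(raw):
    """Per-byte check against part of the Char range (not a character check)."""
    return all(b in (0x9, 0xA, 0xD) or 0x20 <= b <= 0xD7FF for b in raw)


def is_valid_utf8(raw):
    """Return True if the bytes decode as UTF-8, i.e. form whole characters."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(parses(data), parses(truncated))
print(bytes_in_char_range(truncated), is_valid_utf8(truncated))
```

Under these assumptions, a check that decodes first and then inspects code points would be the closer match to what the specification describes.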