Carsten Haese:
> If you want to convey an arbitrary sequence of bytes as if they were
> characters, you need to pick a character encoding that can handle an
> arbitrary sequence of bytes. utf-8 can not do that. ISO-8859-1 can, but
> you need to specify the encoding explicitly. Observe what happens if I
> take your example and insert an encoding specification:
>
>>>> iMessage = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<message>\n<Data><![CDATA[\xd0\x94\xd0\xb0\xd0\xbd\xd0\xbd\xd1\x8b\xd0\xb5 \xd0\xbf\xd0\xbe\xd0\xbf\xd1\x83\xd0\xbb\xd1\x8f\xd1\x80\xd0\xbd\xd1\x8b\xd1\x85 \xd0\xb7\xd0\xb0\xd0\xbf\xd1\x80\xd0\xbe\xd1\x81\xd0\xbe\xd0\xb2 \xd0\xbc\xd0\xbe\xd0\xb6\xd0\xbd\xd0\xbe \xd1\x83\xd1\x87\xd0\xb8\xd1\x82\xd1\x8b\xd0\xb2\xd0\xb0\xd1\x82\xd1\x8c \xd0\xbf\xd1\x80\xd0\xb8 \xd1\x81\xd0\xbe\xd0\xb1\xd1\x81\xd1\x82\xd0\xb2\xd0\xb5\xd0\xbd\xd0\xbd\xd1\x8b\xd1\x85 \xd1\x80\xd0\xb5\xd0\xba\xd0\xbb\xd0\xb0\xd0\xbc\xd0\xbd\xd1]]></Data>\n</message>\n\n'
>>>> minidom.parseString(iMessage)
> <xml.dom.minidom.Document instance at 0xb7c157ac>
>
> Of course, when you extract your CDATA, it will come out as a unicode
> string which you'll have to encode with ISO-8859-1 to turn it into a
> sequence of bytes. Then you add the sequence of bytes from the next
> message, and in the end that should yield a valid utf-8-encoded string
> once you've collected and assembled all fragments.
>
> Hope this helps,
>
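Carsten's recipe (declare ISO-8859-1, re-encode each extracted CDATA string back to bytes, concatenate, and decode once as UTF-8) might look roughly like this in Python 3; the thread itself uses Python 2 byte strings. This is only a sketch: the helpers message_xml, extract_fragment and reassemble are hypothetical names, and it assumes each message holds a single <Data> element whose CDATA carries raw bytes with no control bytes below 0x20 other than tab and newline and no literal "]]>", since XML cannot carry those unchanged.

```python
from xml.dom import minidom


def message_xml(fragment: bytes) -> bytes:
    """Hypothetical helper: wrap a raw byte fragment in a message declared ISO-8859-1."""
    return (b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<message>\n'
            b'<Data><![CDATA[' + fragment + b']]></Data>\n</message>\n')


def extract_fragment(xml_bytes: bytes) -> bytes:
    """Hypothetical helper: parse one message and recover the original bytes of its CDATA."""
    doc = minidom.parseString(xml_bytes)
    data = doc.getElementsByTagName("Data")[0]
    # The parser returns the CDATA as a (unicode) str; join in case it is split
    # across several child nodes.
    text = "".join(node.data for node in data.childNodes)
    # ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so encoding with it restores the exact byte sequence.
    return text.encode("iso-8859-1")


def reassemble(messages) -> str:
    """Hypothetical helper: concatenate all byte fragments, then decode once as UTF-8."""
    return b"".join(extract_fragment(m) for m in messages).decode("utf-8")


if __name__ == "__main__":
    text = "Данные популярных запросов"
    raw = text.encode("utf-8")
    # Cut after 3 bytes: the first fragment ends in the middle of a
    # two-byte UTF-8 character, like the "\xd1]]>" in the example above.
    messages = [message_xml(raw[:3]), message_xml(raw[3:])]
    print(reassemble(messages))  # prints: Данные популярных запросов
```

Decoding happens only after all fragments are joined, because an individual fragment such as raw[:3] is not valid UTF-8 on its own.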
Hi Carsten!

Thanks for your suggestion; the problem can indeed be fixed that way.

BTW: I've found "xmlproc", and when I parse the message with its command-line tool xpcmd.py, it reports "Parse complete, 0 error(s) and 0 warning(s)", even though I did not specify the "ISO-8859-1" encoding. (Switching to that library is a separate problem, though: I would have to recode/retest/redoc/re* a lot of things.)

The project homepage: http://www.garshol.priv.no/download/software/xmlproc/

One more thing: I opened my XML message in Mozilla again and selected the "Page info" item from the pop-up menu. It shows:
Content-Type: text/xml
Encoding: UTF-8

Many thanks for your attention and patience!

--
Maksim Kasimov
--
http://mail.python.org/mailman/listinfo/python-list