Daniel Molina Wegener wrote:
> unicode objects are encoded into the
> encoding that the XML document encoding has, and as you say, the whole
> XML document has one encoding. There is no mixing of byte encoded strings
> with different encodings in the output document.

Ok, that's what I hoped anyway. It just wasn't clear from your description.

> When the object is restored, by using pyxser.unserialize:
>
> pyobj = pyxser.unserialize(obj = xmldocstr, enc = "utf-8")

But this is XML, right? What do you need to pass the encoding for at this
point?

> Another issue is the fact that if you have mixed some encodings in byte
> strings objects in your object tree, such as iso-8859-1 and utf-8, and
> you try to serialize that object, pyxser will output to stdout the
> serialization errors by trying to handle those mixed encodings which are
> not regarding the document encoding.

There shouldn't be any serialisation errors (unless you try to recode byte
strings on the way out, which is a no-no for arbitrary user input). All you
have to do is properly escape the byte string so that it passes the XML
encoding step.

One trick to do that is to decode the byte string as ISO-8859-1 and
serialise the result as a normal Unicode string. Then you can re-encode the
unicode string on input back to ISO-8859-1. I choose ISO-8859-1 here because
it has the well-defined side-effect of mapping byte values directly to
Unicode characters with an identical code point value. So you do not risk
any failures or data loss.

Stefan
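
P.S. To make the ISO-8859-1 trick concrete, here is a minimal sketch in Python 3. It is not pyxser code: xml.etree.ElementTree stands in for the serialiser, the helper names bytes_to_text and text_to_bytes are made up for illustration, and the sample bytes are invented. It only shows that a byte string with mixed encodings survives the round trip unchanged.

```python
import xml.etree.ElementTree as ET


def bytes_to_text(raw: bytes) -> str:
    # Hypothetical helper: ISO-8859-1 maps byte value N to code point U+00NN,
    # so this decode can never fail, whatever the bytes "really" are.
    return raw.decode("iso-8859-1")


def text_to_bytes(text: str) -> bytes:
    # Hypothetical helper: the inverse mapping, recovering the original bytes.
    return text.encode("iso-8859-1")


# Invented sample: "café" once as UTF-8 bytes and once as ISO-8859-1 bytes.
raw = b"caf\xc3\xa9 / caf\xe9"

# Serialise: the element text is an ordinary Unicode string, so the XML
# writer can encode it into the single document encoding (UTF-8 here).
elem = ET.Element("value")
elem.text = bytes_to_text(raw)
doc = ET.tostring(elem, encoding="utf-8")

# Deserialise: the parser reads the encoding from the document itself,
# and re-encoding the text as ISO-8859-1 gives back the original bytes.
restored = text_to_bytes(ET.fromstring(doc).text)
assert restored == raw
```

One caveat on the sketch: bytes that decode to control characters XML 1.0 does not allow (NUL, for example) would still need separate handling, so the sample deliberately avoids them.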