Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote:
>
> > I wanted to see what would happen if one used the results of a tostring
> > method as input into the XML method. What I observed is this:
> > a) beforeCtag.text is of type <type 'str'>
> > b) beforeCtag.text when printed displays: I'm confused
> > c) afterCtag.text is of type <type 'unicode'>
> > d) afterCtag.text when printed displays: I?m confused
>
> the XML file format isn't a Python string serialization format, it's an XML
> infoset serialization format.
>
> as stated in the documentation, ET always uses Unicode strings for text that
> contain non-ASCII characters. for text that *only* contains ASCII, it may use
> either Unicode strings or 8-bit strings, depending on the implementation.
>
> the behaviour if you're passing in non-ASCII text as 8-bit strings is
> undefined (which means that you shouldn't do that; it's not portable).

I was about to post a similar question when I found this thread.
Fredrik, can you explain why this is not portable? I'm currently using
(a variation of) the workaround below instead of ET.tostring, and it
works fine for me:

    import xml.etree.ElementTree as ET  # or: from elementtree import ElementTree as ET

    def tostring(element, encoding=None):
        text = element.text
        if text:
            # Default to the original value so text2 is always bound,
            # e.g. for unicode text or a str with no encoding given.
            text2 = text
            if not isinstance(text, basestring):
                text2 = str(text)
            elif isinstance(text, str) and encoding:
                text2 = text.decode(encoding)
            element.text = text2
        s = ET.tostring(element, encoding)
        # Restore the caller's original text before returning.
        element.text = text
        return s

Why isn't this the standard behaviour?

Thanks,
George

--
http://mail.python.org/mailman/listinfo/python-list
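
For comparison, here is a minimal sketch of the same idea under Python 3, where text and bytes are separate types and the ambiguity discussed above largely goes away. It is an illustration only, not part of ElementTree: the helper names to_text and roundtrip are hypothetical, and UTF-8 is assumed as the encoding of the incoming bytes. The point is that the bytes are decoded before they reach the element, so ET only ever sees text.

```python
import xml.etree.ElementTree as ET


def to_text(value, encoding="utf-8"):
    """Return value as a text string (hypothetical helper).

    Bytes are decoded with the given encoding; anything else goes through str().
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def roundtrip(text, encoding="utf-8"):
    """Serialize an element holding text, parse it back, return the parsed text."""
    elem = ET.Element("tag")
    elem.text = text
    data = ET.tostring(elem, encoding=encoding)  # bytes in the requested encoding
    return ET.XML(data).text


raw = b"I\xe2\x80\x99m confused"  # UTF-8 bytes containing U+2019 (curly apostrophe)
text = to_text(raw)               # decode first, then hand the text to ET
assert roundtrip(text) == text    # the non-ASCII character survives the round trip
```

Decoding at the boundary, rather than inside a tostring wrapper, also avoids having to swap element.text temporarily and restore it afterwards.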