George Sakkis wrote:
> On Aug 21, 1:48 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>
>> George Sakkis wrote:
>>> It's interesting that the element text attributes after a successful
>>> parse do not necessarily have the same type, i.e. all be str or all
>>> unicode. I ported some text extraction code from BeautifulSoup (which
>>> handles all text as unicode) and I was surprized to find out that in
>>> xml.etree the returned text's type is not fixed, even within the same
>>> file. Although it's not a bug, having a mixed collection of byte and
>>> unicode strings from the same source makes me somewhat uneasy.
>> If you don't care about memory and execution performance, there are
>> plenty of toolkits that guarantee that you always get Unicode strings.
>
> As long as they are documented, both approaches are fine for different
> cases. Currently the only reference I found about unicode in
> ElementTree is "All strings can either be Unicode strings, or 8-bit
> strings containing US-ASCII only." [1], which is rather ambiguous

It's not ambiguous in Py2.x, where ASCII byte strings and unicode strings are
compatible. No need to feel "uneasy". :)

Stefan
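
For anyone who would still rather have a single text type, here is a minimal sketch of the behaviour discussed above. It assumes the standard-library xml.etree.ElementTree and a UTF-8 encoded document. The sample XML and the helper name as_unicode are made up for illustration and are not part of ElementTree. On Py2.x the ASCII-only text comes back as a byte string and the non-ASCII text as unicode; the helper decodes the former so both end up as unicode. On Python 3 all text is already str, so the helper simply passes it through.

```python
import xml.etree.ElementTree as ET

try:
    text_type = unicode  # Python 2: the unicode string type
except NameError:
    text_type = str  # Python 3: str is already unicode


def as_unicode(text):
    """Hypothetical helper: return element text as a unicode string.

    ElementTree only hands back 8-bit strings when they are pure
    US-ASCII, so decoding them as ASCII is assumed to be safe.
    """
    if text is None:
        return None
    if isinstance(text, bytes) and not isinstance(text, text_type):
        return text.decode("ascii")
    return text


# One ASCII-only element and one element containing a non-ASCII character.
doc = b"<root><a>plain</a><b>caf\xc3\xa9</b></root>"
root = ET.fromstring(doc)

a_text = root.find("a").text  # byte string on Py2.x, str on Python 3
b_text = root.find("b").text  # unicode on Py2.x, str on Python 3

# The raw types may differ on Py2.x; the normalized types never do.
print(type(a_text).__name__, type(b_text).__name__)
print(type(as_unicode(a_text)) is type(as_unicode(b_text)))
```

In most Py2.x code the helper is unnecessary: an ASCII byte string and the equal unicode string compare equal and hash alike, so they can be concatenated, compared and used as dictionary keys interchangeably.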