George Sakkis wrote:

> Thank you both for the suggestions. I made a few more experiments to
> understand how iterparse behaves with respect to three dimensions:

Spending time researching undefined behaviour is pretty pointless. ET parsers expect byte streams, because that's what XML files are. If you pass it anything else, an ET implementation may attempt to convert that thing to a byte string, run the game "rogue", or do something else that it finds appropriate.
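
For what it's worth, here is a minimal sketch of the defined case, assuming the document starts out as a Unicode string in memory: encode it to UTF-8 and wrap the result in io.BytesIO, so that iterparse sees a byte stream. The sample document and the variable names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document held as a text (Unicode) string.
xml_text = u"<root><item>caf\u00e9</item><item>plain</item></root>"

# Encode to bytes first; with no XML declaration, the parser assumes UTF-8.
xml_bytes = xml_text.encode("utf-8")

# iterparse takes a filename or a file object; BytesIO supplies the byte stream.
# Only "end" events are reported by default, so each element is complete here.
texts = [
    elem.text
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes))
    if elem.tag == "item"
]
print(texts)
```

If the data already lives in a file, opening it in binary mode (or just passing the filename) gives the parser the same kind of input.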

> It's interesting that the element text attributes after a successful
> parse do not necessarily have the same type, i.e. all be str or all
> unicode. I ported some text extraction code from BeautifulSoup (which
> handles all text as unicode) and I was surprised to find out that in
> xml.etree the returned text's type is not fixed, even within the same
> file. Although it's not a bug, having a mixed collection of byte and
> unicode strings from the same source makes me somewhat uneasy.

If you don't care about memory and execution performance, there are plenty of toolkits that guarantee that you always get Unicode strings.

</F>

--
http://mail.python.org/mailman/listinfo/python-list
