Re: iterparse and unicode

Fredrik Lundh Wed, 20 Aug 2008 22:53:18 -0700

George Sakkis wrote:

> Thank you both for the suggestions. I made a few more experiments to
> understand how iterparse behaves with respect to three dimensions:

Spending time researching undefined behaviour is pretty pointless. ET
parsers expect byte streams, because that's what XML files are. If you
pass it anything else, an ET implementation may attempt to convert that
thing to a byte string, run the game "rogue", or do something else that
it finds appropriate.
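
A minimal sketch of the byte-stream usage described above, assuming a
small in-memory document. The sample XML, the "item" tag and the helper
name iter_item_text are made up for illustration and do not come from
the thread; only iterparse itself is the real ElementTree call.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document. It is encoded to bytes up front, so the
# parser receives exactly what an XML file on disk would contain.
xml_bytes = (
    u'<?xml version="1.0" encoding="utf-8"?>'
    u'<root><item>caf\u00e9</item><item>plain</item></root>'
).encode("utf-8")


def iter_item_text(source):
    """Yield the text of each <item> from a binary file-like object."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield elem.text
            elem.clear()  # release the element once its text is consumed


# Wrap the bytes in a binary stream; do not hand the parser decoded text.
texts = list(iter_item_text(io.BytesIO(xml_bytes)))
```

The same pattern should apply to a real file opened in binary mode
("rb"), which leaves decoding to the parser and the encoding declared in
the XML itself.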

> It's interesting that the element text attributes after a successful
> parse do not necessarily have the same type, i.e. all be str or all
> unicode. I ported some text extraction code from BeautifulSoup (which
> handles all text as unicode) and I was surprised to find out that in
> xml.etree the returned text's type is not fixed, even within the same
> file. Although it's not a bug, having a mixed collection of byte and
> unicode strings from the same source makes me somewhat uneasy.

If you don't care about memory and execution performance, there are
plenty of toolkits that guarantee that you always get Unicode strings.


</F>

--
http://mail.python.org/mailman/listinfo/python-list
