Re: iterparse and unicode

Stefan Behnel Sat, 23 Aug 2008 22:22:47 -0700

George Sakkis wrote:
> On Aug 21, 1:48 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> 
>> George Sakkis wrote:
>>> It's interesting that the element text attributes after a successful
>>> parse do not necessarily have the same type, i.e. all be str or all
>>> unicode. I ported some text extraction code from BeautifulSoup (which
>>> handles all text as unicode) and I was surprised to find out that in
>>> xml.etree the returned text's type is not fixed, even within the same
>>> file. Although it's not a bug, having a mixed collection of byte and
>>> unicode strings from the same source makes me somewhat uneasy.
>> If you don't care about memory and execution performance, there are
>> plenty of toolkits that guarantee that you always get Unicode strings.
> 
> As long as they are documented, both approaches are fine for different
> cases. Currently the only reference I found about unicode in
> ElementTree is "All strings can either be Unicode strings, or 8-bit
> strings containing US-ASCII only." [1], which is rather ambiguous


It's not ambiguous in Py2.x, where ASCII-only byte strings and unicode
strings are compatible: they compare equal, hash the same, and can be mixed
in string operations. No need to feel "uneasy". :)
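
If a uniform type is wanted anyway, the text can be normalized after
parsing. Below is a minimal sketch, not anything ElementTree provides:
the helper name to_text and the sample document are made up for
illustration. It assumes that any byte string coming back is ASCII-only,
as the quoted documentation says.

```python
import io
import xml.etree.ElementTree as ET


def to_text(value, encoding="ascii"):
    """Return element text as a unicode string (hypothetical helper)."""
    if value is None:
        # Elements without text content have .text set to None.
        return u""
    if isinstance(value, bytes):
        # In Py2.x, bytes is an alias for str, so this catches the
        # ASCII-only byte strings that ElementTree may return.
        return value.decode(encoding)
    return value


# Made-up sample: one ASCII-only text node and one non-ASCII text node.
SAMPLE = u"<root><a>plain</a><b>caf\u00e9</b></root>".encode("utf-8")

texts = [
    to_text(elem.text)
    for _event, elem in ET.iterparse(io.BytesIO(SAMPLE))
    if elem.tag in ("a", "b")
]
```

After this, every entry in texts is a unicode string, whichever type
the parser handed back for that element.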

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
