On Aug 21, 8:36 am, George Sakkis <[EMAIL PROTECTED]> wrote:
> It seems xml.etree.cElementTree.iterparse() is not unicode aware:
>
> >>> from StringIO import StringIO
> >>> from xml.etree.cElementTree import iterparse
> >>> s = u'<name>\u03a0\u03b1\u03bd\u03b1\u03b3\u03b9\u03ce\u03c4\u03b7\u03c2</name>'
> >>> for event,elem in iterparse(StringIO(s)):
> ...     print elem.text
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<string>", line 64, in __iter__
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 6-15: ordinal not in range(128)
>
> Am I using it incorrectly or it doesn't currently support unicode ?

Hi George,

I'm no XML guru by any means, but as far as I understand it, you need to
encode your text as UTF-8 and prepend something like
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' to it. This
appears to be the way XML is, rather than an ElementTree problem. E.g.:

>>> from StringIO import StringIO
>>> from xml.etree.cElementTree import iterparse
>>> s = u'<wrapper><name>\u03a0\u03b1</name><digits>01234567</digits></wrapper>'
>>> h = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
>>> xml = h + s.encode('utf8')
>>> for event,elem in iterparse(StringIO(xml)):
...     print elem.tag, repr(elem.text)
...
name u'\u03a0\u03b1'
digits '01234567'
wrapper None
>>>

HTH,
John
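
For anyone trying the same encode-then-parse approach on Python 3, here is a minimal sketch. It rests on assumptions that are not in the session above: a Python 3 interpreter, xml.etree.ElementTree in place of cElementTree, and io.BytesIO in place of StringIO. The helper name parse_unicode_xml is made up for illustration.

```python
# Sketch only: assumes Python 3; parse_unicode_xml is a hypothetical helper.
import io
from xml.etree.ElementTree import iterparse


def parse_unicode_xml(text):
    """Encode a str document to UTF-8 bytes and return (tag, text) pairs."""
    # Same declaration as in the reply above, stating the byte encoding.
    header = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    data = (header + text).encode("utf-8")
    # iterparse reports "end" events by default, so children come before parents.
    return [(elem.tag, elem.text) for _event, elem in iterparse(io.BytesIO(data))]


doc = "<wrapper><name>\u03a0\u03b1</name><digits>01234567</digits></wrapper>"
for tag, text in parse_unicode_xml(doc):
    # ascii() keeps the output printable even on a non-UTF-8 terminal.
    print(tag, ascii(text))
```

The point is the same as in the reply: the parser is handed encoded bytes together with a declaration naming the encoding, and it gives decoded text back.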