On Nov 4, 11:01 am, Kee Nethery <k...@kagi.com> wrote:
> Having an issue with elementtree XML() in python 2.6.4.
>
> This code works fine:
>
> from xml.etree import ElementTree as et
> getResponse = u'''<?xml version="1.0" encoding="UTF-8"?>
> <customer><shipping><state>bobble</state><city>head</
> city><street>city</street></shipping></customer>'''
> theResponseXml = et.XML(getResponse)
>
> This code errors out when it tries to do the et.XML()
>
> from xml.etree import ElementTree as et
> getResponse = u'''<?xml version="1.0" encoding="UTF-8"?>
> <customer><shipping><state>\ue58d83\ue89189\ue79c8C</state><city>
> \ue69f8f\ue5b882</city><street>\ue9ab98\ue58d97\ue58fb03</street></
> shipping></customer>'''
> theResponseXml = et.XML(getResponse)
>
> In my real code, I'm pulling the getResponse data from a web page that
> returns as XML and when I display it in the browser you can see the
> Japanese characters in the data. I've removed all the stuff in my code
> and tried to distill it down to just what is failing. Hopefully I have
> not removed something essential.
>
> Why is this not working and what do I need to do to use Elementtree
> with unicode?

What you need to do is NOT feed it unicode. You feed it a str object
and it gets decoded according to the encoding declaration found in the
first line. So take the str object that you get from the web (should
be UTF8-encoded already unless the header is lying), and throw that at
ET ... like this:

| Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
| Type "help", "copyright", "credits" or "license" for more information.
| >>> from xml.etree import ElementTree as et
| >>> ucode = u'''<?xml version="1.0" encoding="UTF-8"?>
| ... <customer><shipping>
| ... <state>\ue58d83\ue89189\ue79c8C</state>
| ... <city>\ue69f8f\ue5b882</city>
| ... <street>\ue9ab98\ue58d97\ue58fb03</street>
| ... </shipping></customer>'''
| >>> xml= et.XML(ucode)
| Traceback (most recent call last):
|   File "<stdin>", line 1, in <module>
|   File "C:\python26\lib\xml\etree\ElementTree.py", line 963, in XML
|     parser.feed(text)
|   File "C:\python26\lib\xml\etree\ElementTree.py", line 1245, in feed
|     self._parser.Parse(data, 0)
| UnicodeEncodeError: 'ascii' codec can't encode character u'\ue58d' in position 69: ordinal not in range(128)
| # as expected
| >>> strg = ucode.encode('utf8')
| # encoding as utf8 is for DEMO purposes.
| # i.e. use the original web str object, don't convert it to unicode
| # and back to utf8.
| >>> xml2 = et.XML(strg)
| >>> xml2.tag
| 'customer'
| >>> for c in xml2.getchildren():
| ...     print c.tag, repr(c.text)
| ...
| shipping '\n'
| >>> for c in xml2[0].getchildren():
| ...     print c.tag, repr(c.text)
| ...
| state u'\ue58d83\ue89189\ue79c8C'
| city u'\ue69f8f\ue5b882'
| street u'\ue9ab98\ue58d97\ue58fb03'
| >>>

By the way:
(1) it usually helps to be more explicit than "errors out", preferably
the exact copied/pasted output as shown above; this is one of the rare
cases where the error message is predictable
(2) PLEASE don't start a new topic in a reply in somebody else's
thread.

--
http://mail.python.org/mailman/listinfo/python-list
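
A minimal standalone sketch of the advice above (give the parser encoded bytes and let it decode them from the XML declaration). The payload is made up for illustration: `raw_bytes` is a hypothetical stand-in for the body received from the web server, and the sample text with its \uXXXX escapes is an assumption, not the original poster's data. It is written so that it should also run on Python 3, where bytes play the role of the 2.x str:

```python
from xml.etree import ElementTree as et

# Hypothetical stand-in for the raw response body: UTF-8 encoded bytes,
# as they would arrive from the web server. The .encode() here only
# builds the demo data; real code would already have the encoded bytes.
raw_bytes = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u'<customer><shipping>'
    u'<state>\u5343\u8449\u770c</state>'
    u'<city>\u67cf\u5e02</city>'
    u'</shipping></customer>'
).encode('utf-8')

# Pass the encoded bytes straight to the parser; it decodes them itself
# using the encoding named in the XML declaration.
root = et.XML(raw_bytes)

# Collect the text of each child of <shipping>, keyed by tag name.
texts = dict((child.tag, child.text) for child in root[0])
```

In real code the .encode() step disappears: pass the response body to et.XML() exactly as received, without decoding it to unicode first.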