Steven Bethard wrote:
> I'm having trouble using elementtree with an XML file that has some
> gbk-encoded text. (I can't read Chinese, so I'm taking their word for
> it that it's gbk-encoded.) I always have trouble with encodings, so I'm
> sure I'm just screwing something simple up. Can anyone help me?

absolutely!

pyexpat has only limited support for non-standard encodings; the core
expat library only supports UTF-8, UTF-16, US-ASCII, and ISO-8859-1, and
the Python glue layer then adds support for all byte-to-byte encodings
supported by Python on top of that.

if you're using any other encoding, you need to recode the file on the
way in (just decoding to Unicode doesn't work, since the parser expects
an encoded byte stream).

the approach shown on this page should work

    http://effbot.org/zone/celementtree-encoding.htm

except that it uses the new XMLParser interface, which isn't available
in ET 1.2.6, and the corresponding XMLTreeBuilder interface in ET doesn't
support the encoding override argument...

the easiest way to fix this is to modify the file header on the way in;
if the file has an <?xml encoding?> header, rip out the header and recode
from that encoding to utf-8 while parsing.

</F>
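
For illustration, the "rip out the header and recode" idea could look roughly like the sketch below. This is not code from the post: it assumes Python 3 and the standard-library xml.etree.ElementTree rather than ET 1.2.6, it reads the whole document into memory instead of recoding it as a stream, and recode_to_utf8 and parse_recoded are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# An XML declaration at the very start of the document, e.g.
# <?xml version="1.0" encoding="gbk"?>
_XML_DECL = re.compile(rb'<\?xml\b[^>]*\?>')
# The encoding pseudo-attribute inside that declaration.
_ENCODING = re.compile(rb'encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def recode_to_utf8(data):
    """Return the document as UTF-8 bytes with the XML header removed.

    Hypothetical helper: if there is no header, or the header names no
    encoding, the bytes are returned unchanged and the parser is left
    to apply its own defaults.
    """
    decl = _XML_DECL.match(data)
    if decl is None:
        return data
    enc = _ENCODING.search(decl.group(0))
    if enc is None:
        return data
    encoding = enc.group(1).decode("ascii")
    body = data[decl.end():]
    # Decode with the declared encoding, then hand the parser UTF-8,
    # which expat supports natively.
    return body.decode(encoding).encode("utf-8")


def parse_recoded(data):
    """Parse XML bytes after recoding them to UTF-8 (hypothetical helper)."""
    return ET.fromstring(recode_to_utf8(data))
```

With a gbk-encoded file this would be called as parse_recoded(open("somefile.xml", "rb").read()), where the filename is a placeholder. The file has to be opened in binary mode, since the parser expects encoded bytes rather than decoded text.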