Diez B. Roggisch wrote:

>> I would think it more likely that he wants to end up with u'Bob\u2019s
>> Breakfast' rather than u'Bob\x92s Breakfast' although u'Dog\u2019s dinner'
>> seems a probable consequence.
>
> If that's the case, he should read the file as string, de- and encode it
> (probably into a StringIO) and then feed it to the parser.
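For reference, the approach quoted above could look roughly like the sketch below. It assumes Python 3 (so io.BytesIO stands in for StringIO) and a file that declares iso-8859-1 but really contains cp1252 bytes. The sample document RAW and the helper name parse_mislabelled are made up for illustration. The declaration is rewritten as well, since the parser would otherwise still trust the old encoding name.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the declaration says iso-8859-1, but byte 0x92 is
# the cp1252 right single quotation mark.
RAW = (b'<?xml version="1.0" encoding="iso-8859-1"?>'
       b'<name>Bob\x92s Breakfast</name>')

def parse_mislabelled(raw, actual="cp1252"):
    """Decode raw bytes with their real encoding, re-encode as UTF-8, parse."""
    text = raw.decode(actual)
    # Make the declared encoding match the bytes handed to the parser.
    text = re.sub(r'(<\?xml[^>]*encoding=["\'])[^"\']+', r'\g<1>utf-8',
                  text, count=1)
    return ET.parse(io.BytesIO(text.encode("utf-8")))

tree = parse_mislabelled(RAW)
name = tree.getroot().text
```

With this sample, name should come out with a proper U+2019 apostrophe instead of the \x92 control character.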
some alternatives:

- clean up the offending strings:

    http://effbot.org/zone/unicode-gremlins.htm

- turn the offending strings back to iso-8859-1, and decode them again:

    u = u'Bob\x92s Breakfast'
    u = u.encode("iso-8859-1").decode("cp1252")

- upgrade to ET 1.3 (available in alpha) and use the parser's encoding
  option to override the file's encoding:

    parser = ET.XMLParser(encoding="cp1252")
    tree = ET.parse(source, parser)

</F>
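If the tree has already been parsed, the encode/decode round trip from the second alternative can be applied to every string in it. The following is only a sketch under assumptions: it is written for Python 3, the helper names fix_cp1252 and fix_tree are hypothetical, and the sample document is invented. Strings that cannot make the round trip are left untouched.

```python
import xml.etree.ElementTree as ET

def fix_cp1252(s):
    """Re-decode text that was read as iso-8859-1 but written as cp1252."""
    if s is None:
        return None
    try:
        return s.encode("iso-8859-1").decode("cp1252")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either already contains non-Latin-1 characters, or holds a byte
        # that cp1252 leaves undefined: keep the string as it is.
        return s

def fix_tree(root):
    """Apply fix_cp1252 to the text, tail and attribute values of each element."""
    for elem in root.iter():
        elem.text = fix_cp1252(elem.text)
        elem.tail = fix_cp1252(elem.tail)
        for key, value in list(elem.attrib.items()):
            elem.attrib[key] = fix_cp1252(value)
    return root

# Hypothetical sample document with cp1252 bytes behind a Latin-1 declaration.
root = ET.fromstring(
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b'<shop owner="Bob\x92s"><name>Bob\x92s Breakfast</name></shop>'
)
fix_tree(root)
```

Ordinary Latin-1 characters such as accented letters map to themselves in cp1252, so only the 0x80-0x9F range actually changes.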