Diez B. Roggisch wrote:

>> I would think it more likely that he wants to end up with u'Bob\u2019s 
>> Breakfast' rather than u'Bob\x92s Breakfast' although u'Dog\u2019s dinner' 
>> seems a probable consequence.
> 
> If that's the case, he should read the file as string, de- and encode it 
> (probably into a StringIO) and then feed it to the parser.

some alternatives:

- clean up the offending strings:

     http://effbot.org/zone/unicode-gremlins.htm

- encode the offending strings back to iso-8859-1 bytes, and decode them again as cp1252:

     u = u'Bob\x92s Breakfast'
     u = u.encode("iso-8859-1").decode("cp1252")

- upgrade to ET 1.3 (available in alpha) and use the parser's encoding 
option to override the file's encoding:

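     # the encoding passed here overrides the file's own declaration
     parser = ET.XMLParser(encoding="cp1252")
     # source is a filename or an open file object
     tree = ET.parse(source, parser)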
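
if the file has already been parsed with the declared (wrong) encoding, the second alternative can be applied to the whole tree afterwards. a minimal sketch, assuming a current Python where ET lives in xml.etree.ElementTree; the helper names fix_cp1252 and fix_tree and the sample document are made up for illustration, and strings that cannot make the round trip are left untouched:

```python
import xml.etree.ElementTree as ET


def fix_cp1252(text):
    """Re-decode text that was read as iso-8859-1 but was really cp1252."""
    if text is None:
        return None
    try:
        return text.encode("iso-8859-1").decode("cp1252")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not pure iso-8859-1, or contains a byte cp1252 leaves undefined:
        # return the string unchanged rather than guess.
        return text


def fix_tree(root):
    """Repair text, tail and attribute values of every element in place."""
    for elem in root.iter():
        elem.text = fix_cp1252(elem.text)
        elem.tail = fix_cp1252(elem.tail)
        for key, value in list(elem.attrib.items()):
            elem.attrib[key] = fix_cp1252(value)


# Hypothetical sample: the bytes are cp1252, but the declaration says iso-8859-1.
raw = (b'<?xml version="1.0" encoding="iso-8859-1"?>'
       b'<shop name="Bob\x92s"><item>Bob\x92s Breakfast</item></shop>')

root = ET.fromstring(raw)
before = root.find("item").text   # still contains the \x92 gremlin
fix_tree(root)
after = root.find("item").text    # now contains a real right single quote
```

the round trip only works because iso-8859-1 maps every byte to the code point with the same number, so encoding with it gives back the original bytes exactly.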

</F>

