jasiu85 wrote:
> I have a problem with character encoding in LXML. Here's how it goes:
> 
> I read an HTML document from a third-party site. It is supposed to be
> in UTF-8, but unfortunately from time to time it's not.

You can instantiate your own HTML parser (lxml.etree.HTMLParser) and pass
encoding="utf-8". That way, when the document is not UTF-8, you will get an
exception at parse time, which allows you to reparse it with another encoding
(say, ISO-8859-1) to get the correct content.
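
A minimal sketch of that fallback, under some assumptions: the helper names
pick_encoding and parse_html are made up for illustration, and the strict
bytes.decode() check is an addition of this sketch, not part of lxml. It is
there because whether the parser itself raises on bad bytes may depend on the
lxml/libxml2 version and the parser's recovery settings, so the encoding is
checked in plain Python first and then handed to the parser.

```python
def pick_encoding(raw, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate encoding that decodes `raw` without error."""
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("none of the candidate encodings fit the document")


def parse_html(raw):
    """Parse raw HTML bytes using the encoding chosen by pick_encoding()."""
    # Imported here so that pick_encoding() can be used without lxml installed.
    from lxml import etree

    parser = etree.HTMLParser(encoding=pick_encoding(raw))
    return etree.fromstring(raw, parser)
```

Note that ISO-8859-1 maps every byte to a character, so it never fails to
decode; it only makes sense as the last candidate in the list.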

Stefan