Re: encoding in lxml

Stefan Behnel Mon, 03 Nov 2008 12:05:31 -0800

jasiu85 wrote:
> I have a problem with character encoding in LXML. Here's how it goes:
> 
> I read an HTML document from a third-party site. It is supposed to be
> in UTF-8, but unfortunately from time to time it's not.


You can instantiate your own HTML parser (e.g. lxml.html.HTMLParser) and pass
encoding="utf-8". That way, if the document is not actually UTF-8, you will get
an exception at parse time, which lets you reparse it with another encoding
(say, ISO-8859-1) and get the correct content.
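The same "try UTF-8 strictly, fall back on failure" idea can be sketched with the
standard library alone, by decoding the raw bytes yourself before handing the
text to lxml. This is a swapped-in variant, not the lxml parser option described
above: here Python's strict UTF-8 decoder raising UnicodeDecodeError plays the
role of the parse-time exception. The function name decode_html_bytes and the
ISO-8859-1 default are illustrative assumptions.

```python
from typing import Tuple


def decode_html_bytes(data: bytes, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Hypothetical helper: decode raw HTML bytes, trying UTF-8 first.

    Returns (text, encoding_used).
    """
    try:
        # errors="strict" is the default, so malformed UTF-8 raises here.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this call never
        # fails; it only yields the *correct* text if the page really is
        # Latin-1, so treat it as a best guess.
        return data.decode(fallback), fallback


if __name__ == "__main__":
    print(decode_html_bytes("caf\u00e9".encode("utf-8")))   # valid UTF-8
    print(decode_html_bytes("caf\u00e9".encode("latin-1")))  # falls back
```

The resulting str can then be parsed, for example with lxml.html.fromstring().
Note that a Latin-1 fallback accepts any byte sequence, so it will never signal
a second failure; if the site might also send other encodings, you would need
some other way to pick the right one.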

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
