jasiu85 wrote:
> I have a problem with character encoding in LXML. Here's how it goes:
>
> I read an HTML document from a third-party site. It is supposed to be
> in UTF-8, but unfortunately from time to time it's not.

You can instantiate your own HTML parser and pass encoding="utf-8". That way, if the document turns out not to be valid UTF-8, you get an exception at parse time, and you can then reparse the document with another encoding (say, ISO-8859-1) to get the correct content.

Stefan
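
For illustration, here is a minimal sketch of the same try-UTF-8-then-fall-back idea. It is not the parser-level approach itself: it swaps in Python's own strict decoding to pick the encoding before parsing, so the check does not depend on how the parser reports bad input. The function name `pick_encoding` and the candidate list are hypothetical, chosen only for this example.

```python
def pick_encoding(data, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate encoding that decodes `data` without error.

    `data` is the raw page content as bytes. The candidate order is an
    assumption: UTF-8 first, ISO-8859-1 as the fallback.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
        return name
    raise ValueError("none of the candidate encodings fit the data")
```

The returned name could then be passed on as `lxml.etree.HTMLParser(encoding=name)`, with the raw bytes parsed through that parser. Note that ISO-8859-1 assigns a character to every byte value, so it always succeeds as a last candidate; it works as a fallback, not as proof that the page really uses that encoding.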