Re: encoding in lxml

2008-11-03 Thread Stefan Behnel
jasiu85 wrote:
> I have a problem with character encoding in LXML. Here's how it goes:
>
> I read an HTML document from a third-party site. It is supposed to be
> in UTF-8, but unfortunately from time to time it's not.

You can instantiate your own HTML parser and pass encoding="utf-8". That way,
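
A minimal sketch of the suggestion above, assuming lxml is installed. The helper name parse_as_utf8 and the sample markup are made up for illustration; only etree.HTMLParser(encoding=...) and etree.fromstring() are lxml calls. How bytes that are not valid UTF-8 are handled depends on the underlying libxml2, so this sketch only shows the well-formed case.

```python
from lxml import etree


def parse_as_utf8(raw: bytes):
    """Parse raw HTML bytes, telling the parser to treat them as UTF-8.

    parse_as_utf8 is a hypothetical helper name, not part of lxml.
    """
    # The explicit encoding overrides whatever charset the document declares.
    parser = etree.HTMLParser(encoding="utf-8")
    return etree.fromstring(raw, parser)


# Illustrative input only; real input would come from the third-party site.
sample = "<html><body><p>Zażółć</p></body></html>".encode("utf-8")
root = parse_as_utf8(sample)
print(root.findtext(".//p"))
```

Creating a fresh parser per call keeps the sketch simple; a long-running program might prefer to build the parser once and reuse it.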

Re: encoding in lxml

2008-11-03 Thread pjacobi . de
Hi Mike,

> I read an HTML document from a third-party site. It is supposed to be
> in UTF-8, but unfortunately from time to time it's not.

There will be a host of more lightweight solutions, but you can opt to sanitize incoming HTML with HTML Tidy (Python binding available). It will replace inval
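
As an illustration of what one of the more lightweight solutions might look like (this is not HTML Tidy, just a standard-library sketch), the bytes can be decoded as UTF-8 with invalid sequences replaced, then re-encoded before parsing. The function name force_utf8 is hypothetical.

```python
def force_utf8(raw: bytes) -> bytes:
    """Return bytes that are guaranteed to be valid UTF-8.

    Byte sequences that are not valid UTF-8 are replaced with U+FFFD
    (the Unicode replacement character); valid input passes through unchanged.
    """
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


# Illustrative input: valid UTF-8 with one stray invalid byte (0xFF) in it.
cleaned = force_utf8(b"caf\xc3\xa9 \xff ok")
print(cleaned.decode("utf-8"))
```

This loses the original meaning of the bad bytes, so it only fits cases where a few replacement characters in the output are acceptable.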