Re: beautifulsoup .vs tidy

Fredrik Lundh Sun, 02 Jul 2006 00:22:57 -0700

Ravi Teja wrote:

>> Of course, lxml should be able to do this kind of thing as well. I'd be
>> interested to know why this "is not a good idea", though.
> 
> No reason that you don't know already.
> 
> http://www.boddie.org.uk/python/HTML.html
> 
> "If the document text is well-formed XML, we could omit the html
> parameter or set it to have a false value."
> 
> XML parsers are not required to be forgiving to be regarded compliant.
> And much HTML out there is not well formed.


so?  once you run it through an HTML-aware parser, the *resulting* 
structure is well formed.

a site generator->converter->xpath approach is no less reliable than any 
other HTML-scraping approach.
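
A minimal sketch of that pipeline, using only the Python standard library rather than the tools discussed in this thread: a forgiving HTML parser feeds an ElementTree builder, and the resulting well-formed tree is queried with ElementTree's limited XPath subset. The names TreeBuildingParser, parse_html and VOID are hypothetical, and the tag-recovery rules are deliberately simplistic assumptions, nowhere near what a real HTML-aware parser does.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: a small subset of the HTML elements that never take an end tag.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuildingParser(HTMLParser):
    """Hypothetical helper: turns forgiving HTML events into an ElementTree."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # names of elements that are still open

    def handle_starttag(self, tag, attrs):
        # Simplistic recovery rule: a new <p> closes a <p> that is still open.
        if tag == "p" and "p" in self.open_tags:
            self.handle_endtag("p")
        # Valueless attributes arrive as None; store them as empty strings.
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close every element still open inside this one, then the element.
        while self.open_tags:
            current = self.open_tags.pop()
            self.builder.end(current)
            if current == tag:
                break

    def handle_data(self, data):
        # Text outside any open element is dropped.
        if self.open_tags:
            self.builder.data(data)

    def close(self):
        super().close()
        # Close whatever the document left open, then return the root element.
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def parse_html(text):
    parser = TreeBuildingParser()
    parser.feed(text)
    return parser.close()


# Not well-formed XML: unclosed <p>, <b>, <a> and <html>, unquoted attribute.
sloppy = "<html><body><p>one<p>two <b>bold</p><br><a href=x.html>link</body>"
root = parse_html(sloppy)

links = [a.get("href") for a in root.findall(".//a")]
paragraphs = ["".join(p.itertext()) for p in root.findall(".//p")]
print(links)       # ['x.html']
print(paragraphs)  # ['one', 'two bold']
```

The input would be rejected by a strict XML parser, but the tree that comes out can be serialized with ET.tostring() and parsed again as XML, which is the sense in which the resulting structure is well formed.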

</F>

