Ravi Teja wrote:

>> Of course, lxml should be able to do this kind of thing as well. I'd be
>> interested to know why this "is not a good idea", though.
>
> No reason that you don't know already.
>
> http://www.boddie.org.uk/python/HTML.html
>
> "If the document text is well-formed XML, we could omit the html
> parameter or set it to have a false value."
>
> XML parsers are not required to be forgiving to be regarded compliant.
> And much HTML out there is not well formed.
so? once you run it through an HTML-aware parser, the *resulting* structure
is well formed. a site generator->converter->xpath approach is no less
reliable than any other HTML-scraping approach.

</F>
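A minimal sketch of the parse-then-query approach, using only the Python standard library. `html.parser` tolerates unclosed and stray tags; a small repair step turns its events into a well-formed `xml.etree.ElementTree` tree; the tree is then queried with ElementTree's limited XPath subset. The names `TreeBuildingParser` and `parse_html` are hypothetical, and the repair rules (an end tag closes everything opened inside it, end tags with no open match are ignored, void elements close immediately) are a simplifying assumption. They are much cruder than the recovery logic in a real HTML parser such as the one lxml.html uses.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TreeBuildingParser(HTMLParser):
    """Hypothetical forgiving parser that emits a well-formed ElementTree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.stack = []  # names of currently open (non-void) elements
        # A synthetic root guarantees exactly one document element.
        self.builder.start("root", {})

    def _attrs(self, attrs):
        # Boolean attributes arrive with value None; XML needs a string.
        return {name: (value or "") for name, value in attrs}

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, self._attrs(attrs))
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)
        else:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: open and close at once.
        self.builder.start(tag, self._attrs(attrs))
        self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching open element: drop it
        # Implicitly close anything left open inside this element.
        while self.stack:
            open_tag = self.stack.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered trailing text
        while self.stack:  # close elements never closed in the source
            self.builder.end(self.stack.pop())
        self.builder.end("root")
        return self.builder.close()


def parse_html(text):
    parser = TreeBuildingParser()
    parser.feed(text)
    return parser.finish()


# Usage: unclosed <a> and <br>, plus a stray </b>, still yield a valid tree.
messy = "<html><body><p>Links: <a href='/a'>one<br><a href=\"/b\">two</p></b></body>"
tree = parse_html(messy)
print([a.get("href") for a in tree.iterfind(".//a[@href]")])  # ['/a', '/b']
print(ET.tostring(tree).decode())  # serializes as well-formed XML
```

Once the tree exists, the query step no longer cares how broken the input was. With lxml installed, `lxml.html.fromstring(text).xpath("//a/@href")` covers the parse and query steps in one go, with full XPath 1.0 instead of ElementTree's subset.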