Re: Web Crawler - Python or Perl?

Stefan Behnel Mon, 09 Jun 2008 14:11:13 -0700

Ray Cote wrote:
> Beautiful Soup is a bit slower, but it will actually parse some of the
> bizarre HTML you'll download off the web.
[...]
> I don't know if some of the quicker parsers discussed require
> well-formed HTML since I've not used them. You may want to consider
> using one of the quicker HTML parsers and, when they throw a fit on the
> downloaded HTML, drop back to Beautiful Soup -- which usually gets
> _something_ useful off the page.


So does lxml.html. And if you still find you need BS once in a while,
there's lxml.html.soupparser.

http://codespeak.net/lxml/elementsoup.html
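Here is a minimal sketch of the "fast parser first, Beautiful Soup as fallback" approach Ray describes, wired up with the two parsers mentioned above. It assumes lxml is installed and, for the fallback, BeautifulSoup. The helper names parse_with_fallback and lxml_parsers are made up for illustration. lxml.html already recovers from most broken markup by itself, so this chain only falls back when a parser actually raises (for example on an empty document). It does not judge whether the first result is "good enough".

```python
from typing import Any, Callable, Optional, Sequence


def parse_with_fallback(html_text: str,
                        parsers: Sequence[Callable[[str], Any]]) -> Any:
    """Try each parser in order and return the first result that doesn't raise.

    If every parser fails, re-raise the last error.
    """
    last_error: Optional[Exception] = None
    for parse in parsers:
        try:
            return parse(html_text)
        except Exception as exc:  # different parsers raise different error types
            last_error = exc
    if last_error is None:
        raise ValueError("no parsers given")
    raise last_error


def lxml_parsers() -> list:
    """Build the parser chain: lxml.html first, then lxml.html.soupparser.

    The imports are deferred so this module loads without lxml. soupparser is
    added only if it imports, because it needs BeautifulSoup installed.
    """
    import lxml.html

    parsers = [lxml.html.fromstring]
    try:
        from lxml.html import soupparser
    except ImportError:
        pass  # BeautifulSoup missing: use lxml.html alone
    else:
        parsers.append(soupparser.fromstring)
    return parsers
```

Usage would be something like `root = parse_with_fallback(page_source, lxml_parsers())` followed by `root.text_content()`, where `page_source` (hypothetical) is the downloaded page. Both parsers return lxml elements, so the rest of the crawler doesn't need to care which one succeeded.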

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
