Stefan Behnel <[EMAIL PROTECTED]>:
> [EMAIL PROTECTED] wrote:
>> I am trying to build my own web crawler for an experiment and I don't
>> know how to access the HTTP protocol with Python.
>>
>> Also, are there any open source parsing engines for HTML documents
>> available in Python too? That would be great.
>
> Try lxml.html. It parses broken HTML, supports HTTP, is much faster than
> BeautifulSoup and threadable, all of which should be helpful for your
> crawler.

You should mention its powerful features like XPath and CSS selection support and its easy API here, too ;)
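
To make the XPath point concrete, here is a rough sketch of how link extraction for a crawler might look with lxml.html. The sample markup, the helper name extract_links and the base URL are all made up for illustration; only fromstring(), make_links_absolute() and xpath() come from lxml itself. The page is parsed from an inline string so the example runs without network access.

```python
from lxml import html  # third-party package: pip install lxml

# Deliberately sloppy sample markup (hypothetical): unclosed <p> and <li>
# tags and an unquoted attribute value.
PAGE = """
<html><body>
<p>Some links
<ul>
  <li><a href="/docs/intro.html">Intro</a>
  <li><a href=faq.html>FAQ</a>
</ul>
</body></html>
"""


def extract_links(markup, base_url):
    """Return absolute link targets found in a (possibly broken) HTML string.

    extract_links is an illustrative helper, not part of lxml.
    """
    doc = html.fromstring(markup)      # tolerant parse of the broken HTML
    doc.make_links_absolute(base_url)  # resolve relative hrefs against base_url
    return doc.xpath("//a/@href")      # XPath: every href attribute on an <a>


# The base URL is an assumption; a crawler would pass the URL it fetched.
links = extract_links(PAGE, "http://example.com/start/")
print(links)
```

The CSS selection mentioned above is available through the cssselect() method on parsed elements (for example doc.cssselect("li a")); as far as I know, recent lxml releases need the separate cssselect package installed for that, so it is left out of the sketch.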