Stefan Behnel <[EMAIL PROTECTED]>:

> [EMAIL PROTECTED] wrote:
>> I am trying to build my own web crawler for an experiment and I don't
>> know how to access the HTTP protocol with Python.
>>
>> Also, are there any open-source HTML parsing engines available in
>> Python? That would be great.
> 
> Try lxml.html. It parses broken HTML, supports HTTP, is much faster than
> BeautifulSoup and threadable, all of which should be helpful for your
> crawler.

You should mention its powerful features, like XPath and CSS selector
support and its easy API, here too ;)
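
For the crawler use case, here is a rough sketch of what that looks like.
It is only an illustration, not code from the lxml docs: the helper names
(extract_links, fetch), the sample markup and the example.com URL are all
made up for this mail. It pulls the page with the standard library and
hands the text to lxml.html for parsing and XPath selection.

```python
from urllib.request import urlopen

import lxml.html


def extract_links(html_text, base_url):
    """Return absolute link targets found in a (possibly broken) HTML string."""
    doc = lxml.html.fromstring(html_text)
    # Resolve relative href/src values against base_url.
    doc.make_links_absolute(base_url)
    # XPath: the href attribute of every <a> element, in document order.
    return doc.xpath("//a/@href")


def fetch(url):
    """Download a page body as bytes (hypothetical helper, stdlib only)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


# Deliberately sloppy markup: unclosed <p> and <a> tags (assumed sample).
SAMPLE = "<html><body><p>Hi<a href='/a'>A</a><p><a href='b.html'>B</body>"
links = extract_links(SAMPLE, "http://example.com/dir/")
print(links)
```

In a real crawler you would call extract_links(fetch(url), url) and queue
the results. The CSS selector equivalent is doc.cssselect("a"), which
returns the elements rather than the attribute values; note that recent
lxml versions need the separate cssselect package installed for that.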

-- 
Freedom is always the freedom of dissenters.
                                      (Rosa Luxemburg)