Re: Web Crawler - Python or Perl?

Sebastian "lunar" Wiesner Mon, 09 Jun 2008 13:11:14 -0700

 subeen <[EMAIL PROTECTED]> on Monday, 09 June 2008 20:21:

> On Jun 10, 12:15 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
>> subeen wrote:
>> > can use urllib2 module and/or beautiful soup for developing crawler
>>
>> Not if you care about a) speed and/or b) memory efficiency.
>>
>> http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
>>
>> Stefan
> 
> ya, beautiful soup is slower. so it's better to use urllib2 for
> fetching data and regular expressions for parsing data.


BeautifulSoup is itself implemented on top of regular expressions.  I doubt
that you can achieve a great performance gain by using plain regular
expressions, and even if you could, the gain would certainly not be worth
the effort.  Parsing markup with regular expressions is hard, and the result
will most likely be neither as fast nor as memory-efficient as lxml.html.
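
To make the "regular expressions are hard" point concrete, here is a minimal
sketch, assuming Python 3 and only the standard library.  The sample markup,
the pattern NAIVE_HREF and the helper names are made up for illustration, and
html.parser merely stands in for a real parser here; it is not lxml.html and
says nothing about lxml's speed.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page: one link inside a comment, one with
# single-quoted attributes, one written in upper case.
SAMPLE = """
<html><body>
<!-- <a href="http://example.com/old">old</a> -->
<a href='http://example.com/single'>single quotes</a>
<A HREF="http://example.com/upper">upper case</A>
</body></html>
"""

# The kind of pattern one tends to write first.
NAIVE_HREF = re.compile(r'<a href="([^"]*)"')


def links_by_regex(html):
    """Return every href the naive pattern matches, comments included."""
    return NAIVE_HREF.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_by_parser(html):
    """Return hrefs found by a real (if simple) HTML parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

On this sample the naive pattern reports only the commented-out link and
misses both live ones, while the parser reports the two live links and skips
the comment.  The pattern can of course be patched for each case, but that is
exactly the effort I mean.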

I personally am absolutely happy with lxml.html.  It's fast and
memory-efficient, yet powerful and easy to use.
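
For a crawler, the interesting part is usually link extraction.  A minimal
sketch with lxml.html, assuming Python 3 and an installed lxml; the sample
page, the base URL and the function name extract_links are hypothetical, and
fetching the page is left out on purpose.

```python
import lxml.html

# Hypothetical page content; a crawler would fetch this over HTTP.
PAGE = """
<html><body>
<p><a href="/about">About</a> and
<a href="http://example.org/x">elsewhere</a></p>
</body></html>
"""


def extract_links(html_text, base_url):
    """Parse html_text and return all <a href> targets as absolute URLs."""
    doc = lxml.html.fromstring(html_text)
    # Rewrite relative links in place, resolving them against base_url.
    doc.make_links_absolute(base_url)
    # The XPath query yields the href attribute values in document order.
    return [str(href) for href in doc.xpath("//a/@href")]
```

Calling extract_links(PAGE, "http://example.com/") gives the crawler a list
of absolute URLs it can queue for the next fetch.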

-- 
Freedom is always the freedom of dissenters.
                                      (Rosa Luxemburg)
--
http://mail.python.org/mailman/listinfo/python-list
