subeen <[EMAIL PROTECTED]> on Monday, 09 June 2008, 20:21:

> On Jun 10, 12:15 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
>> subeen wrote:
>> > can use urllib2 module and/or beautiful soup for developing crawler
>>
>> Not if you care about a) speed and/or b) memory efficiency.
>>
>> http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/
>>
>> Stefan
>
> ya, beautiful soup is slower. so it's better to use urllib2 for
> fetching data and regular expressions for parsing data.

BeautifulSoup is itself implemented on top of regular expressions. I doubt
that you can achieve a great performance gain by using plain regular
expressions, and even if you can, the gain is certainly not worth the effort.
Parsing markup with regular expressions is hard, and the result will most
likely be neither as fast nor as memory-efficient as lxml.html.

I personally am absolutely happy with lxml.html. It's fast and
memory-efficient, yet powerful and easy to use.

--
Freedom is always the freedom of dissenters.
                                      (Rosa Luxemburg)
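
To illustrate why parsing markup with regular expressions is hard, here is a
minimal sketch that extracts links from a small page in two ways. The sample
page, the pattern and all function names are made up for this example. The
standard library's html.parser is used only as a stand-in so the sketch runs
without third-party packages; it is not lxml.html and says nothing about
lxml's speed or memory use.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page: an upper-case tag with single quotes, an
# unquoted attribute value, and a link that exists only inside a comment.
SAMPLE = """<html><body>
<a href="http://example.com/a">A</a>
<A HREF='http://example.com/b'>B</A>
<a class="x" href=http://example.com/c>C</a>
<!-- <a href="http://example.com/old">old</a> -->
</body></html>"""

# A naive pattern of the kind one tends to write first (an assumption for
# this sketch, not taken from any crawler mentioned in the thread).
NAIVE_LINK = re.compile(r'<a href="([^"]*)"')


def links_with_regex(page):
    """Return every href value the naive pattern finds in the raw text."""
    return NAIVE_LINK.findall(page)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_with_parser(page):
    """Return href values of real <a> elements, in document order."""
    collector = LinkCollector()
    collector.feed(page)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(links_with_regex(SAMPLE))
    print(links_with_parser(SAMPLE))
```

On this sample the naive pattern misses the second and third links and
reports the commented-out one, while the parser returns the three real
links. Each of those cases can be patched into the pattern, but that is
exactly the effort that is not worth it.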