On Jun 9, 1:48 pm, [EMAIL PROTECTED] wrote:
> Hi all,
> I am currently planning to write my own web crawler. I know Python but
> not Perl, and I am interested in knowing which of these two is the
> better choice given the following scenario:
>
> 1) I/O issues: my biggest resource constraint will be the bandwidth
> bottleneck.
> 2) Efficiency issues: the crawlers have to be fast, robust and as
> memory-efficient as possible. I am running all of my crawlers on cheap
> PCs with about 500 MB of RAM and P3 to P4 processors.
> 3) Compatibility issues: most of these crawlers will run on Unix
> (FreeBSD), so there should be a pretty good compiler that can optimize
> my code under these environments.
>
> What are your opinions?
You mentioned *what* you want but not *why*. If it's for a real-world production project, why reinvent a square wheel instead of using (or at least extending) an existing open-source crawler with years of development behind it? If it's a learning exercise, why worry about performance so early?

In any case, since you said you know Python but not Perl, the choice is almost a no-brainer, unless you're looking for an excuse to learn Perl. In terms of performance the two are comparable, and you can probably manage crawls on the order of 10-100K pages at best. For million-page or larger crawls, though, you'll have to resort to C/C++ sooner or later.

George
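P.S. In case it helps to see the shape of the thing: below is a minimal sketch of the core fetch/parse/queue loop in pure Python (3.x, standard library only). It is illustrative, not a production design -- the seed URL, page limit and delay are placeholder assumptions, and a real crawler would also need robots.txt handling, per-host politeness, and persistent state.

import time
import urllib.request
from urllib.parse import urljoin, urldefrag
from html.parser import HTMLParser
from collections import deque

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=100, delay=1.0):
    frontier = deque([seed])   # URLs waiting to be fetched (FIFO = breadth-first)
    seen = {seed}              # in-memory dedup set -- the main RAM cost
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if "text/html" not in resp.headers.get("Content-Type", ""):
                    continue
                body = resp.read().decode("utf-8", errors="replace")
        except Exception as exc:   # deliberately broad for a sketch
            print("failed:", url, exc)
            continue
        fetched += 1
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and strip fragments before queueing.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
        print("fetched:", url, "| frontier:", len(frontier))
        time.sleep(delay)          # crude bandwidth/politeness throttle

if __name__ == "__main__":
    crawl("http://example.com/", max_pages=20)   # hypothetical seed URL

On a 500 MB box the thing that blows up first is the `seen` set; once it no longer fits in RAM (somewhere in the millions of URLs), you push it to disk or a more compact structure, which is roughly the scale where the C/C++ remark above starts to matter.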