Paul Rubin wrote:
> "abhinav" <[EMAIL PROTECTED]> writes:
> > maintaining huge data structures. What should be the language so as
> > not to compromise that much on speed? What is the performance of
> > Python-based crawlers vs. C-based crawlers? Should I use both
> > languages (partly C and Python)? How should I decide what part to
> > implement in C and what should be done in Python? Please guide
> > me. Thanks.
>
> I think if you don't know how to answer these questions for yourself,
> you're not ready to take on projects of that complexity. My advice
> is start in Python since development will be much easier. If and when
> you start hitting performance problems, you'll have to examine many
> combinations of tactics for dealing with them, and switching languages
> is just one such tactic.

There's another potential bottleneck: parsing HTML and extracting the
text you want, especially when you hit pages that don't meet the HTML 4
or XHTML spec.

http://sig.levillage.org/?p=599

Paul's advice is very sound, given what little info you've provided.

http://trific.ath.cx/resources/python/optimization/

Also look at psyco, pyrex, boost, Swig, and Ctypes for bridging C and
Python; you have a lot of options. Harvestman, mechanize, and other
existing libs are worth a look too.
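
To make the HTML point concrete, here is a minimal sketch of tolerant text extraction. It assumes Python 3 and the standard library's html.parser module, which is my choice for illustration and not something named above. The names TextExtractor and extract_text are hypothetical. The parser does not insist on well-formed markup, so unclosed tags simply pass through.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, ignoring anything inside <script> or <style>.

    Hypothetical helper for illustration only.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if self._skip_depth == 0 and stripped:
            self._chunks.append(stripped)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML string, tolerating unclosed tags."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return parser.text()


if __name__ == "__main__":
    # Unclosed <b>, <p>, <body> and <html> on purpose.
    messy = "<html><body><p>Hello <b>world<p>second<script>var x = 1;</script> para"
    print(extract_text(messy))
```

If something like this shows up as the hot spot once you profile a real crawl, that is the point at which to weigh a C-backed parser or one of the bridging tools mentioned above, not before.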