I'm planning to write a web crawler in Python and have a question about how I should design it. My goal is speed and the most efficient use of the hardware and bandwidth I have available.
Right now I have a dual 2.4 GHz Xeon box with 4 GB RAM, a 500 GB SATA disk and a 20 Mbps bandwidth cap (for now), running FreeBSD. What would be the best way to design the crawler? Using the thread module? Would I be able to max out this connection with the hardware listed above using Python threads? Thank you kindly.
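For reference, here is a minimal sketch of the thread-based design being asked about, assuming Python 3 and only the standard library. The names fetch, crawl, num_workers and fetcher are hypothetical, chosen for illustration, and the pool size of 20 is a guess that would need tuning, not a measured value. A 20 Mbps cap is about 2.5 MB/s, and threads blocked on network I/O release the GIL, so a fixed pool of workers pulling URLs from a queue is one plausible starting point rather than a definitive answer.

```python
import queue
import threading
import urllib.request


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl(urls, num_workers=20, fetcher=fetch):
    """Fetch every URL with a fixed pool of worker threads.

    Returns a dict mapping each URL to its body (bytes) on success,
    or to the raised Exception on failure.
    num_workers=20 is an assumed default, not a tuned value.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this thread
                return
            try:
                body = fetcher(url)
            except Exception as exc:  # record the failure, keep the worker alive
                body = exc
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Calling crawl(list_of_urls) blocks until every URL has been attempted. Passing a different fetcher makes it possible to try the pool without touching the network, and a real crawler would also need link extraction, duplicate detection and per-host politeness, none of which is shown here.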