Hi,

I want to build a simple web crawler. I know how I'm going to do it, but I have one problem.

Obviously I don't want to negatively impact any of the websites I am crawling, so I want to implement some form of rate limiting of HTTP requests to specific domain names.
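For the per-domain limiting, what I'm picturing is roughly this (just a sketch of the idea; DomainRateLimiter and min_interval are names I've made up):

```python
import time
from urllib.parse import urlsplit

class DomainRateLimiter:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval
        self._last = {}  # domain -> monotonic time of the last request

    def wait(self, url):
        """Sleep just long enough that requests to url's domain are at
        least min_interval seconds apart, then record this request."""
        domain = urlsplit(url).netloc
        now = time.monotonic()
        last = self._last.get(domain)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                time.sleep(remaining)
        self._last[domain] = time.monotonic()
```

The idea being that the crawler calls wait(url) before each request, and only requests to the same domain get throttled.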

What I'd like is some kind of timer that calls a piece of code, say, every 5 seconds, and that code is what goes off and crawls the website.

I'm just not sure of the best way to call code on a timer.

Could anyone offer some advice on the best way to do this? It will run on Linux as a service via the python-daemon library, using at least Python 3.6.

Thanks for any help.
--
https://mail.python.org/mailman/listinfo/python-list