Re: Repeatedly crawl website every 1 min

2017-05-11 Thread Steve D'Aprano
On Thu, 11 May 2017 12:18 pm, liyucun2...@gmail.com wrote:
> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every 1 min. I am curious to know if there is any good open source project
> for this specific scenario.

I agree with Iuri: crawling a website ev

Re: Repeatedly crawl website every 1 min

2017-05-11 Thread Iuri
Unless you are authorized, don't do it. It literally costs the website you are crawling a lot of money, in CPU and bandwidth. Hundreds of concurrent requests can even take down a small, badly configured server. Look at the scrapy package; it is great for scraping, but be friendly with the websites
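One concrete way to "be friendly" is to space out requests to the same host. Scrapy exposes settings such as `DOWNLOAD_DELAY` and `CONCURRENT_REQUESTS_PER_DOMAIN` for this; the snippet below is only a minimal standard-library sketch of the same idea, not Scrapy's implementation. The class name `PerHostThrottle` and the 5-second default are hypothetical choices for illustration, not a recommended policy.

```python
import threading
import time
from urllib.parse import urlsplit


class PerHostThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper: min_delay is an assumed policy value.
    Threads that target different hosts never block each other.
    """

    def __init__(self, min_delay=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = {}  # host -> earliest time the next request may start

    def wait(self, url):
        """Block until a request to url's host is allowed; return seconds waited."""
        host = urlsplit(url).netloc
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_allowed.get(host, now))
            # Reserve this slot so concurrent callers for the same host queue up.
            self._next_allowed[host] = slot + self.min_delay
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other hosts proceed
        return delay
```

Each worker thread would call `throttle.wait(url)` right before fetching. Reserving the slot under the lock and sleeping outside it means two threads aimed at the same server are serialized, while requests to other hosts go through immediately.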

Repeatedly crawl website every 1 min

2017-05-10 Thread liyucun2012
Hi Everyone, Thanks for stopping by. I am working on a feature to crawl website content every 1 min. I am curious to know if there is any good open source project for this specific scenario. Specifically, I have many urls, and I want to maintain a thread pool so that each thread will repeatedly cr
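The scenario described here (many URLs, a thread pool, a re-crawl every minute) can be sketched with only the standard library. This is a minimal sketch, not a recommendation of any particular project. The names `crawl_periodically`, `fetch`, `report`, `on_result` and `max_rounds` are hypothetical. Instead of pinning one thread to each URL, it pushes the whole URL list through a bounded `ThreadPoolExecutor` once per interval, so `max_workers` caps how many requests are in flight at once.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10.0):
    """Download one page (hypothetical helper; no robots.txt or retry handling)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def report(url, result):
    """Default result handler: print size on success, the error on failure."""
    if isinstance(result, Exception):
        print(f"{url}: failed ({result!r})")
    else:
        print(f"{url}: {len(result)} bytes")


def crawl_periodically(urls, fetch_fn=fetch, interval=60.0, max_workers=4,
                       stop=None, max_rounds=None, on_result=report):
    """Fetch every URL once per `interval` seconds with a bounded thread pool.

    `stop` is a threading.Event for shutdown; `max_rounds` is an assumed
    knob (mainly for testing) that ends the loop after N rounds.
    Returns the number of completed rounds.
    """
    stop = stop or threading.Event()

    def safe_fetch(url):
        try:
            return fetch_fn(url)
        except Exception as exc:  # one bad URL must not abort the whole round
            return exc

    rounds = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while not stop.is_set():
            started = time.monotonic()
            # map() yields results in input order and the loop finishes only
            # when the whole round is done, so rounds never overlap.
            for url, result in zip(urls, pool.map(safe_fetch, urls)):
                on_result(url, result)
            rounds += 1
            if max_rounds is not None and rounds >= max_rounds:
                break
            # Sleep for whatever is left of the interval; wakes early on stop.
            remaining = interval - (time.monotonic() - started)
            if remaining > 0:
                stop.wait(remaining)
    return rounds
```

Typical use would be `stop = threading.Event()` followed by `crawl_periodically(urls, stop=stop)`, with another thread calling `stop.set()` to shut down. If a round takes longer than `interval`, the next one starts immediately instead of piling up, which keeps the load on the crawled sites bounded.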