On Thu, 11 May 2017 12:18 pm, liyucun2...@gmail.com wrote:
> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every minute. I am curious to know if there are any good open source projects
> for this specific scenario.
I agree with Iuri: crawling a website ev
Unless you are authorized, don't do it. It literally costs the website you are
crawling a lot of money in CPU and bandwidth.
Hundreds of concurrent requests can even take down a small, poorly configured
server.
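If you do have permission, the main defence against that is to cap how many requests are in flight and to pause between them. Below is a minimal standard-library sketch, assuming a fixed list of URLs polled on a fixed interval; the names `fetch`, `poll_once`, `crawl_forever` and the `USER_AGENT` string are hypothetical, it does not consult robots.txt, and the limits shown are placeholders, not values anyone in this thread recommended.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical User-Agent; identify your crawler honestly so site owners can reach you.
USER_AGENT = "example-polite-crawler/0.1 (contact: you@example.com)"


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def poll_once(urls, fetcher=fetch, max_workers=4, per_request_delay=1.0):
    """Fetch every URL once, with at most max_workers requests in flight."""
    def polite_fetch(url):
        try:
            body = fetcher(url)
        except Exception as exc:  # record the failure instead of killing the pool
            body = exc
        time.sleep(per_request_delay)  # this worker pauses before taking its next URL
        return url, body

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(polite_fetch, urls))


def crawl_forever(urls, interval=60.0, rounds=None, **kwargs):
    """Yield one {url: body-or-exception} dict per round, every `interval` seconds.

    Runs forever unless `rounds` is given; extra kwargs go to poll_once.
    """
    done = 0
    while True:
        started = time.monotonic()
        yield poll_once(urls, **kwargs)
        done += 1
        if rounds is not None and done >= rounds:
            return
        # Sleep only for what is left of the interval, never a negative amount.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```

Usage would be something like `for results in crawl_forever(my_urls, interval=60, max_workers=2): ...`. A small fixed pool keeps the load on the remote server bounded no matter how long the URL list grows; if one pass takes longer than the interval, the next pass simply starts late rather than piling up.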
Look at the Scrapy package; it is great for scraping, but be friendly to the
websites you crawl.
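In Scrapy, much of that friendliness is configuration. A sketch of a project's `settings.py` using Scrapy's built-in politeness settings follows; the specific numbers and the `USER_AGENT` string are illustrative assumptions, not tuned recommendations, and running the crawl every minute would still need to be scheduled separately.

```python
# settings.py (illustrative values; tune them for a site you are allowed to crawl)

# Identify the crawler honestly; this string is a placeholder.
USER_AGENT = "example-polite-crawler/0.1 (+https://example.com/contact)"

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Keep only a couple of requests in flight per domain (Scrapy's default is 8).
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Wait roughly this many seconds between requests to the same site.
DOWNLOAD_DELAY = 2.0

# Let AutoThrottle adjust the delay based on how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```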
Hi Everyone,
Thanks for stopping by. I am working on a feature to crawl website content every
minute. I am curious to know if there are any good open source projects for this
specific scenario.
Specifically, I have many URLs, and I want to maintain a thread pool so that
each thread will repeatedly cr