Re: Crawler and Scraper with different priorities

2014-09-09 Thread Peng Cheng
Hi Sandeep, would you be interested in joining my open source project? https://github.com/tribbloid/spookystuff IMHO Spark is indeed not suited to general-purpose crawling, where the distributed jobs are highly homogeneous, but it is good enough for directional scraping, which involves heterogeneous input and

Re: Crawler and Scraper with different priorities

2014-09-08 Thread Sandeep Singh
Hi Daniil, I have to do some processing of the results, as well as push the data to the front end. Currently I'm using Akka for this application, but I was thinking Spark Streaming might be a better fit, and I could also use MLlib for processing the results. Any specific reasons

Re: Crawler and Scraper with different priorities

2014-09-08 Thread Daniil Osipov
Depending on what you want to do with the result of the scraping, Spark may not be the best framework for your use case. Take a look at a general Akka application. On Sun, Sep 7, 2014 at 12:15 AM, Sandeep Singh wrote: > Hi all, > > I am implementing a crawler and scraper. It should be able to p