Hi Sandeep,
Would you be interested in joining my open source project?
https://github.com/tribbloid/spookystuff
IMHO Spark is indeed not suited to general-purpose crawling, where the
distributed jobs are highly homogeneous. But it is good enough for directional
scraping, which involves heterogeneous input and
Hi Daniil,
I have to do some processing of the results, as well as push the data to
the front end. Currently I'm using Akka for this application, but I was
thinking Spark Streaming might be a better fit, since I could also use
MLlib for processing the results. Any specific reasons
Depending on what you want to do with the result of the scraping, Spark may
not be the best framework for your use case. Take a look at a general Akka
application.
On Sun, Sep 7, 2014 at 12:15 AM, Sandeep Singh wrote:
> Hi all,
>
> I am implementing a crawler/scraper. It should be able to p