Re: Crawler and Scraper with different priorities

2014-09-09 Thread Peng Cheng
Hi Sandeep, would you be interested in joining my open source project? https://github.com/tribbloid/spookystuff IMHO Spark is indeed not suited to general-purpose crawling, where the distributed job is highly homogeneous, but it is good enough for directional scraping, which involves heterogeneous input and

Re: Crawler and Scraper with different priorities

2014-09-08 Thread Sandeep Singh
Hi Daniil, I have to do some processing of the results, as well as push the data to the front end. Currently I'm using Akka for this application, but I was thinking Spark Streaming might be a better fit, and I could also use MLlib for processing the results. Any specific reasons

Re: Crawler and Scraper with different priorities

2014-09-08 Thread Daniil Osipov
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Crawler-and-Scraper-with-different-priorities-tp13645.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Crawler and Scraper with different priorities

2014-09-07 Thread Sandeep Singh
Thanks.