Hello,

Spark is a general-purpose framework for distributed in-memory processing. You
can always write a highly specialized piece of code that is faster than Spark,
but it will do only one thing, and if you need something else you will have to
rewrite everything from scratch. This is why Spark is beneficial.
In this context, your setup does not make sense: a single machine running in
local mode does not exercise the distributed side of Spark. You should have at
least 5 worker nodes for a meaningful evaluation.
Follow the Spark tuning guide and its recommendations.
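
If you still want single-machine numbers, here is a minimal sketch of how the
training step could be timed on its own. It assumes the MovieLens 20M
ratings.csv layout (userId,movieId,rating,timestamp, with a header line); the
object name AlsTiming, the input path argument, and the rank, iteration, and
lambda values are placeholders of mine, not tuned settings.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical driver: times only ALS training, not data loading.
object AlsTiming {
  def main(args: Array[String]): Unit = {
    // The master (e.g. local[8]) is expected from spark-submit --master.
    val conf = new SparkConf().setAppName("AlsTiming")
    val sc = new SparkContext(conf)

    // Assumed input: CSV lines of userId,movieId,rating,timestamp.
    val ratings = sc.textFile(args(0))
      .filter(line => !line.startsWith("userId")) // skip the header line
      .map(_.split(','))
      .map(f => Rating(f(0).toInt, f(1).toInt, f(2).toDouble))
      .cache()
    ratings.count() // materialize the cache so parsing is not timed

    val start = System.nanoTime()
    // Placeholder hyperparameters: rank 10, 10 iterations, lambda 0.01.
    val model = ALS.train(ratings, 10, 10, 0.01)
    // Touch both factor RDDs so all of the work is inside the timed region.
    model.userFeatures.count()
    model.productFeatures.count()
    val seconds = (System.nanoTime() - start) / 1e9

    println(f"ALS training took $seconds%.1f s")
    sc.stop()
  }
}
```

Submitted with something like spark-submit --master local[8] --driver-memory
16g --class AlsTiming <your jar> <path to ratings.csv>, this lets you vary n
while keeping everything else fixed. In local mode all work runs inside the
driver JVM, so the driver memory setting is the one that matters.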

> On 03 May 2016, at 07:02, Abhijith Chandraprabhu <[email protected]> wrote:
> 
> Hello,
> 
> I am trying to find some performance figures for Spark vs. various other 
> languages for an ALS-based recommender system. I am using the 20 million 
> ratings MovieLens dataset. The test environment is one big 30-core machine 
> with 132 GB of memory. I am using the Scala version of the script provided here,
> http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html 
> 
> I am not an expert in Spark, and I assume that varying n when invoking 
> Spark with the flag --master local[n] is supposed to provide ideal 
> scaling. 
> 
> Initial observations didn't favour Spark, by a small margin, but as I said, 
> since I am not a Spark expert, I would comment only after being assured that 
> this is the optimal way of running the ALS snippet. 
> 
> Could the experts please help me with the optimal way to get the best 
> timings out of Spark's ALS example in the mentioned environment? Thanks.
> 
> -- 
> Best regards,
> Abhijith
