Hi,

I evaluated the runtime performance of some of the MLlib classification algorithms on a local machine and on a cluster with 10 nodes, using standalone mode and Spark 1.0.1 in both cases. Here are the results for the total runtime:

Algorithm              Local     Cluster
Logistic regression    138 sec   336 sec
SVM                    138 sec   336 sec
Decision tree           50 sec   132 sec
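For reference, the cluster-to-local slowdown implied by the table can be recomputed with a few lines of Python. This is only a sketch that reuses the runtimes reported above. It takes no new measurements, and the helper name `slowdown` is hypothetical.

```python
# Total runtimes in seconds, copied from the table above: (local, cluster)
runtimes = {
    "Logistic regression": (138, 336),
    "SVM": (138, 336),
    "Decision tree": (50, 132),
}

def slowdown(local_sec, cluster_sec):
    """Return how many times longer the cluster run took than the local run."""
    return cluster_sec / local_sec

# Round each ratio to two decimal places for readability
ratios = {name: round(slowdown(*times), 2) for name, times in runtimes.items()}

for name, ratio in ratios.items():
    print(f"{name}: cluster took {ratio}x as long as local")
```

The ratios come out at about 2.4x for logistic regression and SVM and about 2.6x for the decision tree.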
My dataset is quite small, and my programs are very similar to the MLlib examples included in the Spark distribution. Why is the runtime on the cluster significantly higher (roughly 2.4 to 2.6 times) than on the local machine, even though the cluster has more memory and more nodes? Is it because of communication overhead on the cluster? I would like to know whether there is something I should be doing to optimize performance on the cluster, or whether others have been getting similar results.

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/mllib-performance-on-cluster-tp13290.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.