Also - what hardware are you running the cluster on? And what is the local machine's hardware?
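While you're gathering that, it's also worth checking how many partitions your data RDD actually has, since that drives the overheads Evan describes below. A minimal sketch against the Spark 1.0 shell (the dataset path and the 10 nodes x 4 cores figure are placeholders, not details from your setup):

    import org.apache.spark.mllib.util.MLUtils

    // Placeholder path: point this at your LIBSVM-format dataset.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/train.txt")

    // How is the data actually split up right now?
    println("partitions = " + data.partitions.size)

    // Rough rule of thumb: 2-4 partitions per core across the cluster.
    // 10 nodes x 4 cores each is an assumption; plug in your real core count.
    val repartitioned = data.repartition(10 * 4 * 2).cache()

If the default comes out at something like 2 partitions, a 10-node cluster sits mostly idle; if it's in the thousands, task-launch overhead will swamp a job this small.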
On Tue, Sep 2, 2014 at 11:57 AM, Evan R. Sparks <evan.spa...@gmail.com> wrote:
> How many iterations are you running? Can you provide the exact details
> about the size of the dataset (how many data points, how many features)?
> Is this sparse or dense - and for the sparse case, how many non-zeroes?
> How many partitions does your data RDD have?
>
> For very small datasets, the scheduling overhead of shipping tasks across
> the cluster and delays due to stragglers can dominate the time actually
> spent on your parallel computation. If you have too few partitions, you
> won't be taking advantage of cluster parallelism, and if you have too
> many, you're introducing even more of the aforementioned overheads.
>
> On Tue, Sep 2, 2014 at 11:24 AM, SK <skrishna...@gmail.com> wrote:
>
>> Hi,
>>
>> I evaluated the runtime performance of some of the MLlib classification
>> algorithms on a local machine and on a cluster with 10 nodes. I used
>> standalone mode and Spark 1.0.1 in both cases. Here are the results for
>> the total runtime:
>>
>>                          Local     Cluster
>>     Logistic regression  138 sec   336 sec
>>     SVM                  138 sec   336 sec
>>     Decision tree         50 sec   132 sec
>>
>> My dataset is quite small, and my programs are very similar to the MLlib
>> examples included in the Spark distribution. Why is the runtime on the
>> cluster significantly higher (almost 3x) than on the local machine, even
>> though the former has more memory and more nodes? Is it because of the
>> communication overhead on the cluster? I would like to know if there is
>> something I should be doing to optimize performance on the cluster, or
>> whether others have been seeing similar results.
>>
>> thanks
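One more note on benchmarking at this scale: if the input RDD isn't cached and materialized before the timer starts, the first training pass also pays for reading and parsing the data, which can badly skew the numbers. A rough, untested sketch for spark-shell (path and iteration count are placeholders):

    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.util.MLUtils

    // Placeholder path again; spark-shell provides sc.
    val training = MLUtils.loadLibSVMFile(sc, "hdfs:///data/train.txt").cache()
    training.count()  // forces the load and the cache before timing starts

    val start = System.nanoTime()
    val model = LogisticRegressionWithSGD.train(training, 100)  // e.g. 100 iterations
    println("training took " + (System.nanoTime() - start) / 1e9 + " s")

Comparing that number between local mode and the cluster, with a comparable partition count per core, gives a much fairer picture than end-to-end job time.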