Re: K-means faster on Mahout then on Spark

2014-09-25 Thread Xiangrui Meng
Please also check the load balance of the RDD on YARN. How many partitions are you using? Does it match the number of CPU cores?
-Xiangrui
On Thu, Sep 25, 2014 at 12:28 PM, bhusted wrote:
> What is the size of your vector mine is set to 20? I am seeing slow results
> as well with iteration=5, # o
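One way to act on the load-balance question is to compare per-partition record counts against the core count. The following is a minimal plain-Python sketch, assuming the counts have already been collected (in PySpark, `rdd.glom().map(len).collect()` is one way to obtain them). The function name `partition_report` and the sample numbers are hypothetical, chosen only for illustration.

```python
def partition_report(partition_sizes, num_cores):
    """Summarize how evenly records are spread across partitions.

    partition_sizes: record count per partition (hypothetical input).
    num_cores: total CPU cores available to the job (hypothetical input).
    """
    if not partition_sizes:
        raise ValueError("need at least one partition")
    n = len(partition_sizes)
    mean = sum(partition_sizes) / n
    # A skew of 1.0 means perfectly balanced; the largest partition
    # determines how long the slowest task in a stage runs.
    skew = max(partition_sizes) / mean if mean > 0 else float("inf")
    return {
        "partitions": n,
        "skew": skew,
        # Fewer partitions than cores leaves some cores without work.
        "idle_cores": max(0, num_cores - n),
    }


# Made-up example: 4 partitions on 8 cores, one partition 5x larger.
report = partition_report([100, 100, 100, 500], num_cores=8)
print(report)
```

A skew well above 1.0 or a non-zero `idle_cores` would suggest repartitioning before running k-means.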

Re: K-means faster on Mahout then on Spark

2014-09-25 Thread bhusted
What is the size of your vector? Mine is set to 20. I am seeing slow results as well with iteration=5 and 200,000,000 elements.

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Egor Pahomov
Mahout used MapReduce and ran one MR job per iteration. It worked as expected. My question is more about why Spark was so slow. I would try MEMORY_AND_DISK_SER.
2014-03-25 17:58 GMT+04:00 Suneel Marthi:
> Mahout does have a kmeans which can be executed in mapreduce and iterative
> modes.
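To make "one MR job per iteration" concrete, here is a toy in-memory sketch of a single k-means iteration split into a map step (assign each point to its nearest center) and a reduce step (average each cluster). It is not Mahout's or Spark MLlib's implementation; the function names and the sample points are assumptions made for illustration.

```python
from collections import defaultdict


def nearest(point, centers):
    # Index of the center with the smallest squared Euclidean distance.
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans_iteration(points, centers):
    # "Map": group every point under the index of its nearest center.
    assigned = defaultdict(list)
    for p in points:
        assigned[nearest(p, centers)].append(p)
    # "Reduce": average the members of each cluster to get its new center.
    new_centers = []
    for i, old in enumerate(centers):
        members = assigned.get(i)
        if not members:
            new_centers.append(old)  # an empty cluster keeps its old center
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centers


# Made-up data: two obvious clusters in 2-D.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
for _ in range(10):  # 10 iterations means 10 map/reduce passes over the data
    centers = kmeans_iteration(points, centers)
print(centers)
```

Each pass reads every point once, so how the input is stored between passes matters for an iterative job like this one.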

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Suneel Marthi
Mahout does have a k-means implementation that can be executed in MapReduce and iterative modes.
> On Mar 25, 2014, at 9:25 AM, Prashant Sharma wrote:
>
> I think Mahout uses FuzzyKmeans, which is different algorithm and it is not
> iterative.
>
> Prashant Sharma
>
>> On Tue, Mar 2

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Prashant Sharma
I think Mahout uses FuzzyKmeans, which is a different algorithm and is not iterative.
Prashant Sharma
On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov wrote:
> Hi, I'm running benchmark, which compares Mahout and SparkML. For now I
> have next results for k-means:
> Number of iterations= 10, numb

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Guillaume Pitel (eXenSa)
Maybe with "MEMORY_ONLY", Spark has to recompute the RDDs several times because they don't fit in memory, which makes things run slower. As a general safe rule, use MEMORY_AND_DISK_SER.
Guillaume Pitel - President of eXenSa
Prashant Sharma wrote:
>I think Mahout uses FuzzyKmeans, which is dif
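The cost of recomputation across iterations can be illustrated with a plain-Python analogy. This is not Spark code: `LazyDataset` is a hypothetical stand-in for a lazily evaluated dataset, and the `cache` flag only loosely mirrors the difference between data that must be rebuilt on every pass and data that stays available after the first pass.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (not a Spark API)."""

    def __init__(self, compute, cache=False):
        self.compute = compute      # function that rebuilds the data from scratch
        self.cache = cache          # keep the result after the first computation?
        self._data = None
        self.computations = 0       # how many times compute() actually ran

    def collect(self):
        if self.cache and self._data is not None:
            return self._data
        self.computations += 1
        data = self.compute()
        if self.cache:
            self._data = data
        return data


def run(dataset, iterations):
    # Each iteration needs the full dataset, as a k-means pass does.
    for _ in range(iterations):
        sum(dataset.collect())
    return dataset.computations


def parse():
    # Hypothetical "expensive" load-and-parse step.
    return [float(x) for x in range(1000)]


uncached = run(LazyDataset(parse, cache=False), 10)
cached = run(LazyDataset(parse, cache=True), 10)
print(uncached, cached)  # 10 1
```

With 10 iterations, the uncached variant rebuilds its input 10 times while the cached one builds it once, which is the kind of gap the storage-level advice above is aiming to close.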