Please also check the load balance of the RDD on YARN. How many
partitions are you using? Does it match the number of CPU cores?
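For example, a quick check (just a sketch; `points` stands in for your
training RDD):

  // Compare the partition count with the parallelism Spark derives from the
  // executor cores, and repartition if the RDD is much coarser than that.
  val numPartitions = points.partitions.length
  println(s"partitions = $numPartitions, cores = ${sc.defaultParallelism}")
  val balanced =
    if (numPartitions < sc.defaultParallelism) points.repartition(sc.defaultParallelism)
    else points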
-Xiangrui
On Thu, Sep 25, 2014 at 12:28 PM, bhusted wrote:
> What is the size of your vector? Mine is set to 20. I am seeing slow
> results as well with iterations = 5 and 200,000,000 elements.
What is the size of your vector? Mine is set to 20. I am seeing slow results
as well with iterations = 5 and 200,000,000 elements.
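For reference, a minimal sketch of the run (MLlib's KMeans.train; the input
path and k below are placeholders):

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  // 20-dimensional points, one per line, space-separated
  val points = sc.textFile("hdfs:///path/to/points")
    .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
    .cache()

  // k = 10 is a placeholder; maxIterations = 5 as in the run above
  val model = KMeans.train(points, 10, 5)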
Mahout uses MapReduce and runs one MR job per iteration, so it worked as
predicted. My question is more about why Spark was so slow. I would try
MEMORY_AND_DISK_SER.
2014-03-25 17:58 GMT+04:00 Suneel Marthi:
> Mahout does have a k-means that can be executed in MapReduce and iterative
> modes.
>
> Sent from my iPhone
Mahout does have a k-means that can be executed in MapReduce and iterative
modes.
Sent from my iPhone
> On Mar 25, 2014, at 9:25 AM, Prashant Sharma wrote:
>
> I think Mahout uses FuzzyKmeans, which is a different algorithm and is not
> iterative.
>
> Prashant Sharma
I think Mahout uses FuzzyKmeans, which is a different algorithm and is not
iterative.
Prashant Sharma
On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov wrote:
> Hi, I'm running a benchmark that compares Mahout and SparkML. So far I
> have the following results for k-means:
> Number of iterations= 10, numb
Maybe with "MEMORY_ONLY", Spark has to recompute the RDD several times
because it doesn't fit in memory, which makes things run slower.
As a general safe rule, use MEMORY_AND_DISK_SER.
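For example, something like this (a minimal sketch; `points` stands in for
the RDD fed to k-means):

  import org.apache.spark.storage.StorageLevel

  // Store the data serialized and let partitions that don't fit in memory
  // spill to disk, so they are read back instead of being recomputed.
  points.persist(StorageLevel.MEMORY_AND_DISK_SER)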
Guillaume Pitel - President of eXenSa
Prashant Sharma wrote:
> I think Mahout uses FuzzyKmeans, which is a different algorithm and is not
> iterative.