Why is KMeans with MLlib so slow?

Jaonary Rabarisoa Fri, 05 Dec 2014 07:53:09 -0800

Hi all,

I'm trying to run a clustering with the k-means algorithm. My data set is
about 240k vectors of dimension 384.


Solving the problem with the k-means available in Julia (k-means++)

http://clusteringjl.readthedocs.org/en/latest/kmeans.html

takes about 8 minutes on a single core.

Solving the same problem with Spark's k-means|| takes more than 1.5 hours
with 8 cores!

Either they don't implement the same algorithm, or I don't understand how
k-means in Spark works. Is my data not big enough to take full advantage
of Spark? At the very least, I expected the same runtime.
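
For scale, here is a rough back-of-the-envelope sketch of the data size and
the slowdown. It assumes the vectors are stored as dense 64-bit floats (an
assumption; the storage format is not stated above), and the function name
is only illustrative. The timings are the ones quoted in this message, with
1.5 hours treated as a lower bound.

```python
# Back-of-the-envelope numbers for the data set and timings in this message.
# ASSUMPTION: dense vectors of 64-bit floats (8 bytes per value).

NUM_VECTORS = 240_000      # "about 240k vectors"
DIMENSION = 384            # "dimension 384"
BYTES_PER_VALUE = 8        # assumed float64 storage

JULIA_MINUTES = 8          # single core
SPARK_MINUTES = 1.5 * 60   # lower bound ("more than 1.5 hours"), 8 cores
SPARK_CORES = 8


def dataset_size_bytes(num_vectors, dimension, bytes_per_value=BYTES_PER_VALUE):
    """Raw size of the vectors, ignoring any per-object overhead."""
    return num_vectors * dimension * bytes_per_value


size_bytes = dataset_size_bytes(NUM_VECTORS, DIMENSION)
size_mib = size_bytes / (1024 ** 2)

# Wall-clock slowdown, and slowdown measured in core-minutes.
wall_clock_ratio = SPARK_MINUTES / JULIA_MINUTES
core_minutes_ratio = (SPARK_MINUTES * SPARK_CORES) / JULIA_MINUTES

print(f"raw data size: {size_bytes} bytes (~{size_mib:.0f} MiB)")
print(f"wall-clock slowdown: at least {wall_clock_ratio:.2f}x")
print(f"core-minutes ratio: at least {core_minutes_ratio:.0f}x")
```

Under that assumption the raw data is well under 1 GB, so it fits in memory
on a single machine.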


Cheers,


Jao
