Hi all, I'm trying to run a clustering with the kmeans algorithm. My data set has about 240k vectors of dimension 384.
Solving the problem with the kmeans available in Julia (kmeans++) http://clusteringjl.readthedocs.org/en/latest/kmeans.html takes about 8 minutes on a single core. Solving the same problem with Spark's kmeans|| takes more than 1.5 hours with 8 cores! Either they don't implement the same algorithm, or I don't understand how kmeans in Spark works. Is my data not big enough to take full advantage of Spark? At the very least, I expected the same runtime. Cheers, Jao