I think Mahout uses FuzzyKmeans, which is a different algorithm, and it is not iterative.
Prashant Sharma

On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

> Hi, I'm running a benchmark that compares Mahout and SparkML. So far I
> have the following results for k-means:
>
> Number of iterations = 10, number of elements = 10000000, Mahout time = 602, Spark time = 138
> Number of iterations = 40, number of elements = 10000000, Mahout time = 1917, Spark time = 330
> Number of iterations = 70, number of elements = 10000000, Mahout time = 3203, Spark time = 388
> Number of iterations = 10, number of elements = 100000000, Mahout time = 1235, Spark time = 2226
> Number of iterations = 40, number of elements = 100000000, Mahout time = 2755, Spark time = 6388
> Number of iterations = 70, number of elements = 100000000, Mahout time = 4107, Spark time = 10967
> Number of iterations = 10, number of elements = 1000000000, Mahout time = 7070, Spark time = 25268
>
> Times are in seconds. The benchmark runs on a YARN cluster with about 40
> machines. The elements to be clustered are randomly generated. When I
> changed the persistence level from Memory to Memory_and_disk, Spark
> started to run faster on the large data sets.
>
> What am I missing?
>
> See my benchmarking code in the attachment.
>
> --
> Sincerely yours,
> Egor Pakhomov
> Scala Developer, Yandex
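
To make the trend in the quoted numbers easier to see, here is a small sketch that only rearranges the figures from the message above into a Mahout/Spark time ratio and seconds per iteration. The names `RESULTS` and `summarize` are made up for this illustration and are not part of the attached benchmark code.

```python
# Benchmark rows copied from the quoted message:
# (iterations, elements, mahout_seconds, spark_seconds)
RESULTS = [
    (10, 10_000_000, 602, 138),
    (40, 10_000_000, 1917, 330),
    (70, 10_000_000, 3203, 388),
    (10, 100_000_000, 1235, 2226),
    (40, 100_000_000, 2755, 6388),
    (70, 100_000_000, 4107, 10967),
    (10, 1_000_000_000, 7070, 25268),
]


def summarize(rows):
    """Return one dict per run with the Mahout/Spark time ratio
    and the average seconds per iteration for each system."""
    summary = []
    for iterations, elements, mahout_s, spark_s in rows:
        summary.append({
            "iterations": iterations,
            "elements": elements,
            # A ratio above 1 means Spark finished faster than Mahout.
            "mahout_over_spark": mahout_s / spark_s,
            "mahout_s_per_iter": mahout_s / iterations,
            "spark_s_per_iter": spark_s / iterations,
        })
    return summary


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print(
            f"iters={row['iterations']:>2} "
            f"elements={row['elements']:>13,} "
            f"mahout/spark={row['mahout_over_spark']:.2f} "
            f"mahout s/iter={row['mahout_s_per_iter']:.1f} "
            f"spark s/iter={row['spark_s_per_iter']:.1f}"
        )
```

With these numbers the ratio is above 1 for every run with 10,000,000 elements (Spark is faster) and below 1 for every run with 100,000,000 or more elements (Mahout is faster), so the slowdown appears as the data set grows rather than as the iteration count grows.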