Re: Understanding epsilon in KMeans

2014-05-16 Thread Brian Gawalt
Hi Stuti, I think you're right. The epsilon parameter is indeed used as a threshold for deciding when KMeans has converged. If you look at line 201 of mllib's KMeans.scala: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L201 you ca

Re: Understanding epsilon in KMeans

2014-05-16 Thread Krishna Sankar
Stuti, - The two numbers are used in different contexts, but finally end up on two sides of an && operator. - A parallel K-Means consists of multiple iterations, which in turn consist of moving centroids around. A centroid would be deemed stabilized when the root square distance between suc

Re: Understanding epsilon in KMeans

2014-05-16 Thread Long Pham
Stuti, I'm answering your questions in order: 1. From MLlib https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L159, you can see that clustering stops when we have reached *maxIterations* or there are no more *activeRuns*. KMeans is e
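The stopping rule described above (stop at maxIterations, or when no run is still active) can be sketched in plain Python. This is only an illustration on 1-D data, not the MLlib code: `lloyd_step` and `run_all` are hypothetical helper names, and the sketch assumes a run becomes inactive once none of its centers moves more than epsilon.

```python
def lloyd_step(points, centers):
    """One k-means update in 1-D: assign points to the nearest center, then average."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    # An empty cluster keeps its previous center.
    return [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]


def run_all(points, initial_centers_per_run, max_iterations, epsilon):
    """Advance every run in lockstep until the iteration cap is hit or no run is active."""
    centers = [list(c) for c in initial_centers_per_run]
    active_runs = list(range(len(centers)))
    iteration = 0
    # The two stopping conditions sit on either side of a logical "and".
    while iteration < max_iterations and active_runs:
        still_active = []
        for r in active_runs:
            updated = lloyd_step(points, centers[r])
            moved = max(abs(a - b) for a, b in zip(centers[r], updated))
            centers[r] = updated
            if moved > epsilon:  # assumption: a run stays active only while a center still moves
                still_active.append(r)
        active_runs = still_active
        iteration += 1
    return centers, iteration


points = [1.0, 2.0, 9.0, 10.0]
starts = [[0.0, 5.0], [1.0, 10.0]]  # two runs with different starting centers
centers, iterations = run_all(points, starts, max_iterations=20, epsilon=1e-4)
print(centers, iterations)
```

With these inputs both runs settle on the same centers well before the iteration cap, so the loop exits because the list of active runs is empty rather than because maxIterations was reached.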

Re: Understanding epsilon in KMeans

2014-05-16 Thread Xiangrui Meng
In Spark's KMeans, if no cluster center moves more than epsilon in Euclidean distance from the previous iteration, the algorithm finishes. No further iterations are performed. For Mahout, you need to check the documentation or the code to see what epsilon means there. -Xiangrui On Wed, May 14, 2014 at
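The convergence test described above can be written as a small predicate. This is a minimal sketch in plain Python, not Spark's implementation; `has_converged` is a hypothetical name, and the centers are assumed to be listed in the same order in both iterations.

```python
import math


def has_converged(old_centers, new_centers, epsilon):
    """True when no center moved more than epsilon in Euclidean distance."""
    return all(
        math.dist(old, new) <= epsilon
        for old, new in zip(old_centers, new_centers)
    )


old = [(0.0, 0.0), (5.0, 5.0)]
new = [(0.0, 0.00005), (5.0, 5.0)]  # the first center moved by 5e-5

print(has_converged(old, new, epsilon=1e-4))  # True: the move is within epsilon
print(has_converged(old, new, epsilon=1e-5))  # False: the move exceeds epsilon
```

A smaller epsilon therefore demands more stable centers before stopping, which usually means more iterations.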

Re: Understanding epsilon in KMeans

2014-05-16 Thread Sean Owen
It is running k-means many times, independently, from different random starting points in order to pick the best clustering. Convergence ends one run, not all of them. Yes, epsilon should be the same as "convergence threshold" elsewhere. You can set epsilon if you instantiate KMeans directly. Mayb
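Picking the best clustering among independent runs can be sketched as follows. This assumes "best" means the lowest sum of squared distances from each point to its nearest center, which the message does not spell out; `cost` and `best_run` are hypothetical names, and the data is 1-D to keep the example short.

```python
def cost(points, centers):
    """Sum of squared distances from each point to its nearest center (1-D)."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


def best_run(points, centers_per_run):
    """Return the set of centers with the lowest cost among all finished runs."""
    return min(centers_per_run, key=lambda centers: cost(points, centers))


points = [1.0, 2.0, 9.0, 10.0]
finished_runs = [
    [1.5, 9.5],  # centers sitting in the middle of each group
    [1.0, 7.0],  # a poorer result from a different starting point
]

print(cost(points, finished_runs[0]))   # 1.0
print(cost(points, finished_runs[1]))   # 14.0
print(best_run(points, finished_runs))  # [1.5, 9.5]
```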

RE: Understanding epsilon in KMeans

2014-05-15 Thread Stuti Awasthi
Hi All, Any ideas on this? Thanks Stuti Awasthi From: Stuti Awasthi Sent: Wednesday, May 14, 2014 6:20 PM To: user@spark.apache.org Subject: Understanding epsilon in KMeans Hi All, I wanted to understand the functionality of epsilon in KMeans in Spark MLlib. As per the documentation: distance