Hi Stuti,

I think you're right. The epsilon parameter is indeed used as a threshold for deciding when KMeans has converged. If you look at line 201 of MLlib's KMeans.scala:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L201
you can see that if any center moves more than epsilon units away from its prior position, in an L2-norm (Euclidean) sense, then the algorithm has NOT converged: `changed` is set to true, and the outer `while` loop repeats. The loop ends as soon as no center moves more than epsilon or maxIterations is reached, whichever comes first. So in your example (maxIterations = 10, epsilon = 0.1), if all centers move less than 0.1 on the 2nd iteration, the run stops there instead of completing all 10 iterations.

Your intuition was correct. KMeans.train() doesn't take epsilon, but you can build a `new KMeans()`, call `.setEpsilon(0.1)` on it (along with `.setK(...)` and `.setMaxIterations(...)`), and then call `.run(data)`. That sets the threshold for deciding whether any center has moved far enough for the iteration to count as non-converged.

Best,
--Brian

On Wed, May 14, 2014 at 8:35 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> Any ideas on this ??
>
> Thanks
> Stuti Awasthi
>
> *From:* Stuti Awasthi
> *Sent:* Wednesday, May 14, 2014 6:20 PM
> *To:* user@spark.apache.org
> *Subject:* Understanding epsilon in KMeans
>
> Hi All,
>
> I wanted to understand the functionality of epsilon in KMeans in Spark
> MLlib.
>
> As per documentation:
>
> distance threshold within which we've consider centers to have
> converged. If all centers move less than this *Euclidean* distance, we
> stop iterating one run.
>
> Now I have assumed that if centers are moving less than epsilon value then
> Clustering Stops but then what does it mean by “we stop iterating one run”..
>
> Now suppose I have given maxIterations=10 and epsilon = 0.1 and assume
> that after only 2 iterations the epsilon condition is met, i.e. the
> centers are now moving less than 0.1..
>
> Now what happens ?? The whole 10 iterations are completed OR the
> Clustering stops ??
>
> My 2nd query is in Mahout, there is a configuration param: “Convergence
> Threshold (cd)” which states: “in an iteration, the centroids don’t move
> more than this distance, no further iterations are done and clustering
> stops.”
>
> So is epsilon and cd similar ??
>
> 3rd query:
>
> How to pass epsilon as a configurable param.
> KMeans.train() does not provide a way, but in the code I can see
> “setEpsilon” as a method. So if I want to pass the parameter as
> epsilon=0.1, how may I do that?
>
> Pardon my ignorance
>
> Thanks
> Stuti Awasthi
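The stopping rule Brian describes can be illustrated with a small, Spark-free Scala sketch of Lloyd's k-means iteration. This is a minimal sketch under stated assumptions, not MLlib's implementation: `EpsilonKMeansSketch`, `lloyd` and `dist` are hypothetical names, points are plain Scala `Vector[Double]` values rather than MLlib vectors, and only a single run with fixed initial centers is modeled. It mirrors the `changed` flag and outer `while` loop: an iteration counts as non-converged if any center moved more than `epsilon` in Euclidean distance.

```scala
// Hypothetical sketch (not MLlib code): single-run Lloyd's k-means with an
// epsilon convergence check modeled on the `changed` flag in KMeans.scala.
object EpsilonKMeansSketch {
  // Plain Scala vectors, NOT org.apache.spark.mllib.linalg.Vector.
  type Point = Vector[Double]

  // Euclidean (L2) distance between two points of equal dimension.
  def dist(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  /** Returns the final centers and the number of iterations actually run. */
  def lloyd(points: Seq[Point], initial: Seq[Point],
            maxIterations: Int, epsilon: Double): (Seq[Point], Int) = {
    var centers = initial
    var iteration = 0
    var changed = true
    // Keep iterating only while some center still moved more than epsilon
    // AND the iteration budget (maxIterations) is not exhausted.
    while (changed && iteration < maxIterations) {
      // Assign each point to the index of its nearest current center.
      val clusters = points.groupBy(p => centers.indices.minBy(j => dist(p, centers(j))))
      // Recompute each center as the mean of its points; keep it if empty.
      val newCenters = centers.indices.map { j =>
        clusters.get(j) match {
          case Some(ps) => ps.transpose.map(_.sum / ps.size).toVector
          case None     => centers(j)
        }
      }
      // Not converged if ANY center moved more than epsilon.
      changed = centers.zip(newCenters).exists { case (c, n) => dist(c, n) > epsilon }
      centers = newCenters
      iteration += 1
    }
    (centers, iteration)
  }

  def main(args: Array[String]): Unit = {
    val points = Seq(Vector(0.0, 0.0), Vector(0.0, 1.0), Vector(10.0, 10.0), Vector(10.0, 11.0))
    val initial = Seq(Vector(0.0, 0.0), Vector(10.0, 10.0))
    val (centers, iters) = lloyd(points, initial, maxIterations = 10, epsilon = 0.1)
    println(s"stopped after $iters of 10 iterations; centers = $centers")
  }
}
```

With the sample data, the centers move 0.5 on the first iteration and not at all on the second, so `changed` becomes false and the loop stops after 2 of the 10 allowed iterations, which matches the maxIterations = 10, epsilon = 0.1 scenario in Stuti's question.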