Spark is running k-means many times, independently, from different random starting points, in order to pick the best clustering. Convergence ends one run, not all of them: in your example, the run that meets the epsilon condition after 2 iterations stops there rather than continuing to 10; only runs that never converge go all the way to maxIterations.
Yes, epsilon is the same thing as the "convergence threshold" elsewhere. You can set epsilon if you instantiate KMeans directly (rough sketch at the bottom of this mail). Maybe it would be nice to overload train() to be able to set it too, but I imagine the point of the static convenience methods is to encapsulate the most usual subsets of parameters.

On Wed, May 14, 2014 at 1:50 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> I wanted to understand the functionality of epsilon in KMeans in Spark
> MLlib.
>
> As per the documentation:
>
> "distance threshold within which we've consider centers to have converged.
> If all centers move less than this Euclidean distance, we stop iterating
> one run."
>
> Now I have assumed that if the centers move less than the epsilon value
> then clustering stops, but then what does "we stop iterating one run" mean?
>
> Suppose I have given maxIterations=10 and epsilon=0.1, and assume that
> after only 2 iterations the epsilon condition is met, i.e. the centers are
> now moving less than 0.1.
>
> What happens now? Are the whole 10 iterations completed, or does the
> clustering stop?
>
> My 2nd query: in Mahout there is a configuration param, "Convergence
> Threshold (cd)", which states: "if, in an iteration, the centroids don't
> move more than this distance, no further iterations are done and
> clustering stops."
>
> So are epsilon and cd similar?
>
> 3rd query: How can I pass epsilon as a configurable param? KMeans.train()
> does not provide a way, but in the code I can see setEpsilon as a method.
> So if I want to pass the parameter epsilon=0.1, how may I do that?
>
> Pardon my ignorance.
>
> Thanks
> Stuti Awasthi
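Re: the 3rd query, here is a rough, untested sketch of what I mean by instantiating KMeans directly. It assumes an existing SparkContext named sc, and the toy data is only a placeholder for your own RDD of vectors:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Illustrative data only; assumes a SparkContext called sc is in scope.
    val data = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0),
      Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0),
      Vectors.dense(9.1, 9.1)
    ))

    val model = new KMeans()
      .setK(2)                // number of clusters
      .setRuns(5)             // independent restarts; the best clustering wins
      .setMaxIterations(10)   // upper bound; a run can converge earlier
      .setEpsilon(0.1)        // plays the role of Mahout's convergence threshold
      .run(data)

    println(model.clusterCenters.mkString(", "))

Each setter returns the KMeans instance, so they chain; run() does the actual clustering and gives you back a KMeansModel.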