Stuti,

- The two numbers (max iterations and epsilon) are used in different contexts, but they finally end up on the two sides of an && operator.
- A parallel K-Means consists of multiple iterations, each of which moves the centroids around. A centroid is deemed stabilized when the Euclidean distance it moves between successive iterations is <= epsilon.
- The iterations end when max iterations is reached or the centers have converged, whichever comes first.
- Just to complete the story, the centers from the run with the minimum cost are chosen.
- In Spark 1.0 rc7, the defaults are max iterations = 20 and epsilon = 0.0001.
- Back to your questions:
  - [1] In your scenario, it will stop after 2 iterations; it will not run all 10.
  - The documentation and the code are a little unclear, because "run" is used in different ways in the same code. For example, a run sometimes means one iteration over all the centroids, but "runs" also means the number of k-means runs to execute in parallel!
  - Epsilon is purely a convergence criterion. The Spark documentation says "*epsilon* determines the distance threshold within which we consider k-means to have converged." The sentence "If all centers move less than this Euclidean distance, we stop iterating one run." should be "..., we stop one iteration", I think.
  - [2] Yep, the cd in Mahout and epsilon serve the same purpose.
  - [3] I can't see a way to pass epsilon. I think the train method needs one more parameter, epsilon. I need to check this more closely.
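To make the stopping rule above concrete, here is a minimal plain-Python sketch. It is not Spark's implementation: the function names `kmeans` and `kmeans_one_run` are hypothetical, it assumes Lloyd-style iterations with naive random initialization (Spark's default is k-means||), and it runs the independent runs one after another instead of in parallel. A run stops when either `max_iterations` is reached or every center moved no more than `epsilon` in the last iteration, and the run with the lowest cost is kept.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two points given as tuples of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_one_run(data, k, max_iterations, epsilon, rng):
    """One k-means run (hypothetical helper); returns (centers, cost, iterations_done)."""
    centers = rng.sample(data, k)  # naive random init, an assumption of this sketch
    iteration = 0
    converged = False
    # Both limits meet in one loop condition (`and` here, `&&` in Scala):
    # the run ends at max_iterations or on convergence, whichever comes first.
    while iteration < max_iterations and not converged:
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda j: _dist(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if its cluster ended up empty.
        new_centers = [_mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        # Converged when every center moved no more than epsilon this iteration.
        converged = all(_dist(old, new) <= epsilon
                        for old, new in zip(centers, new_centers))
        centers = new_centers
        iteration += 1
    cost = sum(min(_dist(p, c) ** 2 for c in centers) for p in data)
    return centers, cost, iteration


def kmeans(data, k, max_iterations=20, epsilon=1e-4, runs=1, seed=0):
    """Run `runs` independent runs and keep the one with the lowest cost."""
    rng = random.Random(seed)
    results = [kmeans_one_run(data, k, max_iterations, epsilon, rng)
               for _ in range(runs)]
    return min(results, key=lambda r: r[1])


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, cost, iters = kmeans(data, k=2, max_iterations=10, epsilon=0.1, runs=3, seed=42)
print(sorted(centers), cost, iters)
```

With `max_iterations=10` and `epsilon=0.1` on these two well-separated clusters, a run exits after two or three iterations (depending on which points the random initialization picks), not after all ten, which matches the behavior described in [1].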
HTH & Cheers
<k/>

On Wed, May 14, 2014 at 5:50 AM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> I wanted to understand the functionality of epsilon in KMeans in Spark MLlib.
>
> As per documentation:
>
> distance threshold within which we've consider centers to have converged. If all centers move less than this *Euclidean* distance, we stop iterating one run.
>
> Now I have assumed that if centers are moving less than the epsilon value then clustering stops, but then what does it mean by “we stop iterating one run”?
>
> Now suppose I have given maxIterations=10 and epsilon=0.1, and assume that after only 2 iterations the epsilon condition is met, i.e. the centers are now moving less than 0.1.
>
> Now what happens? Are all 10 iterations completed, or does the clustering stop?
>
> My 2nd query: in Mahout there is a configuration param “Convergence Threshold (cd)” which states: “in an iteration, the centroids don’t move more than this distance, no further iterations are done and clustering stops.”
>
> So are epsilon and cd similar?
>
> 3rd query:
> How to pass epsilon as a configurable param? KMeans.train() does not provide a way, but in the code I can see “setEpsilon” as a method. So if I want to pass the parameter epsilon=0.1, how may I do that?
>
> Pardon my ignorance
>
> Thanks
> Stuti Awasthi