Stuti,

   - The two numbers (maxIterations and epsilon) apply in different contexts,
   but they end up on the two sides of an && operator in the loop condition.
   - A parallel K-Means consists of multiple iterations, each of which moves
   the centroids around. A centroid is deemed stabilized when the Euclidean
   distance it moves between successive iterations is <= epsilon.
   - The iterations end when either maxIterations is reached or all centers
   have converged.
   - Just to complete the story, the centers from the run with the minimum
   cost are chosen.
   - In Spark 1.0 rc7, the defaults are maxIterations = 20 and epsilon = 0.0001.
   - Back to your questions:
      - [1] In your scenario, it will stop after 2 iterations.
      - The documentation and the code are a little unclear. "Run" is used in
         different ways in the same code: for example, a run means one
         iteration over all the centroids, but "runs" also means the number of
         iterations to be executed in parallel!
         - Epsilon is used purely as a convergence criterion. The Spark
         documentation says "*epsilon* determines the distance threshold within
         which we consider k-means to have converged." The sentence "If all
         centers move less than this Euclidean distance, we stop iterating one
         run." should, I think, read "..., we stop one iteration".
      - [2] Yep, Mahout's cd and Spark's epsilon serve the same purpose.
      - [3] I can't see a way to pass epsilon through KMeans.train(). I think
      the train method needs one more parameter, epsilon. I need to check this
      more closely.
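To make the interplay concrete, here is a minimal pure-Python sketch of a single k-means run (Lloyd's algorithm) plus a best-of-several-runs wrapper, following the description above. It is not Spark's code: the names kmeans_run, cost and best_of_runs are hypothetical, there is no RDD or parallelism, initial centers are just random input points, and an empty cluster simply keeps its old center. The loop guard combines the two numbers with an `and` (the && mentioned above), a center counts as "still moving" only if its Euclidean distance to its previous position exceeds epsilon, and the defaults 20 and 1e-4 are the Spark 1.0 rc7 values quoted above.

```python
import math
import random


def kmeans_run(points, init_centers, max_iterations=20, epsilon=1e-4):
    """One k-means run (Lloyd's algorithm). Hypothetical helper, not Spark API.

    Returns (centers, iterations_executed).
    """
    centers = [list(c) for c in init_centers]
    iteration = 0
    converged = False
    # The two numbers meet in one guard: keep iterating only while the
    # iteration budget is not exhausted AND some center is still moving.
    while iteration < max_iterations and not converged:
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)

        # Update step: move each center to the mean of its cluster and
        # record whether any center moved farther than epsilon.
        converged = True
        for j, members in enumerate(clusters):
            if not members:
                continue  # empty cluster: keep the old center (a simplification)
            new_center = [sum(coord) / len(members) for coord in zip(*members)]
            if math.dist(new_center, centers[j]) > epsilon:
                converged = False
            centers[j] = new_center
        iteration += 1
    return centers, iteration


def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def best_of_runs(points, k, runs, max_iterations=20, epsilon=1e-4, seed=0):
    """Run k-means `runs` times from random initial centers; keep the cheapest."""
    rng = random.Random(seed)
    best_cost, best_centers = math.inf, None
    for _ in range(runs):
        init = rng.sample(points, k)  # hypothetical init: k distinct input points
        centers, _ = kmeans_run(points, init, max_iterations, epsilon)
        c = cost(points, centers)
        if c < best_cost:
            best_cost, best_centers = c, centers
    return best_centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    centers, n = kmeans_run(pts, [(0, 0), (10, 10)], max_iterations=10, epsilon=0.1)
    print(n, centers)  # prints: 2 [[0.0, 0.5], [10.0, 10.5]]
```

On that toy data the first iteration moves both centers by 0.5 (more than epsilon = 0.1) and the second moves them by 0, so the run returns after 2 iterations and the remaining 8 of maxIterations = 10 are never executed, which matches the answer to [1].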


HTH & Cheers
<k/>


On Wed, May 14, 2014 at 5:50 AM, Stuti Awasthi <stutiawas...@hcl.com> wrote:

>  Hi All,
>
> I wanted to understand the functionality of epsilon in KMeans in Spark
> MLlib.
>
> As per the documentation:
>
> distance threshold within which we've consider centers to have
> converged. If all centers move less than this *Euclidean* distance, we
> stop iterating one run.
>
> Now I have assumed that if the centers move less than the epsilon value,
> clustering stops, but then what does “we stop iterating one run” mean?
>
> Now suppose I have given maxIterations=10 and epsilon = 0.1, and assume
> that after only 2 iterations the epsilon condition is met, i.e. the
> centers are now moving less than 0.1.
>
> Now what happens? Are all 10 iterations completed, or does the
> clustering stop?
>
> My 2nd query: in Mahout, there is a configuration param “Convergence
> Threshold (cd)” which states: “in an iteration, the centroids don’t move
> more than this distance, no further iterations are done and clustering
> stops.”
>
> So are epsilon and cd similar?
>
> 3rd query:
>
> How do I pass epsilon as a configurable param? KMeans.train() does not
> provide a way, but in the code I can see a “setEpsilon” method. So if I
> want to pass epsilon=0.1, how may I do that?
>
> Pardon my ignorance
>
> Thanks
>
> Stuti Awasthi
