In Spark's KMeans, if no cluster center moves more than epsilon in
Euclidean distance from the previous iteration, the algorithm finishes.
No further iterations are performed, even if maxIterations has not been
reached. For Mahout, you need to check the documentation or the code to
see what epsilon means there. -Xiangrui
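
To make the stopping rule concrete, here is a minimal pure-Python sketch of Lloyd's iterations with the same two knobs. It is not Spark's implementation: the names lloyd_kmeans, euclidean and max_shift are made up for illustration, and the toy data is an assumption chosen so that the centers settle after 2 iterations, as in the scenario from the question.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lloyd_kmeans(points, centers, max_iterations=10, epsilon=0.1):
    """Plain Lloyd iterations; returns (final_centers, iterations_performed)."""
    centers = [tuple(float(x) for x in c) for c in centers]
    iterations = 0
    while iterations < max_iterations:
        # Assignment step: accumulate each point into its nearest center.
        sums = [[0.0] * len(c) for c in centers]
        counts = [0] * len(centers)
        for p in points:
            j = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
            counts[j] += 1
            for d, x in enumerate(p):
                sums[j][d] += x
        # Update step: an empty cluster keeps its old center.
        new_centers = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(len(centers))
        ]
        iterations += 1
        # Largest distance any center moved in this iteration.
        max_shift = max(euclidean(o, n) for o, n in zip(centers, new_centers))
        centers = new_centers
        if max_shift <= epsilon:
            break  # converged: the remaining iterations are skipped
    return centers, iterations


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers, iterations = lloyd_kmeans(points, [(0, 0), (10, 0)],
                                   max_iterations=10, epsilon=0.1)
print(iterations)  # 2, not 10
```

On the third question: as far as I know, the Scala API lets you set epsilon through the builder-style setters instead of KMeans.train(), along the lines of `new KMeans().setK(k).setMaxIterations(10).setEpsilon(0.1).run(data)`, where k and data are placeholders for your own values.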

On Wed, May 14, 2014 at 8:35 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> Any ideas on this?
>
> Thanks
>
> Stuti Awasthi
>
>
>
> From: Stuti Awasthi
> Sent: Wednesday, May 14, 2014 6:20 PM
> To: user@spark.apache.org
> Subject: Understanding epsilon in KMeans
>
>
>
> Hi All,
>
>
>
> I wanted to understand the functionality of epsilon in KMeans in Spark
> MLlib.
>
>
>
> As per the documentation:
>
> “distance threshold within which we've consider centers to have converged.
> If all centers move less than this Euclidean distance, we stop iterating
> one run.”
>
>
>
> I have assumed that if the centers move less than the epsilon value, then
> clustering stops. But then what is meant by “we stop iterating one run”?
>
> Suppose I set maxIterations=10 and epsilon=0.1, and after only 2
> iterations the epsilon condition is met, i.e. the centers are now moving
> less than 0.1.
>
>
>
> What happens then? Are all 10 iterations completed, or does the
> clustering stop?
>
>
>
> My 2nd query: in Mahout there is a configuration parameter, “Convergence
> Threshold (cd)”, which states: “in an iteration, the centroids don’t move
> more than this distance, no further iterations are done and clustering
> stops.”
>
> So are epsilon and cd similar?
>
>
>
> 3rd query:
>
> How do I pass epsilon as a configurable parameter? KMeans.train() does not
> provide a way, but in the code I can see a “setEpsilon” method. So if I
> want to set epsilon=0.1, how may I do that?
>
>
>
> Pardon my ignorance
>
>
>
> Thanks
>
> Stuti Awasthi
