Apache's commons-math implementation offers various strategies for handling
this scenario:

http://commons.apache.org/proper/commons-math/jacoco/org.apache.commons.math3.stat.clustering/KMeansPlusPlusClusterer.java.html

(take a look at the EmptyClusterStrategy enum options)
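
To make the discussion below concrete, here is a minimal, self-contained sketch of one k-means iteration on the example from the original mail. It is neither the Flink example nor the commons-math code: the function names (nearest, mean, step) and the reseed_empty flag are made up for illustration, and the re-seeding rule is only loosely modelled on the farthest-point idea.

```python
from collections import defaultdict
from math import dist


def nearest(point, centroids):
    """Return the id of the centroid closest to `point`."""
    return min(centroids, key=lambda cid: dist(point, centroids[cid]))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def step(points, centroids, reseed_empty=False):
    """One k-means iteration; `centroids` maps id -> coordinates."""
    # Group points by the id of their nearest centroid, mirroring the
    # <centroidId, point> tuples from the thread. An id that no point
    # selects never becomes a key, so it silently drops out.
    groups = defaultdict(list)
    for p in points:
        groups[nearest(p, centroids)].append(p)

    updated = {cid: mean(members) for cid, members in groups.items()}

    if reseed_empty:
        # Hypothetical rule: hand each empty cluster the point that lies
        # farthest from its own updated centroid.
        for cid in sorted(centroids.keys() - groups.keys()):
            # Only clusters with more than one point can give one away.
            donors = [(p, d) for d, ms in groups.items()
                      if len(ms) > 1 for p in ms]
            if not donors:
                break
            p, donor = max(donors, key=lambda pd: dist(pd[0], updated[pd[1]]))
            groups[donor].remove(p)
            groups[cid].append(p)
            updated[donor] = mean(groups[donor])
            updated[cid] = p
    return updated


points = [(-10, 0), (-8, 0), (2, 0)]
centroids = {1: (0, 0), 2: (5, 0)}

# Plain grouping: every point picks centroid 1, so id 2 is gone afterwards.
print(step(points, centroids))

# With re-seeding, cluster 2 is kept and takes over the point (2, 0).
print(step(points, centroids, reseed_empty=True))
```

Whether to drop, re-seed, or raise an error on an empty cluster is a design choice; the sketch only shows that the grouping step is where the cluster is lost and where a strategy would have to hook in.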

2015-02-24 23:28 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:

> I think the behaviour is correct. If a cluster has no points then it
> has no centroid. If it has no centroid, no points could ever be
> assigned to it again in the future since there is no way of
> calculating a distance.
>
> On Tue, Feb 24, 2015 at 6:57 PM, Vasiliki Kalavri
> <vasilikikala...@gmail.com> wrote:
> > Hello everyone,
> >
> > I'm using the k-means example as a basis for a custom implementation and
> > I noticed the following behavior: If during an iteration no point is
> > assigned to a particular cluster, this cluster will then "disappear".
> > This happens because SelectNearestCenter() outputs <centroidId, point>
> > tuples (where centroidId is the center chosen by the point), and these
> > are then grouped by centroidId to compute the new centers. If no point
> > selects a particular centroid, this centroid will not appear in
> > subsequent iterations.
> >
> > For example, assume we have the points
> > { (-10, 0), (-8, 0), (2, 0) } and the initial centroids {1, (0, 0)} and
> > {2, (5, 0)}.
> > Initially, point (2, 0) will be assigned to centroid 1, but then after
> > centroid 1 moves closer to (-10, 0), point (2, 0) will not be reassigned
> > to cluster 2.
> >
> > Is this intended behavior?
> > This seemed odd to me, but I couldn't really find any resources that
> > define the "correct" behavior. It seems that handling such a situation
> > is implementation-specific. I think that if we keep it this way, we
> > might want to add a comment in the example though :)
> >
> > Cheers,
> > V.
>
