Apache's commons-math implementation offers various strategies for handling this scenario:
http://commons.apache.org/proper/commons-math/jacoco/org.apache.commons.math3.stat.clustering/KMeansPlusPlusClusterer.java.html
(take a look at the EmptyClusterStrategy enum options)

2015-02-24 23:28 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:
> I think the behaviour is correct. If a cluster has no points, then it
> has no centroid. If it has no centroid, no points could ever be
> assigned to it again in the future, since there is no way of
> calculating a distance.
>
> On Tue, Feb 24, 2015 at 6:57 PM, Vasiliki Kalavri
> <vasilikikala...@gmail.com> wrote:
> > Hello everyone,
> >
> > I'm using the k-means example as the basis for a custom implementation,
> > and I noticed the following behavior: if during an iteration no point is
> > assigned to a particular cluster, this cluster will then "disappear".
> > This happens because SelectNearestCenter() outputs <centroidId, point>
> > tuples (where centroidId is the center chosen by the point), and these
> > are then grouped by centroidId to compute the new centers. If no point
> > selects a particular centroid, this centroid will not appear in
> > subsequent iterations.
> >
> > For example, assume we have the points
> > { (-10, 0), (-8, 0), (2, 0) } and the initial centroids {1, (0, 0)} and
> > {2, (5, 0)}.
> > Initially, point (2, 0) will be assigned to centroid 1, but then, after
> > centroid 1 moves closer to (-10, 0), point (2, 0) will not be reassigned
> > to cluster 2.
> >
> > Is this intended behavior?
> > This seemed odd to me, but I couldn't really find any resources that
> > define the "correct" behavior. It seems that handling such a situation is
> > implementation-specific. I think that if we keep it this way, we might
> > want to add a comment in the example though :)
> >
> > Cheers,
> > V.
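To make the behaviour discussed above concrete, here is a minimal single-machine sketch in plain Java, assuming 2-D points and centroids keyed by an integer id. The class, method, and enum names (KMeansEmptyClusters, step, EmptyStrategy) are hypothetical and do not come from the k-means example or from commons-math. The update step only emits centroids that received at least one point, mirroring the group-by-centroidId behaviour described in the thread. DROP reproduces the disappearing cluster; KEEP_OLD_CENTER and a simplified FARTHEST_POINT variant (named after the commons-math option, but not a port of its code) are two possible ways to keep k clusters alive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the k-means example's dataflow code, not commons-math.
final class KMeansEmptyClusters {

    // Hypothetical empty-cluster policies; FARTHEST_POINT is a simplified take on
    // the commons-math option of the same name.
    enum EmptyStrategy { DROP, KEEP_OLD_CENTER, FARTHEST_POINT }

    // Squared Euclidean distance; enough for picking the nearest center.
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Component-wise mean of a non-empty list of points.
    static double[] mean(List<double[]> pts) {
        double[] m = new double[pts.get(0).length];
        for (double[] p : pts) {
            for (int i = 0; i < m.length; i++) m[i] += p[i];
        }
        for (int i = 0; i < m.length; i++) m[i] /= pts.size();
        return m;
    }

    // The point whose distance to its nearest center is largest.
    static double[] farthestPoint(List<double[]> points, Map<Integer, double[]> centers) {
        double[] far = null;
        double farD = -1;
        for (double[] p : points) {
            double nearest = Double.POSITIVE_INFINITY;
            for (double[] c : centers.values()) nearest = Math.min(nearest, dist2(p, c));
            if (nearest > farD) {
                farD = nearest;
                far = p;
            }
        }
        return far;
    }

    // One iteration (assumes a non-empty point list): assign each point to its
    // nearest centroid id, then recompute the centers from those groups.
    static Map<Integer, double[]> step(List<double[]> points, Map<Integer, double[]> centers,
                                       EmptyStrategy strategy) {
        Map<Integer, List<double[]>> groups = new LinkedHashMap<>();
        for (double[] p : points) {
            int best = -1;
            double bestD = Double.POSITIVE_INFINITY;
            for (Map.Entry<Integer, double[]> c : centers.entrySet()) {
                double d = dist2(p, c.getValue());
                if (d < bestD) {
                    bestD = d;
                    best = c.getKey();
                }
            }
            groups.computeIfAbsent(best, id -> new ArrayList<>()).add(p);
        }
        // Grouping by id only yields centers that received at least one point.
        Map<Integer, double[]> next = new LinkedHashMap<>();
        groups.forEach((id, members) -> next.put(id, mean(members)));
        // Decide what happens to centroids that received no points.
        for (Map.Entry<Integer, double[]> c : centers.entrySet()) {
            if (next.containsKey(c.getKey())) continue;
            switch (strategy) {
                case DROP: // the cluster silently disappears
                    break;
                case KEEP_OLD_CENTER: // reuse the previous position
                    next.put(c.getKey(), c.getValue().clone());
                    break;
                case FARTHEST_POINT: // re-seed at the worst-served point
                    next.put(c.getKey(), farthestPoint(points, next).clone());
                    break;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // The points and initial centroids from the example in the thread.
        List<double[]> points = Arrays.asList(
                new double[]{-10, 0}, new double[]{-8, 0}, new double[]{2, 0});
        for (EmptyStrategy s : EmptyStrategy.values()) {
            Map<Integer, double[]> centers = new LinkedHashMap<>();
            centers.put(1, new double[]{0, 0});
            centers.put(2, new double[]{5, 0});
            for (int iter = 0; iter < 3; iter++) centers = step(points, centers, s);
            StringBuilder sb = new StringBuilder(s + ":");
            centers.forEach((id, c) -> sb.append(' ').append(id).append('=').append(Arrays.toString(c)));
            System.out.println(sb);
        }
    }
}
```

On the example points, all three points pick centroid 1 in the first iteration, so with DROP only centroid 1 survives, at (-16/3, 0). With KEEP_OLD_CENTER or the FARTHEST_POINT variant, centroid 2 is kept (at (5, 0), or re-seeded at (2, 0)), and after a few iterations the centers settle at (-9, 0) and (2, 0), so point (2, 0) does end up in cluster 2.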