I think the behaviour is correct. If a cluster has no points, then it has no centroid. And if it has no centroid, no points can ever be assigned to it again, since there is no way of calculating a distance to it.
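To make this concrete, here is a minimal plain-Python sketch of the example from the quoted message. It is not the Flink example code: the names `nearest` and `kmeans_step` are made up for illustration, and the step only mimics the "group by centroidId, then average" logic described there.

```python
from collections import defaultdict

def nearest(point, centroids):
    # Return the id of the centroid closest to `point` (squared Euclidean distance).
    return min(
        centroids,
        key=lambda cid: (point[0] - centroids[cid][0]) ** 2
        + (point[1] - centroids[cid][1]) ** 2,
    )

def kmeans_step(points, centroids):
    # Group points by the id of their nearest centroid, then average each group.
    # A centroid that no point selects never becomes a key in `groups`,
    # so it is simply absent from the result.
    groups = defaultdict(list)
    for p in points:
        groups[nearest(p, centroids)].append(p)
    return {
        cid: (sum(x for x, _ in ps) / len(ps), sum(y for _, y in ps) / len(ps))
        for cid, ps in groups.items()
    }

points = [(-10.0, 0.0), (-8.0, 0.0), (2.0, 0.0)]
centroids = {1: (0.0, 0.0), 2: (5.0, 0.0)}

after_one = kmeans_step(points, centroids)   # only centroid 1 survives
after_two = kmeans_step(points, after_one)   # centroid 2 cannot come back
print(after_one)
print(after_two)
```

In the first step all three points are closer to centroid 1 than to centroid 2, so centroid 2 gets an empty group and drops out. From then on, (2, 0) has only centroid 1 to choose from.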

On Tue, Feb 24, 2015 at 6:57 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:

> Hello everyone,
>
> I'm using the k-means example as basis for a custom implementation and I
> noticed the following behavior: If during an iteration no point is assigned
> to a particular cluster, this cluster will then "disappear".
> This happens because SelectNearestCenter() outputs <centroidId, point>
> tuples, (where centroidId is the chosen center by the point) and these are
> then grouped by centroidId to compute the new centers. If no point selects
> a particular centroid, this centroid will not appear in subsequent
> iterations.
>
> For example, assume we have the points
> { (-10, 0), (-8, 0), (2, 0) } and the initial centroids {1, (0, 0)} and {2,
> (5, 0)}.
> Initially, point (2, 0) will be assigned to centroid 1, but then after
> centroid 1 moves closer to (-10, 0) point(2, 0) will not be reassigned to
> cluster 2.
>
> Is this intended behavior?
> This seemed odd to me, but I couldn't really find any resources that define
> the "correct" behavior.. It seems that handling such a situation is
> implementation-specific. I think that if we keep it this way, we might want
> to add a comment in the example though :)
>
> Cheers,
> V.