Hello all,

The k-means algorithm can sometimes fail because one of the clusters becomes empty. In this case, the R function kmeans() returns the error:
"empty cluster: try a better set of initial centers"
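To make the failure concrete, here is a minimal sketch of one assignment step of plain Lloyd-style k-means. It is written in Python for illustration only and is not R's kmeans() code. If a center is not the nearest center for any data point, its cluster has no members, so the usual "mean of the members" update for that center is undefined. The toy data and the deliberately poor starting centers are made up for this example.

```python
def nearest_center(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))


def cluster_sizes(points, centers):
    """Assign every point to its nearest center and count members per cluster."""
    sizes = [0] * len(centers)
    for x in points:
        sizes[nearest_center(x, centers)] += 1
    return sizes


# Toy 2-D data: two tight groups near (0, 0) and (10, 10).
points = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (9.8, 10.1)]
# Hypothetical poor starting centers: the third one is far from every point.
centers = [(0.0, 0.0), (10.0, 10.0), (100.0, 100.0)]

sizes = cluster_sizes(points, centers)
print(sizes)  # [2, 2, 0]: the third cluster is empty, so its mean is undefined
```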

This has been discussed several times on various R lists and is NOT a bug, but it can be annoying when using k-means in complex simulations, where this error brings everything to a stop. One can use try() or tryCatch() to catch the error and carry on, but this is just a programming trick.

I was wondering whether anyone knows of an R implementation of k-means that prevents this problem from happening. A very simple algorithm is proposed by Pakhira (A Modified k-means Algorithm to Avoid Empty Clusters, International Journal of Recent Trends in Engineering, Vol. 1, No. 1, May 2009): the solution is simply to add the current cluster centers to the data points when computing the new cluster centers at the next iteration. I could code that in pure R, but it would be really slow, and I'm too dumb to modify the current internal implementation. If the people on R-devel think it is worth it, maybe this could become an option in a future version of kmeans()?
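A minimal sketch of the modification as described above, in Python for illustration (it is neither R's kmeans() nor the code from the paper). In the update step, each old center is counted as one extra member of its own cluster, so the new center is (sum of assigned points + old center) / (n_j + 1). A cluster with no assigned points therefore keeps its old center instead of producing an undefined mean. The plain Lloyd-style assignment, the fixed iteration count `n_iter`, and the name `modified_kmeans` are assumptions made for this sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def modified_kmeans(points, centers, n_iter=10):
    """Lloyd-style k-means in which each old center is added to its own
    cluster when the centers are recomputed (hypothetical helper name).

    Returns (centers, labels); labels come from the last assignment step.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest current center.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
                  for x in points]
        new_centers = []
        for j, c in enumerate(centers):
            # Start the sum from the old center itself, i.e. count it as a member.
            total = list(c)
            count = 1
            for x, lab in zip(points, labels):
                if lab == j:
                    total = [t + v for t, v in zip(total, x)]
                    count += 1
            # count >= 1, so the division is always defined: no empty cluster.
            new_centers.append(tuple(t / count for t in total))
        centers = new_centers
    return centers, labels


# Usage with toy data and a deliberately poor third starting center.
points = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (9.8, 10.1)]
centers, labels = modified_kmeans(points, [(0.0, 0.0), (10.0, 10.0), (100.0, 100.0)])
print(labels)      # [0, 0, 1, 1]
print(centers[2])  # (100.0, 100.0): attracts no points but stays defined
```

For a non-empty cluster with n_j members summing to S, a fixed point of the update c = (c + S) / (n_j + 1) satisfies n_j c = S, i.e. it is still the ordinary cluster mean; the extra term only changes the path taken to get there.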

Any suggestion would be appreciated.

simon

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.