[R] Empty clusters in k-means - possible solution

Simon Chamaillé Wed, 15 May 2013 06:50:56 -0700

Hello all,

k-means algorithms can at times fail because one of the clusters becomes empty. In this case, the kmeans R function stops with the error:

"empty cluster: try a better set of initial centers"

This has been discussed several times on several R lists, and is NOT a bug, but it can be annoying when using k-means in a complex simulation where this error brings everything to a stop. One can use try() or tryCatch() to avoid this, but that is just a programming trick.
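For reference, the tryCatch() trick mentioned above could look roughly like the sketch below. The helper name safe_kmeans and its max_tries argument are made up for illustration; they are not part of base R or the stats package. It also assumes that centers is given as a number of clusters, so that each retry draws new random starting centers.

```r
# Hypothetical helper (not part of stats): retry kmeans() when it fails.
# Assumes `centers` is a single number, so every attempt gets new random starts.
safe_kmeans <- function(x, centers, max_tries = 10, ...) {
  for (attempt in seq_len(max_tries)) {
    fit <- tryCatch(
      kmeans(x, centers, ...),
      error = function(e) NULL  # swallow the error; NULL marks a failed attempt
    )
    if (!is.null(fit)) return(fit)
  }
  stop("kmeans() failed on all ", max_tries, " attempts")
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)  # 100 points in 2 dimensions
fit <- safe_kmeans(x, centers = 3)
table(fit$cluster)
```

Note that this catches every error raised by kmeans(), not only the empty-cluster one, so it hides the problem rather than solving it.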

I was wondering if anyone knows of an R implementation of k-means that prevents this problem from happening. A very simple algorithm is proposed in (Pakhira, A Modified k-means Algorithm to Avoid Empty Clusters; International Journal of Recent Trends in Engineering, Vol 1, No. 1, May 2009), in which the solution is simply to add the current cluster centers to the data points when computing the new cluster centers at the next iteration. I could code that in pure R but that would be really slow, and I'm too dumb to modify the current internal implementation. If the guys in R-dev think it is worth it, maybe this could be an option available in a future version of kmeans?
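To make the idea concrete, here is a rough pure-R sketch based only on the one-sentence description above, not on the paper itself, so the details may differ from Pakhira's actual algorithm. The function name kmeans_no_empty, its arguments, and the random choice of initial centers are assumptions for illustration. Each new center is the mean of the points assigned to the cluster plus the cluster's previous center, so a cluster that receives no points keeps its old center instead of becoming undefined.

```r
# Hypothetical sketch: Lloyd-style k-means where each cluster's previous
# center is counted as one extra data point when the center is updated.
kmeans_no_empty <- function(x, k, max_iter = 100, tol = 1e-8) {
  x <- as.matrix(x)
  # Assumption: initial centers are k distinct rows picked at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every center (n x k)
    d2 <- matrix(0, nrow(x), k)
    for (j in seq_len(k)) d2[, j] <- rowSums(sweep(x, 2, centers[j, ])^2)
    cluster <- max.col(-d2, ties.method = "first")  # index of nearest center

    counts <- tabulate(cluster, nbins = k)
    sums <- matrix(0, k, ncol(x))
    for (j in seq_len(k)) sums[j, ] <- colSums(x[cluster == j, , drop = FALSE])

    # Old center is added to the sum and counted once: divisor is never zero
    new_centers <- (sums + centers) / (counts + 1)
    converged <- max(abs(new_centers - centers)) < tol
    centers <- new_centers
    if (converged) break
  }
  list(cluster = cluster, centers = centers, iter = iter)
}

set.seed(42)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
fit <- kmeans_no_empty(x, k = 4)
table(fit$cluster)
```

As feared, the loops make this slow on large data; it is only meant to show the update rule, not to replace the compiled code behind kmeans.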


Any suggestion would be appreciated.

simon

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
