Dear experts,

I am encountering a version-dependent issue. My laptop runs Ubuntu 12.04 LTS 64-bit with R 2.14.1; the issue explained below has never occurred with that version of R. My desktop runs Ubuntu 11.10 64-bit with R 2.13.2; what follows applies to the desktop.
The data I'm clustering are the rows of a 320 x 6 matrix of integers ranging from 1 to 7, with no missing data. I have applied kmeans() to this matrix, literally, 256 x 10⁶ times using R 2.13.2 or 2.14.1, without ever experiencing the slightest problem. My usual setup is k=5, nstart=256, iter.max=50.

After upgrading to R 2.15.2, I get either a warning message ('Empty cluster. Choose a better set of initial centers') or a catastrophic segfault. The only way I can get any solution at all is to set nstart to its default value, i.e. 1. However, if I simply repeat the clustering, the same issue still happens. Moreover, nstart=1 is vastly suboptimal because of the risk of local minima.

Something similar was reported many years ago, see https://stat.ethz.ch/pipermail/r-help/2003-November/041784.html. It was then suggested that R's behaviour was correct. I'm not familiar with such an early R version, but the up-to-date documentation of kmeans clearly states that "Except for the Lloyd-Forgy method, k clusters will always be returned if a number is specified.". I am using the default Hartigan-Wong algorithm and I specify an exact number k, so k clusters should be returned. They aren't, and the empty cluster is therefore more likely the symptom of a bug than the outcome of a 'true' local minimum.

Using Synaptic, I managed to downgrade R to version 2.13.2. The problem disappeared, i.e. the message/segfault described above no longer occurred.

Summarizing: given the same dataset, invoking kmeans() in version 2.15.2 on an Ubuntu 11.10 64-bit machine regularly produces either an unreasonable message or a segfault. This does not happen at all in previous versions of R on the same machine and operating system. I respectfully suggest that the behaviour shown in the aforementioned versions 2.13.2 and 2.14.1 should be considered 'normal', and that version 2.15.2 should revert to it.

Kind regards,
Luca Nanetti
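
P.S. For reference, here is a minimal sketch of the call described above. The data are simulated stand-ins (random integers from 1 to 7 in a 320 x 6 matrix), not the original matrix, and the seed is an arbitrary choice, so this may or may not trigger the message or the segfault on a given installation. The tryCatch() wrapper only catches R-level errors; it cannot intercept a segfault.

```r
# Simulated stand-in for the real data (assumption): 320 rows x 6 columns,
# integers drawn uniformly from 1..7, no missing values.
set.seed(1)  # arbitrary seed, only for repeatability
x <- matrix(sample(1:7, 320 * 6, replace = TRUE), nrow = 320, ncol = 6)

# Same settings as in the post; Hartigan-Wong is the default algorithm.
fit <- tryCatch(
  kmeans(x, centers = 5, nstart = 256, iter.max = 50),
  error = function(e) e  # keep the condition object instead of stopping
)

if (inherits(fit, "error")) {
  message("kmeans failed: ", conditionMessage(fit))
} else {
  print(fit$size)          # cluster sizes; an empty cluster would show as 0
  print(fit$tot.withinss)  # total within-cluster sum of squares of the best start
}

# Report the R version so results can be compared across installations.
print(R.version.string)
```

Running the same script under each R version on the same machine should make it easy to compare the outcomes side by side.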