This example dataset breaks kmeans() in R version 2.15.2, installed from the Belgian CRAN mirror, on Ubuntu 12.04 LTS 64-bit.
> my.sample.2
      Day1 Day2 Day3 Day4 Day5 Day6
 [1,]    4    5    5    3    5    5
 [2,]    7    7    6    5    6    6
 [3,]    6    6    5    5    5    5
 [4,]    5    3    4    3    2    4
 [5,]    4    3    2    5    3    2
 [6,]    6    6    6    5    6    6
 [7,]    6    7    6    6    7    6
 [8,]    4    3    5    4    5    5
 [9,]    3    5    5    5    5    6
[10,]    4    5    3    2    4    4
[11,]    7    7    7    5    7    7
[12,]    3    4    2    2    2    2
[13,]    4    6    6    4    6    6
[14,]    5    6    5    6    6    6
[15,]    4    5    5    5    4    3
[16,]    5    6    6    6    6    6
[17,]    7    7    7    6    7    6
[18,]    3    2    3    3    4    2
[19,]    6    5    5    4    5    4
[20,]    5    4    1    5    1    3
[21,]    4    5    5    4    6    5
[22,]    3    4    6    5    6    3
[23,]    2    3    2    3    3    3
[24,]    5    6    5    3    4    5
[25,]    6    6    6    6    6    6
[26,]    5    4    5    5    5    5
[27,]    5    6    6    1    3    6
[28,]    4    4    4    3    3    5
[29,]    6    7    5    5    4    6
[30,]    3    2    2    2    3    2
[31,]    2    4    1    6    4    3
[32,]    4    6    4    5    4    5
[33,]    3    2    2    3    3    3
[34,]    2    3    6    5    4    4
[35,]    2    2    1    1    1    2
[36,]    2    3    2    3    2    3
[37,]    3    6    5    5    3    5
[38,]    7    3    3    7    3    5
[39,]    2    2    4    4    2    4
[40,]    2    4    3    2    3    2

## Define a variable
> hm.clusters <- 5

## Performing kmeans with 100 random starts, several times; for 7 times I
## get the 'empty cluster' error
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)
Error: empty cluster: try a better set of initial centers
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)
Error: empty cluster: try a better set of initial centers
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)
Error: empty cluster: try a better set of initial centers
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)
Error: empty cluster: try a better set of initial centers
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)
Error: empty cluster: try a better set of initial centers
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)
Error: empty cluster: try a better set of initial centers
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)
Error: empty cluster: try a better set of initial centers
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)

## The next attempt provokes the segmentation fault. Please note that there is
## nothing special about the 7 times reported above; next time it can happen on
## the very first attempt
> k.liking.ts <- kmeans(my.sample.2, hm.clusters, nstart=100, iter.max=50)

 *** caught segfault ***
address 0x10, cause 'memory not mapped'
Segmentation fault (core dumped)

That's about it ... the attached file has been written with
write.table(x, file=...)

I clustered the same dataset with R 2.14.1, same computer, same OS, using
nstart=1000. And I did it 1000 times. Never had the slightest problem.

Moreover, at the cost of repeating myself, the 'empty cluster' error is
plausibly the symptom of a bug, because it _should_ never happen with the
Hartigan-Wong algorithm (the default for kmeans).

Kind regards, and thanks again for your time.
Luca Nanetti

On Sat, Feb 9, 2013 at 8:52 PM, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote:

> We need a reproducible example.
>
> Uwe Ligges
>
> On 03.02.2013 15:03, Luca Nanetti wrote:
>
>> Dear experts,
>> I am encountering a version-dependent issue.
>>
>> My laptop runs Ubuntu 12.04 LTS 64-bit, R 2.14.1; the issue explained below
>> never occurred with this version of R.
>> My desktop runs Ubuntu 11.10 64-bit, R 2.13.2; what follows applies to this
>> setup.
>>
>> The data I'm clustering consists of the rows of a 320 x 6 matrix
>> containing integers ranging from 1 to 7, no missing data.
>> I applied kmeans() to this matrix, literally, 256 x 10^6 times using R
>> version 2.13.2 or 2.14.1, without ever experiencing the slightest problem.
>> My usual setup is with k=5, nstart=256, iter.max=50.
>>
>> Upgrading to R 2.15.2, I experienced either a warning message ('Empty
>> cluster. Choose a better set of initial centers') or a catastrophic
>> segfault. The only way I can get any solution at all is to set nstart to
>> its default value, i.e. 1. However, on simply repeating the clustering, the
>> same issue still happens.
>> Moreover, this is vastly suboptimal, because of the risk
>> of local minima.
>>
>> Something similar was reported many years ago, see
>> https://stat.ethz.ch/pipermail/r-help/2003-November/041784.html.
>> It was then suggested that R's behaviour was correct. I'm not familiar
>> with such an early R version, but the up-to-date documentation of kmeans
>> clearly states that "Except for the Lloyd-Forgy method, k clusters will
>> always be returned if a number is specified.".
>> I am using the default Hartigan-Wong, and I specify an exact number k:
>> thus, k clusters should be returned. They aren't, and the empty cluster is
>> then more likely the symptom of a bug rather than the outcome of a 'true'
>> local minimum.
>>
>> Using synaptic, I managed to downgrade R to version 2.13.2. The problem
>> disappeared, i.e. the previous message/segfault didn't occur anymore.
>>
>> Summarizing: given the same dataset, either an unreasonable message or a
>> segfault regularly happens in version 2.15.2 when invoking kmeans() on an
>> Ubuntu 11.10 64bit machine. This does not happen at all in previous
>> versions of R, on the same machine and operating system.
>>
>> I respectfully suggest that the behaviour shown in the aforementioned
>> versions 2.13.2 and 2.14.1 should be considered 'normal', and that version
>> 2.15.2 should revert to that.
>>
>> Kind regards,
>> Luca Nanetti.
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
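The documentation claim quoted above ("k clusters will always be returned") can be turned into a mechanical check on a fitted object. What follows is only a minimal sketch: has_k_nonempty() is a hypothetical helper name, and the matrix is a synthetic, well-separated stand-in, not the data from this thread. It assumes only that a kmeans() result reports per-cluster counts in its size component.

```r
# Hypothetical helper (not part of the stats package): TRUE only when a
# kmeans() fit reports exactly k clusters and none of them is empty.
has_k_nonempty <- function(fit, k) {
  length(fit$size) == k && all(fit$size > 0)
}

# Synthetic, well-separated stand-in data (NOT the data from this thread):
# three groups of 10 rows x 6 columns centred at 0, 10 and 20.
set.seed(1)
x <- rbind(
  matrix(rnorm(60, mean = 0),  ncol = 6),
  matrix(rnorm(60, mean = 10), ncol = 6),
  matrix(rnorm(60, mean = 20), ncol = 6)
)

# Name the algorithm explicitly so the check refers to Hartigan-Wong.
fit <- kmeans(x, centers = 3, nstart = 5, algorithm = "Hartigan-Wong")
print(has_k_nonempty(fit, 3))
```

Running the same check after each call on the real matrix would separate "returned fewer than k clusters" from "raised an error", which are different failure modes.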
"Day1" "Day2" "Day3" "Day4" "Day5" "Day6" "1" 4 5 5 3 5 5 "2" 7 7 6 5 6 6 "3" 6 6 5 5 5 5 "4" 5 3 4 3 2 4 "5" 4 3 2 5 3 2 "6" 6 6 6 5 6 6 "7" 6 7 6 6 7 6 "8" 4 3 5 4 5 5 "9" 3 5 5 5 5 6 "10" 4 5 3 2 4 4 "11" 7 7 7 5 7 7 "12" 3 4 2 2 2 2 "13" 4 6 6 4 6 6 "14" 5 6 5 6 6 6 "15" 4 5 5 5 4 3 "16" 5 6 6 6 6 6 "17" 7 7 7 6 7 6 "18" 3 2 3 3 4 2 "19" 6 5 5 4 5 4 "20" 5 4 1 5 1 3 "21" 4 5 5 4 6 5 "22" 3 4 6 5 6 3 "23" 2 3 2 3 3 3 "24" 5 6 5 3 4 5 "25" 6 6 6 6 6 6 "26" 5 4 5 5 5 5 "27" 5 6 6 1 3 6 "28" 4 4 4 3 3 5 "29" 6 7 5 5 4 6 "30" 3 2 2 2 3 2 "31" 2 4 1 6 4 3 "32" 4 6 4 5 4 5 "33" 3 2 2 3 3 3 "34" 2 3 6 5 4 4 "35" 2 2 1 1 1 2 "36" 2 3 2 3 2 3 "37" 3 6 5 5 3 5 "38" 7 3 3 7 3 5 "39" 2 2 4 4 2 4 "40" 2 4 3 2 3 2