On 13.03.2013 13:45, Dr. Detlef Groth wrote:
Hello,
Here is a working, reproducible example that crashes R in kmeans, or gives
empty clusters when using the nstart option, with R 2.15.2.
library(cluster)               # provides the ruspini data set
kmeans(ruspini,4)              # default nstart = 1
kmeans(ruspini,4,nstart=2)
kmeans(ruspini,4,nstart=4)
kmeans(ruspini,4,nstart=10)
?kmeans
Either we always got empty clusters or, after some further commands,
a segfault.
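A minimal sketch for checking the symptom on a current R (assumed here to be R >= 3.0.0, where the problem was reported as not reproducible): it repeats the calls above and asserts that every result has four non-empty clusters, which ?kmeans promises for the default Hartigan-Wong algorithm. The seed is an arbitrary choice for repeatability.

```r
# Sketch only: assumes a current R with the recommended 'cluster' package,
# which ships the ruspini data set (75 points in 2 dimensions).
library(cluster)

set.seed(42)  # arbitrary seed, used only to make the runs repeatable
for (ns in c(1, 2, 4, 10)) {
  km <- kmeans(ruspini, centers = 4, nstart = ns)
  # With Hartigan-Wong, exactly k clusters should be returned,
  # so all four entries of km$size must be positive.
  stopifnot(length(km$size) == 4, all(km$size > 0))
  cat("nstart =", ns, "sizes:", km$size, "\n")
}
```

On an affected build, the loop would be expected to stop with an error or crash instead of printing four positive sizes for each nstart value.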
Yes, thanks, I can reproduce it in 2.15.3, but not in R-prerelease.
Maybe this is a side effect of a bug already fixed in R-prerelease.
Since R-2.15.3 is now frozen, please upgrade to R-prerelease, which will
become R-3.0.0 in April.
Best,
Uwe Ligges
Regards,
Detlef Groth
------------
[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Feb 9 20:52:19 CET 2013
We need a reproducible example.
Uwe Ligges
On 03.02.2013 15:03, Luca Nanetti wrote:
Dear experts,
I am encountering a version-dependent issue.
My laptop runs Ubuntu 12.04 LTS 64-bit with R 2.14.1; the issue explained
below never occurred with this version of R.
My desktop runs Ubuntu 11.10 64-bit with R 2.13.2; what follows applies to
this setup.
The data I am clustering are the rows of a 320 x 6 matrix of integers
ranging from 1 to 7, with no missing values.
I applied kmeans() to this matrix literally 256 x 10� times using R
version 2.13.2 or 2.14.1, without ever experiencing the slightest
problem.
My usual setup is with k=5, nstart=256, iter.max=50.
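The setup described above can be sketched as a single call. The matrix below is a hypothetical stand-in (random integers from 1 to 7, NOT the original data), and the final check encodes the ?kmeans statement that k clusters are returned for the default Hartigan-Wong algorithm.

```r
# Hypothetical stand-in for the data described above: a 320 x 6 matrix of
# integers from 1 to 7 with no missing values (random, not the real data).
set.seed(1)
x <- matrix(sample(1:7, 320 * 6, replace = TRUE), nrow = 320, ncol = 6)

# The usual setup described above; Hartigan-Wong is the default algorithm.
km <- kmeans(x, centers = 5, nstart = 256, iter.max = 50)

# Per ?kmeans, k clusters should be returned, none of them empty.
stopifnot(length(km$size) == 5, all(km$size > 0), sum(km$size) == nrow(x))
print(km$size)
```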
After upgrading to R 2.15.2, I got either a warning message ('Empty
cluster. Choose a better set of initial centers') or a catastrophic
segfault. The only way I can get any solution at all is to set nstart to
its default value, i.e. 1. However, if I simply repeat the clustering, the
same issue still occurs. Moreover, this is vastly suboptimal because of
the risk of local minima.
Something similar was reported many years ago; see
https://stat.ethz.ch/pipermail/r-help/2003-November/041784.html. It was
then suggested that R's behaviour was correct. I'm not familiar with such
an early R version, but the up-to-date documentation of kmeans clearly
states that "Except for the Lloyd-Forgy method, k clusters will always be
returned if a number is specified.".
I am using the default Hartigan-Wong algorithm and I specify an exact
number k, so k clusters should be returned. They are not, so the empty
cluster is more likely the symptom of a bug than the outcome of a 'true'
local minimum.
Using Synaptic, I managed to downgrade R to version 2.13.2. The problem
disappeared, i.e. the message/segfault no longer occurred.
To summarize: given the same dataset, invoking kmeans() in R 2.15.2 on an
Ubuntu 11.10 64-bit machine regularly produces either an unreasonable
message or a segfault. This does not happen at all with previous versions
of R on the same machine and operating system.
I respectfully suggest that the behaviour shown in the aforementioned
versions 2.13.2 and 2.14.1 should be considered 'normal', and that
version 2.15.2 should revert to it.
Kind regards,
Luca Nanetti.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.