We need a reproducible example.

Uwe Ligges


On 03.02.2013 15:03, Luca Nanetti wrote:
Dear experts,
I am encountering a version-dependent issue.

My laptop runs Ubuntu 12.04 LTS 64-bit, R 2.14.1; the issue explained below
never occurred with this version of R.
My desktop runs Ubuntu 11.10 64-bit, R 2.13.2; what follows applies to this
setup.

The data I'm clustering consists of the rows of a 320 x 6 matrix
containing integers ranging from 1 to 7, with no missing data.
I applied kmeans() to this matrix literally tens of thousands of times using
R version 2.13.2 or 2.14.1, without ever experiencing the slightest problem.
My usual setup is with k=5, nstart=256, iter.max=50.
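
A minimal sketch of the setup described above, which may serve as a starting
point for a reproducible example. The original matrix was not posted, so the
data here are randomly generated stand-ins with the same shape and value
range; the seed and the names x and fit are arbitrary assumptions. Random
data may well not trigger the reported warning or segfault.

```r
# Stand-in data: 320 x 6 matrix of integers from 1 to 7, no missing values.
# (Assumption: the real data were not posted, so these are simulated.)
set.seed(1)  # arbitrary seed, only so the run can be repeated
x <- matrix(sample(1:7, 320 * 6, replace = TRUE), nrow = 320, ncol = 6)

# The setup from the post: k = 5, nstart = 256, iter.max = 50,
# with the default Hartigan-Wong algorithm.
fit <- kmeans(x, centers = 5, nstart = 256, iter.max = 50)

# Cluster sizes; an empty cluster would show up as a missing or zero entry.
print(table(fit$cluster))

# R version and platform details, useful when reporting a
# version-dependent problem.
print(sessionInfo())
```

Posting the actual matrix (for example via dput()) together with such a
script and the sessionInfo() output would let others check the behaviour on
R 2.15.2.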

Upgrading to R 2.15.2, I experienced either a warning message ('Empty
cluster. Choose a better set of initial centers') or a catastrophic
segfault. The only way I can get any solution at all is to set nstart to
its default value, i.e. 1. However, when I simply repeat the clustering, the
same issue still happens. Moreover, this is vastly suboptimal because of the
risk of local minima.

Something similar was reported many years ago, see
https://stat.ethz.ch/pipermail/r-help/2003-November/041784.html. It was
then suggested that R's behaviour was correct. I'm not familiar with such
an early R version, but the up-to-date documentation of kmeans clearly
states that "Except for the Lloyd-Forgy method, k clusters will always be
returned if a number is specified."
I am using the default Hartigan-Wong, and I specify an exact number k:
thus, k clusters should be returned. They aren't, and the empty cluster is
therefore more likely a symptom of a bug than the outcome of a 'true'
local minimum.

Using Synaptic, I managed to downgrade R to version 2.13.2. The problem
disappeared, i.e. the previous message/segfault no longer occurred.

Summarizing: given the same dataset, either an unreasonable message or a
segfault regularly occurs in version 2.15.2 when invoking kmeans() on an
Ubuntu 11.10 64-bit machine. This does not happen at all in previous
versions of R, on the same machine and operating system.

I respectfully suggest that the behaviour shown in the aforementioned
versions 2.13.2 and 2.14.1 should be considered 'normal', and that version
2.15.2 should revert to that.

Kind regards,
Luca Nanetti.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

