I use kmeans to classify spectral events in high and low 1/3 octave bands:

#Do cluster analysis
CyclA<-data.frame(LlowA,LhghA)
CntrA<-matrix(c(0.9,0.8,0.8,0.75,0.65,0.65), nrow = 3, ncol=2, byrow=TRUE)
ClstA<-kmeans(CyclA,centers=CntrA,nstart=50,algorithm="MacQueen")
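
For a self-contained version of the call above, here is a minimal sketch in which simulated data stand in for my real band levels. The group locations, spreads and sample sizes are assumptions made up for illustration only; LlowA and LhghA are simply taken to be numeric vectors of equal length.

```r
# Simulated stand-in for the real data (assumption: two well separated
# groups, placed near the first and third seed centers)
set.seed(1)
n <- 300
LlowA <- c(rnorm(n, 0.90, 0.01), rnorm(n, 0.65, 0.01))
LhghA <- c(rnorm(n, 0.80, 0.01), rnorm(n, 0.65, 0.01))

# Same call as in my real analysis, with the same fixed seed centers
CyclA <- data.frame(LlowA, LhghA)
CntrA <- matrix(c(0.90, 0.80,
                  0.80, 0.75,
                  0.65, 0.65), nrow = 3, ncol = 2, byrow = TRUE)
ClstA <- kmeans(CyclA, centers = CntrA, nstart = 50, algorithm = "MacQueen")

# Number of points per cluster; a zero means that seed attracted no points
print(ClstA$size)
```

When a seed attracts no points, kmeans warns about an empty cluster, and that is the signal I rely on.
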
This approach works well when the data show one, two or three groups that are not "too close" in a cross plot. The MacQueen algorithm then returns one or more empty groups, which is what I want.

However, when the groups are closer together, less compact or more diffuse, only two groups are visually apparent but the algorithm returns three, splitting one group in two.

I looked at the package 'cluster', specifically at clara (I cannot use pam as I have 10000 observations), but clara always returns as many groups as you ask for.

Is there a way to help find a seed for the initial cluster centers? Equivalently, is there a way to find the number of groups a priori? I know this is not an easy problem.

I have looked at principal components (princomp, prcomp) because there is a connection with cluster analysis. It is not obvious to me how to program that connection, though.

http://en.wikipedia.org/wiki/Principal_Component_Analysis
http://ranger.uta.edu/~chqding/papers/Zha-Kmeans.pdf
http://ranger.uta.edu/~chqding/papers/KmeansPCA1.pdf

Thanks in advance,
Alex van der Spek
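
P.S. For the two-group case only, my reading of the KmeansPCA1 paper linked above is that the sign of the first principal component score approximates the 2-means split. Below is a rough sketch of that reading on simulated data (group locations and spreads are made-up assumptions, not my real measurements). It does not tell me how many groups there are, which is the part I am stuck on.

```r
# Simulated two-group data (assumption: stands in for the real band levels)
set.seed(2)
n <- 500
x <- cbind(low  = c(rnorm(n, 0.90, 0.02), rnorm(n, 0.65, 0.02)),
           high = c(rnorm(n, 0.80, 0.02), rnorm(n, 0.65, 0.02)))

# prcomp centers the data by default; pc$x[, 1] holds the first PC scores
pc <- prcomp(x)
split_pca <- ifelse(pc$x[, 1] > 0, 1L, 2L)

# Ordinary 2-means for comparison
km <- kmeans(x, centers = 2, nstart = 10)

# Cross-tabulate the two labelings; cluster numbers are arbitrary, so
# agreement shows up as one dominant entry per row
agree <- table(pca = split_pca, kmeans = km$cluster)
print(agree)
```
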