On Thursday 30 October 2008, Maura E Monville wrote:
> I have the book you mentioned. It basically describes the silhouette
> method. I do not have it handy, as I moved and it is still in some box.
> However, I cannot remember that book providing any other criterion for
> finding the best number of clusters.
> On the other hand, I have the same problem with hierarchical clustering
> techniques.
> I use clustering as exploratory analysis because I do not have any a priori
> knowledge that helps me make a choice.
> How can multivariate analysis help?
> I launched a loop where the silhouette test follows PAM, which is passed a
> number of clusters increased by 1 at each iteration.
> Since I am observing that the silhouette value is now oscillating among
> negative numbers, I wonder whether I can assume that it can only grow worse
> once it has turned negative the first time, and so leave the loop after the
> first negative value and choose the number of clusters associated with the
> biggest positive silhouette value.
> This procedure would spare a lot of CPU time.
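The loop described above (PAM followed by a silhouette check, with the number of clusters increased by 1 each time and an exit at the first negative average silhouette width) could be sketched in R roughly as follows. This is a minimal sketch, not a recommended procedure. It assumes the cluster package and a similarity matrix S with values in [0, 1], so that 1 - S is a usable dissimilarity. choose_k_pam() is a hypothetical helper name, and the stand-in S built from the built-in iris data exists only so the code runs. Nothing guarantees that the average width stays negative once it first drops below zero, so the early exit is a heuristic.

```r
library(cluster)  # pam()

# Hypothetical helper: run PAM for k = 2, 3, ..., k_max on a dissimilarity
# and record the average silhouette width, leaving the loop at the first
# negative width (a heuristic; a larger k could still turn positive again).
choose_k_pam <- function(diss, k_max) {
  widths <- numeric(0)
  for (k in 2:k_max) {
    fit <- pam(diss, k = k, diss = TRUE)
    w <- fit$silinfo$avg.width       # average silhouette width for this k
    widths[as.character(k)] <- w
    if (w < 0) break                 # early exit at the first negative width
  }
  positive <- widths[widths > 0]
  best_k <- if (length(positive) > 0) {
    as.integer(names(positive)[which.max(positive)])
  } else {
    NA_integer_                      # no k gave a positive average width
  }
  list(best_k = best_k, widths = widths)
}

# Demo only: a stand-in similarity matrix in [0, 1] built from iris;
# substitute your own similarity matrix for S (assumed to lie in [0, 1]).
D <- as.matrix(dist(scale(iris[, 1:4])))
S <- 1 - D / max(D)
res <- choose_k_pam(as.dist(1 - S), k_max = 8)
print(res$widths)   # average silhouette width per k
print(res$best_k)   # k with the largest positive average width
```

Because each pam() call starts from scratch, the early exit only saves the runs for larger k; it does not make any individual run faster.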
Another approach might involve the stepFlexclust() function from the flexclust
package. See the manual page for this function for examples.

Dylan

> Thank you very much,
> Maura
>
> On Thu, Oct 30, 2008 at 7:25 PM, Dylan Beaudette <[EMAIL PROTECTED]> wrote:
> > On Thursday 30 October 2008, Maura E Monville wrote:
> > > I have a pretty big similarity matrix (2870x2870). I will produce even
> > > bigger ones soon.
> > > I am using PAM to generate clusters.
> > > The desired number of output clusters is a PAM input parameter.
> > > I do not know a priori what the best cluster layout is.
> > > I resorted to the silhouette test. It takes forever, as I have to run
> > > PAM with all possible numbers of clusters.
> > > I wonder whether there is some faster method, either software or some
> > > theoretical guidelines, to get the optimum number of clusters.
> > >
> > > Thank you very much,
> >
> > This is a very general topic in the field of multivariate analysis. There
> > really isn't any way to know the 'correct' number of clusters; however,
> > there are several metrics that can give you an indication of how messy
> > your data are.
> >
> > For information on the methods in the cluster package, see this book:
> >
> > Kaufman, L. & Rousseeuw, P. J. Finding Groups in Data: An Introduction to
> > Cluster Analysis. Wiley-Interscience, 2005.
> >
> > Otherwise, consider a book on multivariate analysis. Alternatively, try a
> > hierarchical clustering approach, and look for meaningful groupings.
> > Something like this:
> >
> > library(cluster)  # provides diana() and daisy()
> > # divisive hierarchical clustering; the 'id' column is excluded from the distances
> > d <- diana(daisy(subset(your_data_matrix, select = -id)))
> > d.hc <- as.hclust(d)
> >
> > # label the leaves (assumes your_data_matrix is a data frame with an 'id' column)
> > d.hc$labels <- your_data_matrix$id
> >
> > plot(d.hc)
> >
> > Cheers,
> >
> > Dylan
> >
> > --
> > Dylan Beaudette
> > Soil Resource Laboratory
> > http://casoilresource.lawr.ucdavis.edu/
> > University of California at Davis
> > 530.754.7341

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.