I have the book you mentioned. It basically describes the silhouette method. I do not have it handy: I moved recently and it is still in a box somewhere. As far as I remember, though, the book does not provide any other criterion for finding the best number of clusters. I have the same problem with hierarchical clustering techniques. I use clustering as an exploratory analysis because I have no a priori knowledge to help me make a choice. How can multivariate analysis help?

I launched a loop in which PAM is run with the number of clusters increased by 1 at each iteration, followed by the silhouette test. The silhouette value is now oscillating among negative numbers. Can I assume that it can only get worse once it has turned negative for the first time? If so, I could leave the loop after the first negative value and choose the number of clusters associated with the largest positive silhouette value. This procedure would save a lot of CPU time.
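The loop described above could look roughly like the sketch below. It is only an illustration under assumptions: `scan_silhouette` is a hypothetical helper name, the toy data stand in for the real 2870x2870 matrix, and the `1 - s` conversion in the comment assumes a similarity matrix scaled to [0, 1]. It uses `pam()` from the cluster package and reads the average silhouette width from the fitted object instead of computing it in a separate step.

```r
library(cluster)  # provides pam()

# Hypothetical helper: average silhouette width for each candidate k.
# `d` is a "dist" object or a square dissimilarity matrix.
scan_silhouette <- function(d, k_values) {
  widths <- sapply(k_values, function(k) {
    fit <- pam(d, k, diss = TRUE)  # PAM on a precomputed dissimilarity
    fit$silinfo$avg.width          # average silhouette width for this k
  })
  data.frame(k = k_values, avg_width = widths)
}

# Toy data standing in for the real matrix: three well-separated groups.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2))
d <- dist(x)

# Assumption: starting from a similarity matrix `s` scaled to [0, 1],
# one common conversion is d <- as.dist(1 - s)

res <- scan_silhouette(d, 2:8)

# Pick the k with the largest average silhouette width.
best_k <- res$k[which.max(res$avg_width)]
print(res)
print(best_k)
```

Keeping the whole curve in `res` makes it possible to check, for instance on a smaller subset of the data, whether the width ever recovers after its first negative value before relying on an early exit from the loop.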
Thank you very much,
Maura

On Thu, Oct 30, 2008 at 7:25 PM, Dylan Beaudette <[EMAIL PROTECTED]> wrote:

> On Thursday 30 October 2008, Maura E Monville wrote:
> > I have a pretty big similarity matrix (2870x2870). I will produce even
> > bigger ones soon.
> > I am using PAM to generate clusters.
> > The desired number of output clusters is a PAM input parameter.
> > I do not know a priori what the best cluster layout is.
> > I resorted to the silhouette test. It takes forever as I have to run PAM
> > with all possible numbers of clusters.
> > I wonder whether there is some faster method, either software or some
> > theoretical guidelines, to get the optimum number of clusters.
> >
> > Thank you very much,
>
> This is a very general topic in the field of multivariate analysis. There
> really isn't any way to know the 'correct' number of clusters; however,
> there are several metrics that can give you an indication of how messy
> your data are.
>
> For information on the methods in the cluster package, see this book:
>
> Kaufman, L. & Rousseeuw, P. J. Finding Groups in Data: An Introduction to
> Cluster Analysis. Wiley-Interscience, 2005
>
> Otherwise, consider a book on multivariate analysis. Alternatively, try a
> hierarchical clustering approach, and look for meaningful groupings.
> Something like this:
>
> library(cluster)  # provides diana() and daisy()
>
> d <- diana(daisy(your_data_matrix))
> d.hc <- as.hclust(d)
>
> d.hc$labels <- your_data_matrix$id
>
> plot(d.hc)
>
> Cheers,
>
> Dylan
>
> --
> Dylan Beaudette
> Soil Resource Laboratory
> http://casoilresource.lawr.ucdavis.edu/
> University of California at Davis
> 530.754.7341

--
Maura E.M