My problem is that I already have a distance (similarity) matrix, generated outside R by C++ code, because the criterion I use to calculate the "distance" between pairs of objects is not one of the standard criteria implemented in R. If I understood correctly (I might be mistaken), stepFlexclust() optimizes the cluster layout by calling either kcca or cclust, both of which calculate their own similarity matrix. I just need a function or method that finds the best number of clusters regardless of how the similarity matrix was generated and regardless of which clustering function I use (PAM in my case). Is this possible at all?
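For reference, here is a minimal sketch of the kind of loop discussed in this thread: PAM is run directly on a precomputed matrix and the average silhouette width is recorded for each candidate number of clusters. It assumes the matrix holds dissimilarities (a similarity matrix would first need converting, which is not shown), and it uses a small synthetic matrix as a stand-in for the one produced by the C++ code. The file name in the comment, the range of k, and the variable names are all hypothetical.

```r
library(cluster)  # provides pam()

# Stand-in for the externally computed matrix. In practice it would be read
# from whatever file the C++ code writes (hypothetical file name), e.g.
#   m <- as.matrix(read.table("dist_matrix.txt"))
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
             matrix(rnorm(40, mean = 6),  ncol = 2),
             matrix(rnorm(40, mean = 12), ncol = 2))
m <- as.matrix(dist(pts))

# pam() expects dissimilarities; as.dist() keeps the lower triangle of m.
d <- as.dist(m)

# Candidate numbers of clusters (an arbitrary choice for this sketch).
k_range <- 2:10

# Average silhouette width of the PAM solution for each k.
avg_sil <- sapply(k_range, function(k) {
  pam(d, k = k, diss = TRUE)$silinfo$avg.width
})

# The k with the largest average silhouette width.
best_k <- k_range[which.max(avg_sil)]
```

Nothing in this sketch depends on how the matrix was computed, since pam() only sees the dissimilarities. The cost is controlled by the choice of k_range; the loop does not assume that the silhouette width changes monotonically with k.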
Thank you very much,
Maura

On Fri, Oct 31, 2008 at 12:18 AM, Dylan Beaudette <[EMAIL PROTECTED]> wrote:

> On Thursday 30 October 2008, Maura E Monville wrote:
> > I have the book you mentioned. It basically describes the silhouette
> > method. I do not have it handy as I moved so it is still in some box.
> > However I cannot remember that book providing any other criterion to find
> > the best clusters number.
> > On the other hand I have the same problem with hierarchical clustering
> > techniques.
> > I use clusters as exploratory analysis because I do not have any a priori
> > knowledge that helps me make a choice.
> > How can multivariate analysis help?
> > I launched a loop where the silhouette test follows PAM which is passed a
> > clusters number increased by 1 at each iteration.
> > Since I am observing that the silhouette value is now oscillating among
> > negative numbers, I wonder whether I can assume that it can only grow worse
> > once it has turned negative the first time so leave the loop after the
> > first negative number and choose the clusters number associated with the
> > biggest positive silhouette value.
> > This procedure would spare a lot of CPU time.
>
> Another approach might involve the stepFlexclust() from the flexclust package.
> See the manual page for this function for examples.
>
> Dylan
>
> > Thank you very much,
> > Maura
> >
> > On Thu, Oct 30, 2008 at 7:25 PM, Dylan Beaudette <[EMAIL PROTECTED]> wrote:
> > > On Thursday 30 October 2008, Maura E Monville wrote:
> > > > I have a pretty big similarity matrix (2870x2870). I will produce even
> > > > bigger ones soon.
> > > > I am using PAM to generate clusters.
> > > > The desired number of output clusters is a PAM input parameter.
> > > > I do not know a priori what is the best clusters layout.
> > > > I resorted to the silhouette test. It takes forever as I have to run
> > > > PAM with all possible numbers of clusters.
> > > > I wonder whether there is some faster method, either a s/w code or some
> > > > theoretical guidelines, to get the optimum clusters number.
> > > >
> > > > Thank you very much,
> > >
> > > This is a very general topic in the field of multivariate analysis. There
> > > really isn't any way to know the 'correct' number of clusters, however
> > > there are several metrics that can give you an indication of how messy
> > > your data are.
> > >
> > > For information on the methods in the cluster package, see this book:
> > >
> > > Kaufman, L. & Rousseeuw, P. J. Finding Groups in Data: An Introduction to
> > > Cluster Analysis. Wiley-Interscience, 2005
> > >
> > > Otherwise, consider a book on multivariate analysis. Alternatively, try a
> > > hierarchical clustering approach, and look for meaningful groupings.
> > > Something like this:
> > >
> > > d <- diana(daisy(your_data_matrix))
> > > d.hc <- as.hclust(d)
> > >
> > > d.hc$labels <- your_data_matrix$id
> > >
> > > plot(d.hc)
> > >
> > > Cheers,
> > >
> > > Dylan
> > >
> > > --
> > > Dylan Beaudette
> > > Soil Resource Laboratory
> > > http://casoilresource.lawr.ucdavis.edu/
> > > University of California at Davis
> > > 530.754.7341
>
> --
> Dylan Beaudette
> Soil Resource Laboratory
> http://casoilresource.lawr.ucdavis.edu/
> University of California at Davis
> 530.754.7341

--
Maura E.M

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.