Hi Christian, and thanks.

I've tried your suggestion and it seems promising, but I have a couple of questions. I am reading a three-column ASCII file (lon, lat, sst):
> mydata <- read.table("INFILE", header=FALSE, sep="", na.strings="99.00", dec=".", strip.white=TRUE, col.names=c("lon","lat","sst"))

Then I extract a subset of the data and try to get the right number of clusters for the third variable only, sst:

> x <- mydata$sst
> asw <- numeric(10)
> for (k in 4:10)
+   asw[k] <- clara(x, k)$silinfo$avg.width
> k.best <- which.max(asw)
> cat("silhouette-optimal number of clusters:", k.best, "\n")
silhouette-optimal number of clusters: 5

I've changed the maximum number of clusters in your example from 20 to 10, as I expect the right number to be between 5 and 8. Is there any problem with this change? Maybe this restriction is too strict if I treat the data as just numbers, but as it is sea surface temperature under certain "environmental-meteorological conditions", in this particular case I think there should not be more than 8-9 clusters (if 20 is retained I get 11 clusters).

The second question is how one should understand the plot. Is the right number the one with the greatest "average silhouette width"?

Thanks again

2008/9/30 Christian Hennig <[EMAIL PROTECTED]>

> Hi there,
>
> generally finding the right number of clusters is a difficult problem and
> depends heavily on the cluster concept needed for the particular
> application.
> No outcome of any automatic method should be taken for granted.
>
> Having said that, I guess that something like the example given in
>
>> ?pam.object
>
> (replacing pam by clara) should work with clara, too.
>
> Regards,
> Christian
>
> On Tue, 30 Sep 2008, pacomet wrote:
>
>> Hi everyone
>>
>> I have a question about clustering. I've managed to use CLARA to get a
>> clustering analysis of a large data set. But now I want to find which is
>> the right number of clusters.
>>
>> The clara.object gives some information like the ratio between maximal and
>> minimal dissimilarity that says (maybe if lower than 1??) if a cluster is
>> well-separated from the other.
>> I've also read something about silhouette and about cluster.stats but
>> can't work out how to find the right number of clusters.
>>
>> I've tried a suggestion from the mailing list, but when using dist
>>
>> d1<-dist(mydata$sst)
>>
>> it says that "specified vector size is too big"
>>
>> Is there any method to find the right number of clusters when using clara?
>> Maybe it is something I've tried, but with a small and simple trick I
>> can't find.
>>
>> Thanks in advance
>>
>> --
>> _________________________
>> The west wind stirs it, the east wind rains it
>> Registered Linux user: 363952
>> -------
>> Photos: http://picasaweb.google.es/pacomet
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> [EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
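
For reference, here is a minimal sketch of the silhouette loop discussed in this thread, written so that it runs on its own. The sst vector is synthetic stand-in data (three made-up temperature regimes), not the poster's file, and the names k.range and fit are only illustrative. It assumes the cluster package is installed. Two points it tries to show: clara() works on sub-samples, so it avoids the full dist() object (which holds n(n-1)/2 values and is what fails for large n), and the candidate k is the one with the largest average silhouette width within whatever range is searched. Limiting the range to 10 only limits the search. Storing NA rather than 0 for values of k that were never tried keeps which.max() from picking an untried k if every width happened to be negative.

```r
library(cluster)  # provides clara()

set.seed(1)
# Synthetic stand-in for mydata$sst: three made-up temperature regimes
sst <- c(rnorm(2000, mean = 14, sd = 0.4),
         rnorm(2000, mean = 18, sd = 0.4),
         rnorm(2000, mean = 23, sd = 0.4))
x <- matrix(sst, ncol = 1)            # one-column data matrix for clara()

k.range <- 2:10                       # candidate numbers of clusters
asw <- rep(NA_real_, max(k.range))    # NA (not 0) for values of k not tried

for (k in k.range) {
  # clara() clusters sub-samples, so no full n x n dissimilarity is built
  fit <- clara(x, k, samples = 20)
  asw[k] <- fit$silinfo$avg.width     # average silhouette width, best sample
}

k.best <- which.max(asw)              # which.max() skips NA entries
cat("silhouette-optimal number of clusters:", k.best, "\n")

# Average silhouette width against k; the dashed line marks the maximum
plot(k.range, asw[k.range], type = "b",
     xlab = "k (number of clusters)", ylab = "average silhouette width")
abline(v = k.best, lty = 2)
```

On real data the curve is often fairly flat near its maximum, so, in line with Christian's caveat, the peak is probably best read as a suggestion to weigh against subject knowledge rather than as a final answer.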