The problem here is that the distances between the two cases vary from one data set to the next, and I have 100 such sets. I guess there is no better solution than deriving an empirical cutoff from a training set, is there?
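To make that concrete, here is a minimal sketch of what deriving such a cutoff from a training set could look like. It assumes a handful of sets whose cluster count is already known; the names `spread`, `train`, `cutoff` and `n_clusters` are hypothetical and only serve the illustration, and the two training sets are simply the examples from this thread, so this is a sketch rather than a tested procedure.

```r
# Hypothetical helper: largest pairwise distance within one data set,
# i.e. the max(dist(x)) statistic suggested in this thread.
spread <- function(x) max(dist(x))

# Assumed training data: sets for which the number of clusters (k)
# has been decided by hand. Only the two examples from the thread here.
train <- list(
  list(x = c(400, 402, 405, 401, 410, 415, 407, 412), k = 1),
  list(x = c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400), k = 2)
)

spreads <- sapply(train, function(s) spread(s$x))
labels  <- sapply(train, function(s) s$k)

# Place the cutoff halfway between the largest one-cluster spread and
# the smallest two-cluster spread. This is only meaningful if the two
# groups of spreads do not overlap in the training data.
cutoff <- mean(c(max(spreads[labels == 1]), min(spreads[labels == 2])))

# Classify a new set as having one or two clusters.
n_clusters <- function(x, cutoff) if (spread(x) > cutoff) 2 else 1
```

With all 100 sets, one would compute `spread()` for each labelled set first and check that the one-cluster and two-cluster spreads are actually separated before trusting a single cutoff.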
Ralf

On Wed, May 5, 2010 at 6:04 PM, Phil Spector <spec...@stat.berkeley.edu> wrote:
> Ralf -
>    I think you're making things more complicated than they
> need to be.  All clustering methods are based on the distances
> between observations.  If the observations are all close
> together, the distances between them won't be very large.
> If some are farther away than others, then the distances will
> be larger.  The first case would suggest just one cluster,
> while the second case would suggest more than one.  For your
> example:
>
> > two <- c(1,2,3,2,3,1,2,3,400,300,400)
> > one <- c(400,402,405, 401,410,415, 407,412)
> > max(dist(one))
> [1] 15
> > max(dist(two))
> [1] 399
>
> A little experimentation should provide you with a cutoff
> that should reliably tell you whether there are 1 or 2 clusters in your
> data.
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spec...@stat.berkeley.edu
>
> On Wed, 5 May 2010, Ralf B wrote:
>
>> Are there R packages that allow for dynamic clustering, i.e. where the
>> number of clusters is not predefined? I have a list of numbers that
>> falls into either 2 clusters or just 1. Here is an example of one that
>> should be clustered into two clusters:
>>
>> two <- c(1,2,3,2,3,1,2,3,400,300,400)
>>
>> and here is one that only contains one cluster and would therefore not
>> need to be clustered at all.
>>
>> one <- c(400,402,405, 401,410,415, 407,412)
>>
>> Given a sufficiently large amount of data, a statistical test or an
>> effect size should be able to determine whether it makes sense to
>> divide a data set, i.e. whether there are two groups that differ well
>> enough. I am not familiar with the underlying techniques in kmeans, but
>> I know that it blindly divides both data sets based on the predefined
>> number of clusters. Are there any more sophisticated methods that allow
>> me to determine the number of clusters in a data set based on
>> statistical tests or effect sizes?
>>
>> Is it possible that this is not a clustering problem but a
>> classification problem?
>>
>> Ralf
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>