Re: [R] hclust with method = “ward”

2010-10-07 Thread Christian Hennig
On Wed, 6 Oct 2010, PeterB wrote:
> Thanks, Christian. This is really helpful. I was not aware of that equality, but now I can see it. I think you mean the inner sum over all distances in the distance matrix (for that cluster), which means that each distance is counted twice (which is why we divi

Re: [R] hclust with method = “ward”

2010-10-06 Thread PeterB
Thanks, Christian. This is really helpful. I was not aware of that equality, but now I can see it. I think you mean the inner sum over all distances in the distance matrix (for that cluster), which means that each distance is counted twice (which is why we divide by 2).
Peter
Christian Hennig

Re: [R] hclust with method = “ward”

2010-10-06 Thread Christian Hennig
The k-means/Ward criterion can be written down in terms of squared Euclidean distances in a way that doesn't involve means. It is half the sum (over all clusters) of the sum (over all observations in a cluster) of all within-cluster squared dissimilarities, the inner sum divided by the cluster
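As a rough illustration of the equality described above, the following sketch computes the within-cluster error sum of squares twice: once from cluster means, and once from pairwise squared Euclidean distances only. It is written in plain Python rather than R, the function names and the toy data are made up for the example, and the division by cluster size follows the standard form of the identity rather than anything specific to hclust.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ess_from_means(clusters):
    """Error sum of squares computed with respect to each cluster mean."""
    total = 0.0
    for pts in clusters:
        n = len(pts)
        mean = [sum(col) / n for col in zip(*pts)]
        total += sum(sq_dist(p, mean) for p in pts)
    return total


def ess_from_distances(clusters):
    """Same criterion using only pairwise squared distances (no means).

    The inner double sum runs over the full distance matrix of a cluster,
    so every pair is counted twice; hence the division by 2 as well as by
    the cluster size n.
    """
    total = 0.0
    for pts in clusters:
        n = len(pts)
        full_sum = sum(sq_dist(p, q) for p in pts for q in pts)
        total += full_sum / (2 * n)
    return total


# Hypothetical toy data: two clusters of 2-D points.
clusters = [
    [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)],
    [(5.0, 5.0), (7.0, 5.0)],
]

print(ess_from_means(clusters))      # criterion via cluster means
print(ess_from_distances(clusters))  # same value via distances only
```

Both calls should agree on this data, which suggests why a Ward-type criterion can in principle be evaluated from a dissimilarity matrix alone, provided the dissimilarities are squared Euclidean distances.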

Re: [R] hclust with method = “ward”

2010-10-06 Thread PeterB
Apparently, the same issue exists in SAS, where there is an option to run the Ward algorithm based only on the distance matrix. Perhaps a SAS user could confirm that or even check with SAS. Peter

[R] hclust with method = “ward”

2010-10-01 Thread PeterB
The clustering function hclust has a method = "ward", and apparently many people use that option. However, the Ward method seems to minimize the increase in the error sum of squares, which is calculated with respect to the cluster mean, whereas hclust takes only a dissimilarity matrix as input.