On Wed, 6 Oct 2010, PeterB wrote:
Thanks, Christian. This is really helpful.
I was not aware of that equality, but now I can see it. I think you mean the
inner sum over all distances in the distance matrix (for that cluster),
which means that each distance is counted twice (which is why we divide by
2).
Peter
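The equality mentioned in this reply can be checked numerically. The following is only a minimal sketch in Python (not R), assuming the dissimilarities are squared Euclidean distances. The function names and the toy cluster are made up for illustration. It compares the error sum of squares about the cluster mean with the sum over the full distance matrix (each pair counted twice) divided by 2 and by the cluster size.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def ss_about_mean(cluster):
    """Error sum of squares computed the usual way, about the cluster mean."""
    n = len(cluster)
    dim = len(cluster[0])
    mean = [sum(p[k] for p in cluster) / n for k in range(dim)]
    return sum(sq_dist(p, mean) for p in cluster)


def ss_from_distances(cluster):
    """Same quantity from pairwise squared distances only (no mean needed).

    The double loop runs over the full distance matrix, so every pair is
    counted twice; hence the division by 2, and by the cluster size n.
    """
    n = len(cluster)
    total = sum(sq_dist(p, q) for p in cluster for q in cluster)
    return total / (2 * n)


# Hypothetical toy cluster of four points in the plane.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (4.0, 4.0)]
print(ss_about_mean(cluster), ss_from_distances(cluster))
```

For this toy cluster both functions return the same value, which is why a criterion defined through cluster means can still be evaluated from a dissimilarity matrix alone.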
Christian Hennig
The k-means/Ward criterion can be written down in terms of squared
Euclidean distances in a way that doesn't involve means. It is half the
sum (over all clusters) of the sum (over all observations in a
cluster) of all within-cluster squared dissimilarities, the inner sum
divided by the cluster
Apparently, the same issue exists in SAS, where there is an option to run the
Ward algorithm based only on the distance matrix. Perhaps a SAS user could
confirm that, or even check with SAS.
Peter
--
View this message in context:
http://r.789695.n4.nabble.com/hclust-with-method-ward-tp2952140
The clustering function hclust has a method = "ward" option, and apparently
many people use it. However, the Ward method seems to minimize the increase
in the error sum of squares, which is calculated with respect to the cluster
mean, whereas hclust takes only a dissimilarity matrix as input.
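One common way to run Ward's method from dissimilarities alone is the Lance-Williams update, which recomputes the dissimilarity between a newly merged cluster and every other cluster from the old dissimilarities and the cluster sizes. The sketch below is in Python (not R) and is an illustration only: this thread does not state that hclust works exactly this way, the function names and data are hypothetical, and the input dissimilarities are assumed to be squared Euclidean distances.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def ward_dissim(a, b):
    """Ward dissimilarity between clusters a and b, computed from the means.

    Scaled so that for two single points it equals their squared distance.
    """
    na, nb = len(a), len(b)
    return 2.0 * na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def lance_williams_ward(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    """Dissimilarity between the merged cluster (i + j) and cluster k.

    Uses only the old dissimilarities and the cluster sizes, no means.
    """
    n = n_i + n_j + n_k
    return ((n_i + n_k) * d_ik + (n_j + n_k) * d_jk - n_k * d_ij) / n


# Hypothetical clusters; i and j are about to be merged.
ci = [(0.0, 0.0), (1.0, 0.0)]
cj = [(4.0, 1.0)]
ck = [(0.0, 5.0), (2.0, 6.0), (1.0, 7.0)]

updated = lance_williams_ward(
    ward_dissim(ci, ck), ward_dissim(cj, ck), ward_dissim(ci, cj),
    len(ci), len(cj), len(ck),
)
direct = ward_dissim(ci + cj, ck)  # recomputed from the raw points
print(updated, direct)
```

The two printed values agree up to floating-point rounding, so the mean-based merging cost can be tracked without ever storing the cluster means, provided the starting dissimilarities are squared Euclidean distances.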