The k-means/Ward criterion can be written in terms of squared
Euclidean distances in a way that doesn't involve means. It equals half
the sum, over all clusters, of the sum of all pairwise within-cluster
squared dissimilarities, where each cluster's inner sum is divided by
that cluster's size. This can also be computed for a general
dissimilarity matrix (this is for example done by cluster.stats in
package fpc).
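To make the identity concrete, here is a minimal Python sketch. The function names are hypothetical and are not the API of fpc or of hclust. The first function computes the usual criterion: squared distances to the cluster means. The second uses only a dissimilarity matrix, following the recipe above. For each cluster it sums d_ij^2 over all ordered pairs (i, j) in the cluster, divides by the cluster size, then halves the total over clusters. With Euclidean distances the two results should agree up to rounding. For a general dissimilarity matrix only the second function applies.

```python
import math
import random


def ward_criterion_from_means(points, labels):
    """Within-cluster sum of squared Euclidean distances to each cluster mean."""
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    total = 0.0
    for members in clusters.values():
        n = len(members)
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / n for d in range(dim)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
    return total


def ward_criterion_from_dissimilarities(diss, labels):
    """Same criterion computed from a dissimilarity matrix only (no means):
    half the sum over clusters of (sum of d_ij^2 over all ordered pairs
    i, j in the cluster) divided by the cluster size."""
    clusters = {}
    for i, k in enumerate(labels):
        clusters.setdefault(k, []).append(i)
    total = 0.0
    for idx in clusters.values():
        inner = sum(diss[i][j] ** 2 for i in idx for j in idx)
        total += inner / len(idx)
    return total / 2.0


def euclidean_matrix(points):
    """Full symmetric matrix of Euclidean distances (requires Python 3.8+)."""
    return [[math.dist(p, q) for q in points] for p in points]


# Usage: random 2-d points split into three arbitrary clusters.
random.seed(1)
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(12)]
labs = [i % 3 for i in range(12)]
print(ward_criterion_from_means(pts, labs))
print(ward_criterion_from_dissimilarities(euclidean_matrix(pts), labs))
```

The two printed values should coincide, up to floating-point rounding. The reason is that for a single cluster of size n, the sum over ordered pairs of ||x_i - x_j||^2 equals 2n times the sum of ||x_i - mean||^2.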
I'd guess that hclust with method="ward" uses this when run with a general
dissimilarity matrix. At least it would make sense, although I'm not sure
whether it really is what hclust does, because I didn't check the
underlying Fortran code.
Note that I may have missed postings in this thread, so sorry if this
doesn't add to what you already have worked out.
Christian
On Wed, 6 Oct 2010, PeterB wrote:
Apparently, the same issue exists in SAS, where there is an option to run the
Ward algorithm based only on the distance matrix. Perhaps a SAS user could
confirm that or even check it in SAS.
Peter
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche