The k-means/Ward criterion can be written in terms of squared Euclidean distances alone, without involving means. For each cluster, sum the squared dissimilarities over all ordered pairs of observations in that cluster and divide by the cluster size; the criterion is half the sum of these quantities over all clusters. This can also be computed for a general dissimilarity matrix (this is done, for example, by cluster.stats in package fpc).
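
To make the equivalence concrete, here is a minimal sketch (in Python, standard library only, not the R or fpc code) that computes the within-cluster sum of squares once from cluster means and once from pairwise dissimilarities only. The function names, the toy points and the labels are all made up for illustration; the second function assumes a full symmetric dissimilarity matrix.

```python
# Sketch: the within-cluster sum of squares computed two ways.
# All names below are illustrative; none come from R, hclust or fpc.

def sq_dist(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def wss_from_means(points, labels):
    """Classical form: squared distances of observations to their cluster mean."""
    total = 0.0
    for k in set(labels):
        members = [p for p, l in zip(points, labels) if l == k]
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sq_dist(p, mean) for p in members)
    return total

def wss_from_dissimilarities(d, labels):
    """Mean-free form: needs only a full symmetric dissimilarity matrix d.

    For each cluster, sum d[i][j]**2 over all ordered pairs (i, j) in the
    cluster, divide by the cluster size, then halve the grand total.
    """
    total = 0.0
    for k in set(labels):
        idx = [i for i, l in enumerate(labels) if l == k]
        inner = sum(d[i][j] ** 2 for i in idx for j in idx)
        total += inner / len(idx)
    return total / 2.0

# Toy data (assumed for illustration): two clusters in the plane.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
labels = [0, 0, 0, 1, 1]

# Euclidean distance matrix; any other dissimilarity matrix could be passed
# to wss_from_dissimilarities instead, but then no means-based form exists.
d = [[sq_dist(p, q) ** 0.5 for q in points] for p in points]

a = wss_from_means(points, labels)
b = wss_from_dissimilarities(d, labels)
print(a, b)  # the two values agree up to floating-point rounding
```

With Euclidean distances the two numbers coincide; with a general dissimilarity matrix only the second function is defined, which is the generalisation described above.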

I'd guess that hclust with method="ward" uses this when run with a general dissimilarity matrix. At least it would make sense, although I'm not sure whether it really is what hclust does, because I didn't check the underlying Fortran code.

Note that I may have missed postings in this thread, so sorry if this doesn't add to what you already have worked out.

Christian

On Wed, 6 Oct 2010, PeterB wrote:


Apparently, the same issue exists in SAS, where there is an option to run the
Ward algorithm based only on the distance matrix. Perhaps, a SAS user could
confirm that or even check with SAS.

Peter

--
View this message in context: 
http://r.789695.n4.nabble.com/hclust-with-method-ward-tp2952140p2965310.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
