On Wed, 6 Oct 2010, PeterB wrote:

Thanks, Christian. This is really helpful.

I was not aware of that equality, but now I can see it. I think you mean the
inner sum over all distances in the distance matrix (for that cluster),
which means that each distance is counted twice (which is why we divide by
2).

That's probably how to explain it... you can obviously check it by writing the whole thing down, which is how I did it. (The formula is in Bock's old book on "Automatische Klassifikation", but that's in German.)
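
For readers who want to see the "writing the whole thing down" step, here is one way it could go. The notation is an assumption introduced only for this sketch and does not come from the thread or from Bock's book: C is a single cluster with n points x_i and mean \bar{x}, and all sums run over the points of C.

```latex
\begin{align*}
\sum_{i \in C} \sum_{j \in C} \lVert x_i - x_j \rVert^2
  &= \sum_{i \in C} \sum_{j \in C}
     \lVert (x_i - \bar{x}) - (x_j - \bar{x}) \rVert^2 \\
  &= 2n \sum_{i \in C} \lVert x_i - \bar{x} \rVert^2
     - 2 \Bigl\langle \sum_{i \in C} (x_i - \bar{x}),
                      \sum_{j \in C} (x_j - \bar{x}) \Bigr\rangle \\
  &= 2n \sum_{i \in C} \lVert x_i - \bar{x} \rVert^2 ,
\end{align*}
% the cross term vanishes because \sum_{i \in C} (x_i - \bar{x}) = 0, hence
\[
\sum_{i \in C} \lVert x_i - \bar{x} \rVert^2
  = \frac{1}{2n} \sum_{i \in C} \sum_{j \in C} \lVert x_i - x_j \rVert^2 .
\]
```

The double sum visits every unordered pair twice, which is where the factor 1/2 comes from; the 1/n is the division by the cluster size.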

Christian


Peter


Christian Hennig wrote:

The k-means/Ward criterion can be written down in terms of squared
Euclidean distances in a way that doesn't involve means. It is half the
sum (over all clusters) of the sum (over all observations in a
cluster) of all within-cluster squared dissimilarities, the inner sum
divided by the cluster size. This can also be computed for a general
dissimilarity matrix (this is for example done by cluster.stats in
package fpc).
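
The equality described above can be checked numerically. The following is only a minimal sketch in base R: the simulated data, the arbitrary cluster labels and the helper names within_ss_means and within_ss_dist are made up for illustration, and this is not the code used by cluster.stats or hclust.

```r
# Illustrative data: 30 points in 2 dimensions with arbitrary cluster labels
set.seed(1)
x  <- matrix(rnorm(60), ncol = 2)
cl <- rep(1:3, each = 10)

# Mean-based form: squared Euclidean distances to the cluster means,
# summed over all clusters (hypothetical helper)
within_ss_means <- function(x, cl) {
  sum(sapply(unique(cl), function(k) {
    xk <- x[cl == k, , drop = FALSE]
    sum(sweep(xk, 2, colMeans(xk))^2)  # subtract column means, then square
  }))
}

# Distance-based form: for each cluster, sum all squared dissimilarities
# in its block of the full matrix (each pair appears twice), divide by
# 2 * cluster size, then sum over clusters (hypothetical helper)
within_ss_dist <- function(d, cl) {
  d2 <- as.matrix(d)^2
  sum(sapply(unique(cl), function(k) {
    idx <- cl == k
    sum(d2[idx, idx, drop = FALSE]) / (2 * sum(idx))
  }))
}

# For Euclidean distances the two values should agree up to rounding error
all.equal(within_ss_means(x, cl), within_ss_dist(dist(x), cl))
```

Only within_ss_dist makes sense for a general dissimilarity matrix, since it never refers to the coordinates or to cluster means.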

I'd guess that hclust with method="ward" uses this when run with a general
dissimilarity matrix. At least it would make sense, although I'm not sure
whether it really is what hclust does, because I didn't check the
underlying Fortran code.

Note that I may have missed postings in this thread, so sorry if this
doesn't add to what you already have worked out.

Christian

On Wed, 6 Oct 2010, PeterB wrote:


Apparently, the same issue exists in SAS, where there is an option to run
the Ward algorithm based only on the distance matrix. Perhaps, a SAS user
could confirm that or even check with SAS.

Peter

--
View this message in context:
http://r.789695.n4.nabble.com/hclust-with-method-ward-tp2952140p2965310.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
