Dear Bart,

One strange thing in your question is that the term "Ward's method" usually refers to a method based on the k-means criterion, which in its standard form is defined not on dissimilarities but on objects-by-variables data. So I wonder how and why you want to use Ward's method on a dissimilarity matrix in the first place. (I know that the k-means criterion can in principle be translated to dissimilarity data, and this is probably what hclust's method="ward" does if fed a dissimilarity matrix, though I'm not sure; but then it loses its justification.)

One thing you could think about is using the function pam in package cluster. Chances are that this won't work on 38,000 cases either, but you could cluster a subsample of, say, 2,000 cases and then assign every remaining object to the cluster whose medoid is most similar to it.
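A minimal sketch of this subsample-then-assign idea, assuming the dissimilarities are available as a matrix D. The toy data, the number of clusters k, the subsample size and the helper name assign_to_medoids are illustrative assumptions, not part of the original question:

```r
library(cluster)  # provides pam()

# Hypothetical helper: cluster a random subsample with pam(), then assign
# every object to the cluster whose medoid is least dissimilar to it.
assign_to_medoids <- function(D, k, n_sub) {
  idx <- sample(nrow(D), n_sub)                     # subsample indices
  fit <- pam(as.dist(D[idx, idx]), k = k, diss = TRUE)
  med <- idx[fit$id.med]                            # medoids as rows of D
  cl  <- apply(D[, med, drop = FALSE], 1, which.min)
  list(medoids = med, clustering = unname(cl))
}

# Toy example: two well-separated groups of 2-D points (made-up data).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))
D <- as.matrix(dist(x))
res <- assign_to_medoids(D, k = 2, n_sub = 30)
table(res$clustering)
```

Note that the assignment step only needs the dissimilarities between each object and the k medoids, so for the real data those k columns could be read from the file without holding the whole matrix in memory.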

It is well known that hierarchical methods are problematic with very large dissimilarity matrices; even if you resolve the memory problem, the number of operations required is enormous.
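To give a rough sense of scale, here is a back-of-the-envelope calculation of the storage involved, assuming 8-byte doubles and n = 38000 as in the question (the variable names are only illustrative):

```r
n <- 38000                       # number of objects, from the question
bytes_per_double <- 8

n_pairs  <- n * (n - 1) / 2      # entries in a "dist" object (lower triangle)
dist_gib <- n_pairs * bytes_per_double / 2^30
full_gib <- n^2 * bytes_per_double / 2^30    # full square matrix

round(c(dist = dist_gib, full = full_gib), 1)
```

So the distance object alone holds about 7.2e8 values (roughly 5.4 GiB), and the full square matrix about twice that; any intermediate copies made while reading and converting the data come on top.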

Hope this helps,
Christian


On Thu, 11 Feb 2010, Bart Thijs wrote:


Hi all,

I've stumbled upon some memory limitations for the analysis that I want to
run.

I've a matrix of distances between 38000 objects. These distances were
calculated outside of R.
I want to cluster these objects.

For smaller sets (e.g. n=100) this is how I proceed:
A<-matrix(scan(file, n=100*100),100,100, byrow=TRUE)
ad<-as.dist(A)
ahc<-hclust(ad,method="ward",members=NULL)
....

However if I try this with the real dataset I end up with memory problems.
I've the 64-bit version of R installed on a machine with 40 GB RAM (Windows
2003 64-bit version).

I'm thinking about using only the lower triangle of the matrix, but I can't
create a distance object for the clustering from the lower.tri.
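[Editorial sketch, not part of the original message.] One possible way to get a distance object without ever forming the square matrix is to set the "dist" attributes on a plain vector of the pairwise values. The helper name vec_to_dist and the toy data are assumptions for illustration; the sketch also assumes the matrix is symmetric, so that the part of each row to the right of the diagonal, read row by row, is in the order a "dist" object expects:

```r
# Hypothetical helper: wrap a vector of the n*(n-1)/2 pairwise
# dissimilarities as a "dist" object, without building the n x n matrix.
# The values must be ordered d(1,2), d(1,3), ..., d(1,n), d(2,3), ...
vec_to_dist <- function(v, n) {
  stopifnot(length(v) == n * (n - 1) / 2)
  structure(v, Size = as.integer(n), Diag = FALSE, Upper = FALSE,
            class = "dist")
}

# Toy check on a small symmetric matrix (made-up data).
set.seed(1)
A <- as.matrix(dist(matrix(rnorm(10), ncol = 2)))   # 5 x 5, symmetric
n <- nrow(A)

# Keep only the part of each row to the right of the diagonal, as one
# could when reading a row-wise file one row at a time.
v <- unlist(lapply(seq_len(n - 1), function(i) A[i, (i + 1):n]),
            use.names = FALSE)
d <- vec_to_dist(v, n)

all.equal(as.vector(d), as.vector(as.dist(A)))
```

This only avoids the full matrix and its copies; the distance vector itself still has to fit in memory, and the clustering step may need more on top of that.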

Can someone help me with a suggestion for which way to go?

Best Regards
Bart Thijs
--
View this message in context: 
http://n4.nabble.com/cluster-distance-large-matrix-tp1477237p1477237.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
