The nnclust package will compute the minimum spanning tree, from which you can extract hierarchical single-linkage clustering.
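To make the link between the minimum spanning tree and single linkage concrete, here is a rough sketch in base R. It is not nnclust's implementation or API: it is plain Prim's algorithm with distances computed on the fly, so it takes N^2 time but never stores the full distance matrix. The function name prim_mst and the toy data are made up for illustration.

```r
# Minimal sketch (an assumption, not nnclust code): Prim's algorithm with
# Euclidean distances computed on the fly. Working memory is O(N * p) for one
# sweep of the data, instead of the O(N^2) needed by dist().
prim_mst <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  in_tree   <- logical(n)            # TRUE once a point has joined the tree
  best_dist <- rep(Inf, n)           # distance from each point to the tree
  best_from <- rep(NA_integer_, n)   # tree point that achieves best_dist
  from <- integer(n - 1)
  to   <- integer(n - 1)
  len  <- numeric(n - 1)
  current <- 1L
  in_tree[current] <- TRUE
  for (k in seq_len(n - 1)) {
    # Distances from the most recently added point to every point.
    d <- sqrt(rowSums(sweep(x, 2, x[current, ])^2))
    closer <- !in_tree & d < best_dist
    best_dist[closer] <- d[closer]
    best_from[closer] <- current
    best_dist[in_tree] <- Inf        # points already in the tree are never picked
    nxt <- which.min(best_dist)
    from[k] <- best_from[nxt]
    to[k]   <- nxt
    len[k]  <- best_dist[nxt]
    in_tree[nxt] <- TRUE
    current <- nxt
  }
  data.frame(from = from, to = to, len = len)
}

# Toy check on a small matrix where dist() is still affordable: the sorted
# MST edge lengths are the single-linkage merge heights.
set.seed(42)
toy <- matrix(rnorm(300), ncol = 3)
edges <- prim_mst(toy)
stopifnot(isTRUE(all.equal(sort(edges$len),
                           hclust(dist(toy), method = "single")$height)))
```

Dropping every edge longer than some height h leaves connected components that are exactly the single-linkage clusters at that height, which is why the tree is all you need.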
For N randomly-ordered observations it uses only N log N memory, and takes N^2 time in high dimensions (30 is high) but only N log N in low dimensions.

    -thomas

On Tue, Mar 15, 2011 at 11:11 AM, array chip <arrayprof...@yahoo.com> wrote:
> Scott, thanks for the suggestion. I have already filtered genes from more than
> 30000. Probably I should filter more. I will take a look at the genefilter
> package.
>
> John
>
> ________________________________
> From: "Ochsner, Scott A" <sochs...@bcm.edu>
>
> <r-help@r-project.org>
> Sent: Mon, March 14, 2011 2:19:57 PM
> Subject: RE: [R] hclust() memory issue
>
> John,
>
> First, why are you trying to cluster so many rows? Presumably, if this is a
> gene expression array dataset, most of the array features are not going to
> change across treatments/conditions and will be relatively uninformative. Try
> using a filter which does not use treatment/condition information to decrease
> the number of array features you are attempting to cluster. There are numerous
> examples in the affycoretools and genefilter packages from Bioconductor
> http://www.bioconductor.org/.
>
> HTH,
>
> Scott
>
> Scott A. Ochsner, PhD
> One Baylor Plaza BCM130, Houston, TX 77030
> Voice: (713) 798-6227  Fax: (713) 790-1275
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of array chip
> Sent: Monday, March 14, 2011 4:03 PM
> To: r-help@r-project.org
> Subject: [R] hclust() memory issue
>
> Hi, I have a microarray dataset of dimension 25000x30 and am trying to cluster
> it using hclust(). But the clustering on the rows failed due to the size:
>
>> y<-hclust(dist(data),method='average')
> Error: cannot allocate vector of size 1.9 Gb
>
> I tried to increase the memory using memory.limit(size=3000), but still got
> the same error.
>
> I also tried agnes() from the cluster package and pvclust() from the pvclust
> package without success.
>
> My computer has 2G memory. Are there more memory-efficient clustering packages
> available?
>
> Thanks
>
> John
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
>     LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] pvclust_1.2-1     cluster_1.13.1    rat2302cdf_2.6.0  simpleaffy_2.24.0
>     gcrma_2.20.0      genefilter_1.30.0 affy_1.26.1
> [8] Biobase_2.8.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.16.0         annotate_1.26.1       AnnotationDbi_1.10.2
>     Biostrings_2.16.9     DBI_0.2-5             IRanges_1.6.16
> [7] preprocessCore_1.10.0 RSQLite_0.9-2         splines_2.11.1
>     survival_2.35-8       tools_2.11.1          xtable_1.5-6
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland