On Wed, 10 Aug 2011, David Winsemius wrote:


On Aug 9, 2011, at 11:38 PM, Chris Howden wrote:

Hi,

I’m trying to do a hierarchical cluster analysis in R with a Big Data set.
I’m running into problems using the dist() function.

I’ve been looking at a few threads about R’s memory and have read the
memory limits section in R help. However I’m no computer expert so I’m
hoping I’ve misunderstood something and R can handle my Big Data set,
somehow. Although at the moment I think my dataset is simply too big and
there is no way around it, but I’d like to be proved wrong!

My data set has 90523 rows of data and 24 columns.

My understanding is that this means the distance matrix has a min of
90523^2 elements which is 8194413529. Which roughly translates as 8GB of

A bit less than half that: it is symmetric.

memory being required (if I assume each entry requires 1 bit).

Hmm, that would be a 0/1 distance: there are simpler methods to cluster such distances.
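
For reference, the arithmetic above can be checked in R. This is only a rough sketch: it assumes each distance is stored as an 8-byte double (which is what dist() returns) and that only the lower triangle, n(n-1)/2 values, is kept. The variable names are made up for illustration.

```r
# Number of rows in the data set; kept as a double so n^2 does not
# overflow R's 32-bit integer type.
n <- 90523

full_elems <- n^2              # full square matrix: 8194413529 elements
dist_elems <- n * (n - 1) / 2  # lower triangle only: 4097161503 elements

# Assumption: 8 bytes per stored distance (double precision).
bytes_per_double <- 8
dist_gib <- dist_elems * bytes_per_double / 2^30

format(full_elems, big.mark = ",")
format(dist_elems, big.mark = ",")
round(dist_gib, 1)             # size of the dist object in GiB, roughly 30
```

So even the symmetric half needs on the order of 30 GiB at 8 bytes per entry, not 8GB at 1 bit per entry.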

I only have 4GB on a 32-bit build of Windows and R. So there is no way that’s going to work.

So then I thought of getting access to a more powerful computer, and maybe
using cloud computing.

However the R memory limit help mentions “On all builds of R, the maximum
length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9”. Now, as the
distance matrix I require has more elements than this, does this mean it’s
too big for R no matter what I do?

Yes. Vector indexing is done with 4 byte integers.
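
The limit quoted from the help page can be inspected directly. A small sketch, assuming nothing beyond base R; `n` is the row count from the question:

```r
# Largest value of R's 4-byte integer type.
.Machine$integer.max
.Machine$integer.max == 2^31 - 1

# Elements a dist object would need for n = 90523 rows.
n <- 90523
needed <- n * (n - 1) / 2

# TRUE: the lower triangle alone has more elements than the limit.
needed > .Machine$integer.max
```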

That assumes you need the full distance matrix at one time, which you do not for hierarchical clustering (itself a highly dubious method for more than a few hundred points).
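
One way to avoid ever forming the full distance matrix is sketched below. It is not a method proposed in this thread, just a common workaround and an assumption on my part: compress the rows into a modest number of k-means prototypes, then run hclust() on the prototypes only. The data are simulated, and the sizes (5000 rows, 100 prototypes, 5 final groups) are arbitrary choices for illustration.

```r
set.seed(1)

# Simulated stand-in for the real data (assumption): 5000 rows, 24 columns.
x <- matrix(rnorm(5000 * 24), ncol = 24)

# Step 1: reduce the rows to 100 prototypes with k-means.
km <- kmeans(x, centers = 100, iter.max = 50)

# Step 2: hierarchical clustering on the prototypes only.
# dist() now stores 100 * 99 / 2 = 4950 values instead of billions.
hc <- hclust(dist(km$centers), method = "average")

# Step 3: cut the tree into 5 groups and map every original row
# to the group of the prototype it was assigned to.
center_groups <- cutree(hc, k = 5)
row_groups <- center_groups[km$cluster]

table(row_groups)
```

Whether the resulting groups mean anything is a separate question; the sketch only shows that the memory problem is avoidable.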


--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595