On 16/09/2014 13:56, Peter Dalgaard wrote:
Not sure trolling was intended here.
Anyway:
Yes, there are ways of working with very large datasets in R, using databases
or otherwise. Check the CRAN task views.
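For instance, keeping the data in a database and pulling back only the rows or summaries you need works well. A minimal sketch with the DBI and RSQLite packages (the file, table and column names here are made up):

library(DBI)
# Connect to an on-disk SQLite database (hypothetical file and table names)
con <- dbConnect(RSQLite::SQLite(), "records.db")
# Let the database do the aggregation; only the small result comes back into RAM
monthly <- dbGetQuery(con,
  "SELECT month, AVG(sales) AS mean_sales FROM records GROUP BY month")
# Or walk through the table in chunks instead of loading it all at once
res <- dbSendQuery(con, "SELECT * FROM records")
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 10000)   # 10,000 rows at a time
  # ... process 'chunk' here ...
}
dbClearResult(res)
dbDisconnect(con)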
For _some_ purposes SAS can avoid overflowing RAM by using sequential
file access, and the biglm package is an example of the same technique in R.
SAS is not (to my knowledge) able to do this invariably; some procedures may
need to load the entire data set into RAM.
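In R the biglm approach looks roughly like this (a sketch only; the file name, column names and chunk size are invented): biglm() fits on the first chunk and update() folds in the rest, so only one chunk is in memory at a time.

library(biglm)
chunk_size <- 100000
con <- file("bigdata.csv", open = "r")          # hypothetical CSV on disk
first_chunk <- read.csv(con, nrows = chunk_size)
fit <- biglm(y ~ x1 + x2, data = first_chunk)   # fit on the first chunk
repeat {
  chunk <- try(read.csv(con, nrows = chunk_size, header = FALSE,
                        col.names = names(first_chunk)), silent = TRUE)
  if (inherits(chunk, "try-error") || nrow(chunk) == 0) break
  fit <- update(fit, chunk)                     # one sequential pass, bounded memory
}
close(con)
summary(fit)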
JMP's data tables are limited by available RAM, just like R's are.
R does have somewhat inefficient memory strategies (e.g., a model matrix may
contain many columns of 0/1 dummy variables, each entry stored as an 8-byte
double), so it may run out of memory sooner than other programs, but the
competition is not free of RAM restrictions either.
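As a small illustration of the dummy-variable point (the factor here is invented): a single 12-level factor becomes 11 extra columns of doubles in the model matrix, while the factor itself is stored as integer codes.

x <- data.frame(month = factor(sample(month.abb, 1e5, replace = TRUE)))
mm <- model.matrix(~ month, data = x)       # intercept + 11 dummy columns
dim(mm)                                     # 100000 x 12
print(object.size(mm), units = "Mb")        # 100000*12*8 bytes of doubles (~9 Mb) plus dimnames
print(object.size(x$month), units = "Kb")   # the factor itself: ~400 Kb of integer codes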
Also, 'hundreds of thousands of records' is not really very much: I have
seen analyses of millions many times [*], and I have analysed a few billion
with 0.3 TB of RAM.
[*] I recall a student fitting a GLM with about 30 predictors to 1.5m
records: at the time (ca. R 2.14) it did not fit in 4 GB but did in 8 GB.
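A rough back-of-envelope for that example (my arithmetic, assuming the 30 predictors expand to around 60 model-matrix columns and that glm() needs a handful of working copies of the matrix during its IRLS iterations):

n <- 1.5e6; p <- 60               # rows, and an assumed ~60 model-matrix columns
one_copy_gb <- n * p * 8 / 2^30   # one double-precision copy of the model matrix
one_copy_gb                       # about 0.67 GB
5 * one_copy_gb                   # a few working copies: ~3.4 GB, uncomfortably close to 4 GB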
- Peter D.
On September 16, 2014 4:40:29 AM PDT, Barry King <barry.k...@qlx.com> wrote:
Is there a way to get around R’s memory-bound limitation by interfacing
with a Hadoop database, or should I look at products like SAS or JMP to
work with data that has hundreds of thousands of records? Any help is
appreciated.
--
__________________________
Barry E. King, Ph.D.
Analytics Modeler
Qualex Consulting Services, Inc.
barry.k...@qlx.com
O: (317)940-5464
M: (317)507-0661
__________________________
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK