On Sun, 28 Mar 2010, kMan wrote:

This was *very* useful for me when I dealt with a 1.5Gb text file
http://www.csc.fi/sivut/atcsc/arkisto/atcsc3_2007/ohjelmistot_html/R_and_large_data/

Two hours is a *very* long time to transfer a csv file to a db. The author
of the linked article has not documented how to use scan() arguments
appropriately for the task. I take particular issue with the author's
statement that "R is said to be slow, memory hungry and only capable of
handling small datasets," indicating he/she has crummy informants and has
not challenged the notion him/herself.
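
(For reference, the scan() incantation at issue looks something like the
sketch below. The file name and column layout are invented; the point is
that giving 'what' the column types up front spares scan() from guessing
them, which is where naive reads lose their time.)

## hypothetical 1.5Gb file; 'what' holds one template entry per column
big <- scan("big.csv", sep = ",", skip = 1,   # skip the header row
            what = list(id = integer(0),
                        value = double(0),
                        label = character(0)))
big <- as.data.frame(big)

read.table() with colClasses= and nrows= set gets you most of the same
benefit if you want a data frame directly.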


Ahem.

I believe that *I* am the author of the particular statement you take issue 
with (although not of the rest of the page).

However, when I wrote it, it continued:
---------
"R (and S) are accused of being slow, memory-hungry, and able to handle only 
small data sets.

This is completely true.

Fortunately, computers are fast and have lots of memory. Data sets with a few 
tens of thousands of observations can be handled in 256Mb of memory, and quite 
large data sets with 1Gb of memory.  Workstations with 32Gb or more to handle 
millions of observations are still expensive (but in a few years Moore's Law 
should catch up).

Tools for interfacing R with databases allow very large data sets, but this isn't 
transparent to the user."
------------
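
(The arithmetic behind those figures: a double costs 8 bytes per element,
and R typically makes a transient copy or two as you work. A rough sketch
with invented dimensions:)

n <- 50000; p <- 100    # hypothetical: 50,000 observations, 100 variables
n * p * 8 / 2^20        # ~38 Mb for one copy, stored as doubles
3 * n * p * 8 / 2^20    # ~115 Mb allowing for copies: fits in 256Mb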

I think this is a perfectly reasonable summary, and it has been (with 
appropriate changes to the memory numbers) for the nearly ten years I've 
been saying it.
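
For completeness, the database route mentioned in the quoted passage looks
roughly like this with DBI and RSQLite (table, file, and column names are
invented, and the csv is assumed to have been loaded into the database
once beforehand, e.g. with the sqlite3 command line's .mode csv/.import):

library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "big.sqlite")

## pull only the rows you need, in chunks, so memory use stays flat
res <- dbSendQuery(con, "SELECT id, value FROM big WHERE value > 0")
while (!dbHasCompleted(res)) {
    chunk <- dbFetch(res, n = 10000)   # 10,000 rows at a time
    ## ... process 'chunk' here ...
}
dbClearResult(res)
dbDisconnect(con)

This is the "isn't transparent" part: the user writes SQL and manages the
connection and the chunking by hand.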


     -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
