On Tue, 15 Sep 2009, Evan Klitzke wrote:

On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson <h...@stat.berkeley.edu> wrote:
As already suggested, you're (much) better off if you specify colClasses, e.g.

tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));

Otherwise, R has to load all the data, make a best guess of the column
classes, and then coerce (which requires a copy).
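
A minimal, self-contained sketch of the difference, using a small temporary file in place of the real data. The file contents and the names `path`, `guessed`, and `declared` are made up for illustration and are not from the original thread:

```r
# Write a tiny three-column table to a temporary file (made-up data).
path <- tempfile(fileext = ".tab")
writeLines(c("a 1.5 2.5", "b 3.5 4.5", "a 5.5 6.5"), path)

# Without colClasses, read.table() has to inspect the data to guess each type.
guessed <- read.table(path)

# With colClasses, the column types are stated up front, so no guessing is needed.
declared <- read.table(path, colClasses = c("factor", "double", "double"))

# Column classes as R reports them for each variant.
sapply(guessed, class)
sapply(declared, class)

# Clean up the temporary file.
unlink(path)
```

On a file this small the memory and speed difference will not be visible; the point is only to show how the classes are declared.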

Thanks Henrik, I tried this as well as a variant that another user
sent me privately. When I tell R the colClasses, it does a much better
job of allocating memory (ending up with 96M of RSS memory, which
isn't great but is definitely acceptable).

A couple of notes I made from testing some variants, if anyone else is
interested:
* giving it an nrows argument doesn't help it allocate less memory
(just a guess, but maybe because it's trying the powers-of-two
allocation strategy in both cases)
* there's no difference in memory usage between telling it a column
is "numeric" vs "double"

Because they are the same type: in R, "numeric" and "double" both refer to double-precision vectors.

* when telling it the types in advance, loading the table is much, much faster

Indeed.
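
On the "numeric" vs "double" point above, a quick check along these lines should show that both names produce the same kind of vector. The variable names `x` and `y` are just for illustration:

```r
# "numeric" and "double" name the same storage type for vectors.
x <- numeric(3)
y <- double(3)

typeof(x)        # storage type of a vector created with numeric()
typeof(y)        # storage type of a vector created with double()
identical(x, y)  # both are zero-filled vectors of length 3

# Since the storage is the same, the reported size should be the same too.
object.size(x) == object.size(y)
```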

Maybe if I gather some more fortitude in the future, I'll poke around
at the internals, since I'm still curious where the extra memory is
going. Is that just the
overhead of allocating a full object for each value (i.e. rather than
just a double[] or whatever)?

No. It doesn't allocate a full object for each value; it allocates just a double[] plus a constant amount of overhead. R doesn't have scalar types, so there isn't even such a thing as an object for a single value, just vectors with a single element. R will use more memory than the size of the data set object, because it makes temporary copies of things.
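
A rough way to see this from the R prompt, as a sketch only: `object.size()` gives an estimate, and the exact header size depends on the platform and R version. The names `v`, `n`, `big`, `total`, and `overhead` are illustrative:

```r
# A "single value" is simply a vector of length one.
v <- 3.14
is.vector(v)
length(v)

# A long double vector: about 8 bytes per element plus a small fixed header.
n <- 1e6
big <- numeric(n)
total    <- as.numeric(object.size(big))
overhead <- total - 8 * n   # header bytes; exact value varies by platform and R version
overhead
```

If every value were a separate full object, the overhead would grow with `n` instead of staying a small constant.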

        -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
