On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson <h...@stat.berkeley.edu> wrote:
> As already suggested, you're (much) better off if you specify colClasses, e.g.
>
> tab <- read.table("~/20090708.tab", colClasses=c("factor", "double",
> "double"));
>
> Otherwise, R has to load all the data, make a best guess of the column
> classes, and then coerce (which requires a copy).
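
For anyone who wants to try the comparison themselves, here is a minimal sketch of reading the same file with and without colClasses. The temp file, its three-column layout, the row count and the names `demo`, `guessed` and `typed` are all made up for illustration (the real 20090708.tab is not reproduced here), and the numeric columns are declared as "numeric" rather than "double". It only reports the size of the finished data frames, not the peak memory used during the read.

```r
# Hypothetical stand-in for the real file: one label column, two numeric columns.
path <- tempfile(fileext = ".tab")
n <- 1000
demo <- data.frame(
  key = sample(c("a", "b", "c"), n, replace = TRUE),
  x = runif(n),
  y = runif(n)
)
write.table(demo, path, quote = FALSE, row.names = FALSE, col.names = FALSE)

# Without colClasses, read.table() has to guess the type of each column.
guessed <- read.table(path)

# With colClasses, the column types are fixed up front.
typed <- read.table(path, colClasses = c("factor", "numeric", "numeric"))

# Column classes of each result.
print(sapply(guessed, class))
print(sapply(typed, class))

# Size of the finished objects only; peak usage during the read is not shown.
print(object.size(guessed), units = "Kb")
print(object.size(typed), units = "Kb")
```

To see the effect on process memory (RSS) you would still need to watch the R process from outside, e.g. with top, while reading a file of realistic size.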
Thanks Henrik, I tried this as well as a variant that another user sent me privately. When I tell R the colClasses, it does a much better job of allocating memory (ending up with 96M of RSS memory, which isn't great but is definitely acceptable).

A couple of notes I made from testing some variants, if anyone else is interested:

* giving it an nrows argument doesn't help it allocate less memory (just a guess, but maybe because it's trying the powers-of-two allocation strategy in both cases)
* there's no difference in memory usage between telling it a column is "numeric" vs "double"
* when telling it the types in advance, loading the table is much, much faster

Maybe if I gather some more fortitude in the future, I'll poke around at the internals, since I'm still curious where the extra memory is going. Is that just the overhead of allocating a full object for each value (i.e. rather than just a double[] or whatever)?

--
Evan Klitzke <e...@eklitzke.org> :wq

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.