As already suggested, you're (much) better off if you specify colClasses, e.g.

tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"))

Otherwise, R has to load all the data, make a best guess of the column classes, and then coerce (which requires a copy).

/Henrik

On Mon, Sep 14, 2009 at 9:26 PM, Evan Klitzke <e...@eklitzke.org> wrote:
> On Mon, Sep 14, 2009 at 8:35 PM, jim holtman <jholt...@gmail.com> wrote:
>> When you read your file into R, show the structure of the object:
> ...
>
> Here's the data I get:
>
> > tab <- read.table("~/20090708.tab")
> > str(tab)
> 'data.frame':   1797601 obs. of  3 variables:
>  $ V1: Factor w/ 6 levels "biz_details",..: 4 4 4 4 4 5 6 4 1 4 ...
>  $ V2: num  1.25e+09 1.25e+09 1.25e+09 1.25e+09 1.25e+09 ...
>  $ V3: num  0.0141 0.0468 0.0137 0.0594 0.0171 ...
> > object.size(tab)
> 35953640 bytes
> > gc()
>           used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells  119580  6.4    1489330  79.6  2380869 127.2
> Vcells 6647905 50.8   17367032 132.5 16871956 128.8
>
> Forcing a GC doesn't seem to free up an appreciable amount of memory
> (memory usage reported by ps is about the same), but it's encouraging
> that the output from object.size shows that the object is small. I am,
> however, a little bit skeptical of this:
>
> 1797601 * (4 + 8 + 8) = 35952020, which is awfully close to 35953640.
> My assumption is that the first column is mapped to a 32-bit integer,
> plus two 8-byte numbers for the doubles, plus a little bit of overhead
> to store whatever structs for the objects and the mapping of servlet
> name (i.e. to store the string -> int mapping used by the factor) to
> its 32-bit representation. This seems like it might be too
> conservative for me, since it implies that R allocated exactly as much
> memory for the lists as there were numbers in the list (e.g. typically
> in an interpreter like this you'd be allocating on order-of-two
> boundaries, i.e. sizeof(obj) << 21; this is how Python lists
> internally work).
>
> Is it possible that R is counting its memory usage naively, e.g.
> just adding up the size of all of the constituent objects, rather than
> the amount of space it actually allocated for those objects?
>
> --
> Evan Klitzke <e...@eklitzke.org> :wq
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
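
For anyone who wants to try the colClasses suggestion at the top of this message without the original file, here is a minimal sketch. Everything about the data is an assumption: the file is a small temporary stand-in for ~/20090708.tab, the row count n and the three label strings are made up, and "numeric" is used as the class name because that is the spelling listed in ?read.table. It also redoes the 4 + 8 + 8 bytes-per-row estimate from the quoted message and prints it next to object.size().

```r
# Hypothetical stand-in for ~/20090708.tab: three whitespace-separated
# columns (a label, a timestamp-like number, a small double), no header.
set.seed(1)
n <- 1000
path <- tempfile(fileext = ".tab")
demo <- data.frame(
  V1 = sample(c("biz_details", "search", "home"), n, replace = TRUE),
  V2 = 1.25e9 + seq_len(n),
  V3 = runif(n)
)
write.table(demo, path, row.names = FALSE, col.names = FALSE, quote = FALSE)

# Declare the column classes up front so read.table() does not have to
# read everything, guess the types, and then convert.
tab <- read.table(path, colClasses = c("factor", "numeric", "numeric"))

# Confirm the classes that were actually assigned to each column.
col_classes <- vapply(tab, function(x) class(x)[1], character(1))
print(col_classes)

# Back-of-the-envelope size from the quoted message: 4 bytes per factor
# code plus two 8-byte doubles per row. object.size() also counts vector
# headers, factor levels and attributes, so it should come out a bit higher.
estimate <- n * (4 + 8 + 8)
actual <- as.numeric(object.size(tab))
cat("estimate:", estimate, "bytes; object.size:", actual, "bytes\n")

unlink(path)
```

On the real file, the same comparison would use nrow(tab) in place of n.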