On Mon, Sep 14, 2009 at 8:35 PM, jim holtman <jholt...@gmail.com> wrote:
> When you read your file into R, show the structure of the object: ...
Here's the data I get:

    > tab <- read.table("~/20090708.tab")
    > str(tab)
    'data.frame':   1797601 obs. of  3 variables:
     $ V1: Factor w/ 6 levels "biz_details",..: 4 4 4 4 4 5 6 4 1 4 ...
     $ V2: num  1.25e+09 1.25e+09 1.25e+09 1.25e+09 1.25e+09 ...
     $ V3: num  0.0141 0.0468 0.0137 0.0594 0.0171 ...
    > object.size(tab)
    35953640 bytes
    > gc()
              used (Mb) gc trigger  (Mb) max used  (Mb)
    Ncells  119580  6.4    1489330  79.6  2380869 127.2
    Vcells 6647905 50.8   17367032 132.5 16871956 128.8

Forcing a GC doesn't seem to free up an appreciable amount of memory (the memory usage reported by ps is about the same), but it's encouraging that object.size reports the object as small.

I am, however, a little skeptical of that figure: 1797601 * (4 + 8 + 8) = 35952020, which is awfully close to 35953640. My assumption is that the first column is stored as 32-bit integers and the other two as 8-byte doubles, plus a little overhead for the object headers and for the factor's mapping from servlet name (string) to its 32-bit code.

That seems suspiciously tight to me, since it implies that R allocated exactly as much memory for the vectors as there are elements in them. An interpreter like this typically over-allocates, e.g. rounding an allocation up to the next power of two or growing it by a constant factor; Python lists, for example, over-allocate internally when they grow.

Is it possible that R is counting its memory usage naively, e.g. just adding up the sizes of all the constituent objects, rather than reporting the amount of space it actually allocated for them?

-- 
Evan Klitzke <e...@eklitzke.org> :wq

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
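The back-of-the-envelope estimate in the message (4 bytes per factor code plus 8 + 8 bytes per row) can be checked directly against object.size(). The sketch below is a minimal reconstruction under stated assumptions: it builds a synthetic data frame of the same shape instead of reading "~/20090708.tab", and apart from "biz_details" the factor level names are hypothetical placeholders.

```r
# Sketch: rebuild a data frame with the same shape as `tab` in the message
# (one 6-level factor column plus two double columns) and compare
# object.size() against the naive n * (4 + 8 + 8) estimate.
# Assumption: synthetic random data stands in for "~/20090708.tab"; only
# "biz_details" is a real level name, the other five are hypothetical.
n <- 1797601L
lvls <- c("biz_details", paste0("servlet_", 2:6))

set.seed(1)
tab <- data.frame(
  V1 = factor(lvls[sample.int(6L, n, replace = TRUE)], levels = lvls),
  V2 = runif(n, 1.2e9, 1.3e9),  # timestamp-like doubles
  V3 = runif(n)                 # small doubles
)

# A factor is an integer vector of codes plus a "levels" attribute,
# so each V1 entry costs 4 bytes rather than a full string.
print(typeof(tab$V1))

# Per-column sizes as estimated by object.size()
col_sizes <- sapply(tab, function(col) as.numeric(object.size(col)))
print(col_sizes)

# Naive estimate from the message: 4 + 8 + 8 bytes per row
naive <- n * (4 + 8 + 8)
total <- as.numeric(object.size(tab))
overhead <- total - naive

cat("naive estimate:", naive, "bytes\n")
cat("object.size:   ", total, "bytes\n")
cat("overhead:      ", overhead, "bytes\n")
```

The exact overhead depends on the R version and platform (vector header sizes, how the factor levels and row names are stored), so only the comparison matters: it should be small next to the roughly 36 MB of column data, in line with the 1620-byte gap (35953640 - 35952020) in the message. Keep in mind that object.size() is documented as an estimate of the storage an object uses; it does not measure memory the R process has obtained from the operating system, which is what ps reports.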