Hello,

On 17-08-2012 20:27, Bert Gunter wrote:
... so it may be just the way object.size() counts in the two cases, right?

Or maybe the way character vectors and factors are coded.
(64-bit Windows 7 or Ubuntu 12.04) The 80k for the character vector looks like 8 bytes * 1e4 for the pointers plus room for the strings themselves, and the 40k for the factor looks more like 1e4 32-bit integers in consecutive memory locations. I confess to being too lazy to check the sources, but if this is the case then it's another point in favour of factors: they are indeed more memory-efficient. And 64-bit operating systems will only become more common; processors aren't going to get worse.
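
A minimal R sketch of that guess, reusing the sample data from the example further down (the set.seed() call is my addition, only for reproducibility). It shows that a factor is an integer vector of codes carrying 'levels' and 'class' attributes, and compares the bare costs of 1e4 string pointers and 1e4 integers. Exact byte counts will vary with the R version and build, so none are claimed here.

```r
# Rui's example data; set.seed() is added here only so the run is reproducible
set.seed(1)
x <- sample(c("small", "medium", "large"), 1e4, replace = TRUE)
y <- factor(x)

# Under the hood a factor is an integer vector of codes plus attributes
typeof(y)        # "integer"
attributes(y)    # $levels ("large" "medium" "small") and $class ("factor")

# The bare building blocks: 1e4 string pointers vs 1e4 32-bit integers
.Machine$sizeof.pointer         # 8 on 64-bit builds of R, 4 on 32-bit builds
object.size(character(1e4))     # about 1e4 * sizeof.pointer bytes plus a header
object.size(integer(1e4))       # about 1e4 * 4 bytes plus a header, on any build
```

Since R integers are 4 bytes on every build while pointers double in size on 64-bit builds, this would also be consistent with Peter's 32-bit vs. 64-bit observation.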

There is also the statistical side of it. Factors are the natural way of coding nominal or categorical variables. The small/medium/large example is a good one. Or seasons: we like to see Fall (or Autumn) after Spring and Summer, not before. (By the way, does anyone know why M/F?) None of this takes anything away from the usefulness of character vectors; I like people's names to be just names, sorted alphabetically.
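
A small R sketch of the ordering point, using a made-up vector 'sizes' for illustration. By default factor() sorts the levels alphabetically; passing 'levels' imposes the natural order, and ordered = TRUE additionally allows comparisons.

```r
sizes <- c("small", "large", "medium", "small")

# By default the levels are sorted alphabetically, which is rarely what we want here
levels(factor(sizes))          # "large"  "medium" "small"

# Supplying 'levels' fixes the order used by tables, plots and model contrasts
f <- factor(sizes, levels = c("small", "medium", "large"))
table(f)                       # small medium large: 2 1 1

# For an ordinal variable, ordered = TRUE also makes comparisons meaningful
g <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
g < "large"                    # TRUE FALSE TRUE TRUE
```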

I've also made a simple check: apparently, character vectors are stored as a vector of pointers plus a set of unique strings. If we change one of the strings, even to something shorter that occupies fewer bytes, object.size will report an increase in size. Try x[1] <- "a" and look at the new size of x: it's bigger, even though the number of pointers to strings is the same (presumably because "a" adds a fourth distinct string that has to be counted).
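
The check above, written out as a minimal R sketch (set.seed() and the names 'before'/'after' are my additions for illustration). My understanding, not verified against these exact versions, is that R keeps each distinct string once in a global cache, a character vector holds pointers into it, and object.size() counts each distinct string once per vector. If so, the growth should be roughly one small string object, not something proportional to length(x).

```r
set.seed(1)
x <- sample(c("small", "medium", "large"), 1e4, replace = TRUE)
length(unique(x))        # 3 distinct strings
before <- object.size(x)

x[1] <- "a"              # same length, so the same number of pointers
length(unique(x))        # 4 distinct strings
after <- object.size(x)

# The growth is about one extra small string object, not anything per element
as.numeric(after) - as.numeric(before)
```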

On 32- and 64-bit Windows 7 and on 64-bit Ubuntu 12.04, R was:
> R.version
[...]
version.string R version 2.15.1 (2012-06-22)
nickname       Roasted Marshmallows

Rui Barradas

-- Bert

On Fri, Aug 17, 2012 at 11:42 AM, Peter Langfelder
<peter.langfel...@gmail.com> wrote:
On Fri, Aug 17, 2012 at 11:34 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
Hello,

No, factors may use less memory. System dependent?
I think it's a 32-bit vs. 64-bit distinction: I get Rui's results on
64-bit Windows and Linux installations, but Bert's result on a 32-bit
Linux machine.

Peter

x <- sample(c("small", "medium", "large"), 1e4, replace = TRUE)
y <- factor(x)
object.size(x)
80184 bytes
object.size(y)
40576 bytes



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.