A number of as.data.frame methods do

    names(x) <- NULL

Replacing that with

    if(!is.null(names(x)))
      names(x) <- NULL

appears to save making one copy of the data
(based on tracemem and Rprofmem in a copy of R compiled
with --enable-memory-profiling)
and gives a modest but consistent boost in speed, e.g.:

#               old (seconds)           new (seconds)
#               user  system elapsed    user  system elapsed
# integer       3.412   0.060   3.472   2.788   0.020   2.809
# numeric       6.212   0.160   6.374   4.852   0.080   5.132
# logical       3.484   0.052   3.699   2.808   0.028   2.834
# factor        4.433   0.020   4.547   2.929   0.020   2.964
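
For anyone who wants to try the comparison outside the real methods, here is a minimal sketch. The helpers strip_always and strip_guarded are hypothetical stand-ins, not the base R code; they only isolate the two ways of dropping names. The tracemem part runs only if R was built with memory profiling, and what it reports may differ between R versions.

```r
# Hypothetical helpers, not the actual as.data.frame methods.
strip_always <- function(x) {
  names(x) <- NULL            # assignment happens even when there are no names
  x
}
strip_guarded <- function(x) {
  if (!is.null(names(x)))     # only assign when names are present
    names(x) <- NULL
  x
}

unnamed <- 1:10^4
named <- setNames(1:3, c("a", "b", "c"))

# Both variants should return the same value for either kind of input.
stopifnot(identical(strip_always(unnamed), strip_guarded(unnamed)))
stopifnot(identical(strip_always(named), strip_guarded(named)))

# tracemem() needs a build with memory profiling; any copy it detects
# is printed as a "tracemem[...]" line.
if (capabilities("profmem")) {
  tracemem(unnamed)
  invisible(strip_always(unnamed))
  invisible(strip_guarded(unnamed))
  untracemem(unnamed)
}
```

The equivalence checks are the point here; the tracemem output is only a way to look for the extra copy, not a guarantee that one occurs.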

These visible methods can be modified as noted above:
  as.data.frame.Date
  as.data.frame.POSIXct
  as.data.frame.complex
  as.data.frame.difftime
  as.data.frame.factor
  as.data.frame.integer
  as.data.frame.logical
  as.data.frame.numeric
  as.data.frame.numeric_version
  as.data.frame.ordered
  as.data.frame.raw
  as.data.frame.vector


Here's the timing code (run in a copy of R without memory profiling):

x <- 1:10^4     # integer
system.time(for(i in 1:10^4) y <- as.data.frame(x), gcFirst=TRUE)
x <- x + 0.0    # numeric
system.time(for(i in 1:10^4) y <- as.data.frame(x), gcFirst=TRUE)
x <- rep(c(TRUE,FALSE), length.out = 10^4)  # logical
system.time(for(i in 1:10^4) y <- as.data.frame(x), gcFirst=TRUE)
x <- factor(rep(letters[1:10], length.out = 10^4))    # factor
system.time(for(i in 1:10^4) y <- as.data.frame(x), gcFirst=TRUE)

I have not done timings where the inputs have names;
that is rare in my experience.
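
If someone does want to time the named case, something along these lines should work. This is only a sketch: the names and the repetition count n_rep are arbitrary choices of mine, not part of the timings above, and I am not reporting any results for it.

```r
# Named integer input; the names are unique so they can serve as row names.
x <- 1:10^4
names(x) <- paste0("n", seq_along(x))

n_rep <- 1000   # arbitrary repetition count, smaller than in the runs above
timing <- system.time(for (i in seq_len(n_rep)) y <- as.data.frame(x),
                      gcFirst = TRUE)
print(timing)

# Sanity checks: one row per element, and the column itself carries no names.
stopifnot(is.data.frame(y), nrow(y) == length(x), is.null(names(y[[1]])))
```

With named input the guarded version still has to do the assignment, so I would not expect the change to help there.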


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
