Hi Simon,

On 03/07/11 05:30, Simon Urbanek wrote:
> This is just a quick, incomplete response, but the main misconception is really
> the use of data.frames. If you don't use the elaborate mechanics of data frames
> that involve the management of row names, then they are definitely the wrong
> tool to use, because most of the overhead is exactly to manage the row names and
> you pay a substantial penalty for that. Just drop that one feature and you get
> timings similar to a matrix:
> [...]
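
(I have snipped your example. For the record, my reading of "drop that one
feature" is a sketch along these lines, working on a plain list of columns so
that `[<-.data.frame` and its row-name bookkeeping are never invoked; this is
my paraphrase, not your code.)

example_list <- function(m){
    m <- as.list(m)        # plain list: no data.frame method dispatch
    for(i in 1:1000)
        m[[1]][i] <- 1     # ordinary vector assignment
    as.data.frame(m)       # reattach the class at the end, if needed
}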

I tried to find some documentation on why extra row-name handling is needed when one is just assigning values into a column of a data frame, but couldn't find any. For a while I stared at the code of `[<-.data.frame` but couldn't figure it out myself. Can you please summarise what exactly is going on when one does `m[1, 1] <- 1`, where `m` is a data frame?
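
In case it is useful, this is how I have been poking at it, watching for
duplications with tracemem() (which I believe is enabled in the standard
builds; it needs R compiled with memory profiling):

m <- as.data.frame(matrix(0, ncol=2, nrow=1000))
tracemem(m)     # report each duplication of m
m[1, 1] <- 1    # even this single-element assignment reports duplications here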

I found that performance differs significantly with the number of columns. For instance:

# assign 1 to the first column, one element at a time
example <- function(m){
    for(i in 1:1000)
        m[i,1] <- 1
}

m <- as.data.frame(matrix(0, ncol=2, nrow=1000))
system.time( example(m) )

   user  system elapsed
  0.164   0.000   0.163

m <- as.data.frame(matrix(0, ncol=1000, nrow=1000))
system.time( example(m) )

   user  system elapsed
 34.634   0.004  34.765

When `m` is a matrix, both run in well under 0.1s.
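
For completeness, the matrix comparison is simply:

m <- matrix(0, ncol=1000, nrow=1000)
system.time( example(m) )    # well under 0.1s here, with either column count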

Increasing the number of rows (but not the number of iterations) also increases the time somewhat, but not as drastically as increasing the number of columns does. Using `m[[y]][x]` indexing in this case doesn't help either:

# same assignment, but extracting the column with [[ first
example2 <- function(m){
    for(i in 1:1000)
        m[[1]][i] <- 1
}

m <- as.data.frame(matrix(0, ncol=1000, nrow=1000))
system.time( example2(m) )

   user  system elapsed
 36.007   0.148  36.233
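
For what it's worth, batching the writes into one call sidesteps the per-call
overhead entirely (a sketch; it obviously only helps when the assignments can
be done in one go):

m <- as.data.frame(matrix(0, ncol=1000, nrow=1000))
system.time( m[1:1000, 1] <- 1 )    # a single dispatch instead of 1000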


r.
