Re: [R] complexity of operations in R

Duncan Murdoch Thu, 19 Jul 2012 16:01:45 -0700

On 12-07-19 4:00 PM, Jan van der Laan wrote:


When the length of the end result is not known, doubling the length of
the list is also much faster than increasing the size of the list with
single items.

f <- function(n, preallocate) {
      v <- if(preallocate) vector("list",n) else list() ;
      for(i in seq_len(n)) {
         v[[i]] <- i
      }
      v
}

g <- function(n) {
      N <- 1000
      v <- vector("list", N)
      for(i in seq_len(n)) {
         if (i > N) {
             N <- 2 * N
             length(v) <- N
         }
         v[[i]] <- i
      }
      v[1:i]
}


  > system.time(f(5E4, TRUE))
     user  system elapsed
    0.968   0.000   0.975
  > system.time(f(5E4, FALSE))
     user  system elapsed
   52.611   0.136  54.197
  > system.time(g(5E4))
     user  system elapsed
    1.388   0.008   1.424


What causes these differences? I can imagine that the time needed for
memory allocations play a role: multiple small allocations will be
smaller than one large allocation. But that doesn't explain the
quadratic growth in time. I would expect that to be linear. When doing
v[[i]] <- i the list isn't copied, right?

If i is bigger than length(v), it has to be copied. To make the listbigger, R allocates a bigger list, and copies all the pointers over,then does the assignment. It could do this less frequently byover-allocating, but it was written in a time when memory was expensive,and programmers who cared about timing did pre-allocations.


Duncan Murdoch


Jan



On 07/19/2012 06:21 PM, William Dunlap wrote:

Preallocation of lists does speed things up.  The following shows
time quadratic in size when there is no preallocation and linear
growth when there is, for size in the c. 10^4 to 10^6 region:

f <- function(n, preallocate) { v <- if(preallocate)vector("list",n) else list() ; 
for(i in seq_len(n)) v[[i]] <- i ; v }
identical(f(17,pre=TRUE), f(17,pre=FALSE))

[1] TRUE

system.time(f(n=1e4, preallocate=FALSE))

     user  system elapsed
    0.324   0.000   0.326

system.time(f(n=2e4, preallocate=FALSE)) # 2x n, 4x time

     user  system elapsed
    1.316   0.012   1.329

system.time(f(n=4e4, preallocate=FALSE)) # ditto

     user  system elapsed
    5.720   0.028   5.754


system.time(f(n=1e4, preallocate=TRUE))

     user  system elapsed
    0.016   0.000   0.017

system.time(f(n=2e4, preallocate=TRUE)) # 2x n, 2x time

     user  system elapsed
    0.032   0.004   0.036

system.time(f(n=4e4, preallocate=TRUE)) # ditto

     user  system elapsed
    0.068   0.000   0.069


system.time(f(n=4e5, preallocate=TRUE)) # 10x n, 10x time

     user  system elapsed
    0.688   0.000   0.688

Above 10^6 there is some superlinearity

system.time(f(n=4e6, preallocate=TRUE)) # 10x n, 20x time

     user  system elapsed
   11.125   0.052  11.181

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Bert Gunter
Sent: Thursday, July 19, 2012 9:11 AM
To: Hadley Wickham
Cc: r-help@r-project.org
Subject: Re: [R] complexity of operations in R

Hadley et. al:

Indeed. And using a loop is a poor way to do it anyway.

v <- as.list(rep(FALSE,dotot))

is way faster.

-- Bert

On Thu, Jul 19, 2012 at 8:50 AM, Hadley Wickham <had...@rice.edu> wrote:

On Thu, Jul 19, 2012 at 8:02 AM, Jan van der Laan <rh...@eoos.dds.nl> wrote:

Johan,

Your 'list' and 'array doubling' code can be written much more efficient.

The following function is faster than your g and easier to read:

g2 <- function(dotot) {
    v <- list()
    for (i in seq_len(dotot)) {
      v[[i]] <- FALSE
    }
}


Except that you don't need to pre-allocate lists...

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
biostatistics/pdb-ncb-home.htm

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] complexity of operations in R

Reply via email to