Motivation: during each iteration, my code needs to collect tabular data (and 
use it only during that iteration), but the number of rows may vary from 
iteration to iteration. I thought I would speed things up by preallocating the 
collection matrix with zeros, sized to what I know to be the maximum number of 
rows. I was surprised by what I found...

# set up (not the puzzling part)
x <- matrix(runif(20), nrow = 4)
y <- matrix(0, nrow = 12, ncol = 5)
foo <- c()

# this is what surprises me... what the?
> system.time(for (i in 1:100000) { n <- sample(1:4, 1); y[1:n, ] <- x[1:n, ] })
   user  system elapsed 
  1.510   0.000   1.514 
> system.time(for (i in 1:100000) { n <- sample(1:4, 1); foo <- x[1:n, ] })
   user  system elapsed 
  1.090   0.000   1.085
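
In case the sample() call is muddying the comparison, here is a minimal 
variant with n held fixed, so the only difference between the two loops is 
the assignment style itself (just a sketch; I have not kept separate timings 
for it):

# sketch: hold n fixed so only the two assignment styles differ
n <- 4
system.time(for (i in 1:100000) { y[1:n, ] <- x[1:n, ] })
system.time(for (i in 1:100000) { foo <- x[1:n, ] })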

These results are very repeatable. So, if I'm interpreting them correctly, 
allocating 'foo' anew each iteration, at whatever the current output size 
happens to be, runs faster than writing into a subset of the preallocated 'y'? 
How is that possible?

And, more generally, I'm sure other people have encountered this type of 
situation. Am I reinventing the wheel? Is there a best practice for storing 
temporary loop-specific data?
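
In case it helps anyone suggest a better pattern, here is a stripped-down 
sketch of one iteration from my code; do_something() is a hypothetical 
stand-in for the real work, and drop = FALSE is only there to keep the 
result a matrix when n happens to be 1:

# one iteration, sketched; do_something() is a placeholder, not real code
for (i in 1:100000) {
  n <- sample(1:4, 1)
  tmp <- x[seq_len(n), , drop = FALSE]  # fresh per-iteration table
  # do_something(tmp)
}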

Thanks.

PS: Though I cannot write to foo[,] (the size differs each iteration), I did 
try writing to foo[] and the runtime was worse than in either of the examples 
above.
