Dear R community,
I have a 2-million-by-2 matrix that looks like this:

x <- sample(1:15, 2000000, replace = TRUE)
y <- sample((1:10) * 1000, 2000000, replace = TRUE)
cbind(x, y)

        x     y
[1,]   10  4000
[2,]    3  1000
[3,]    3  4000
[4,]    8  6000
[5,]    2  9000
[6,]    3  8000
[7,]    2 10000
(...)

The first column is a population expansion factor for the number in the second column (household income). I want to expand the second column by the first, so that I end up with a vector beginning with 10 observations of 4000, followed by 3 observations of 1000, and so on.

In my mind the natural approach would be to create a NULL vector and append the expansions:

myvar <- NULL
myvar <- append(myvar, replicate(x[1], y[1]), 1)
for (i in 2:length(x)) {
    myvar <- append(myvar, replicate(x[i], y[i]), sum(x[1:i]) + 1)
}

ending up with a vector of length sum(x), which in my real database corresponds to 22 million observations. This works fine, but only if I run it on, say, the first 1000 observations. If I try to perform it on all 2 million observations it takes far too long to be useful (I left it running for 11 hours yesterday, to no avail). I know R performs well with operations on relatively large vectors. Why is this approach so inefficient, and what would be the smart way to do it?

Thanks in advance.

Alex
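A minimal sketch of one vectorized alternative, assuming x and y as defined above: base R's rep() accepts a vector for its times argument, repeating each element of y the corresponding number of times in a single allocation, so the repeated copying done by append() in a loop is avoided entirely.

## Expand y element-wise by the counts in x, in one call.
myvar <- rep(y, times = x)

## Sanity checks: the result has length sum(x) and starts with
## x[1] copies of y[1], as described in the post.
stopifnot(length(myvar) == sum(x))
stopifnot(all(myvar[seq_len(x[1])] == y[1]))

The append() loop is slow because each call copies the entire accumulated vector, so the total work grows roughly quadratically with length(x); rep() computes the output length once and fills it in a single pass.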