This takes a few seconds for 1 million rows, and it keeps the explicit for-loop form:

numberofSalaryBands = 1000000 # 2000000
x = sample(1:15, numberofSalaryBands, replace=T)
y = sample((1:10)*1000, numberofSalaryBands, replace=T)
df = data.frame(x,y)
finalN = sum(df$x)
myVar = rep(NA, finalN)
outIndex = 1
i = 1
for (i in 1:numberofSalaryBands) {
  kount = df$x[i]
  myVar[outIndex:(outIndex+kount-1)] = rep(df$y[i], kount) # Make x[i] copies of value y[i]
  outIndex = outIndex+kount
}
head(myVar)
plyr::count(myVar)

On Aug 18, 2011, at 12:17 AM, Alex Ruiz Euler wrote:

> Dear R community,
>
> I have a 2 million by 2 matrix that looks like this:
>
> x<-sample(1:15,2000000, replace=T)
> y<-sample(1:10*1000, 2000000, replace=T)
>
>       x     y
> [1,] 10  4000
> [2,]  3  1000
> [3,]  3  4000
> [4,]  8  6000
> [5,]  2  9000
> [6,]  3  8000
> [7,]  2 10000
> (...)
>
> The first column is a population expansion factor for the number in the
> second column (household income). I want to expand the second column
> with the first so that I end up with a vector beginning with 10
> observations of 4000, then 3 observations of 1000 and so on. In my mind
> the natural approach would be to create a NULL vector and append the
> expansions:
>
> myvar<-NULL
> myvar<-append(myvar, replicate(x[1],y[1]), 1)
>
> for (i in 2:length(x)) {
> myvar<-append(myvar,replicate(x[i],y[i]),sum(x[1:i])+1)
> }
>
> to end with a vector of sum(x), which in my real database corresponds
> to 22 million observations.
>
> This works fine --if I only run it for the first, say, 1000
> observations. If I try to perform this on all 2 million observations
> it takes long, way too long for this to be useful (I left it running
> 11 hours yesterday to no avail).
>
> I know R performs well with operations on relatively large vectors. Why
> is this so inefficient? And what would be the smart way to do this?
>
> Thanks in advance.
> Alex
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
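
For comparison with the preallocated loop above and the append() approach in the quoted message, here is a minimal vectorised sketch. It relies on base R's rep() accepting a vector for its times argument, so that y[i] is repeated x[i] times in a single call. The seed and the small n are assumptions added only to keep the example quick and reproducible; they are not part of the original problem. As for the "why": the append() loop most likely slows down because each call copies the whole growing vector, and sum(x[1:i]) is recomputed from scratch on every iteration, so the total work grows roughly quadratically with the number of rows.

```r
set.seed(1)   # assumption: fixed seed, only for reproducibility
n <- 1000     # assumption: small n for illustration; the real data has 2 million rows

x <- sample(1:15, n, replace = TRUE)           # expansion factors
y <- sample((1:10) * 1000, n, replace = TRUE)  # household incomes

# Repeat each y[i] exactly x[i] times, in order, without an explicit loop
myVar <- rep(y, times = x)

length(myVar) == sum(x)   # the expanded vector has one entry per person
head(myVar)
```

The result should match what the loop produces, element for element, since both write x[i] copies of y[i] in row order.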