On Oct 10, 2012, at 1:31 PM, Jay Rice wrote:

> New to R and having issues with loops. I am aware that I should use
> vectorization whenever possible and use the apply functions, however,
> sometimes a loop seems necessary.
>
> I have a data set of 2 million rows and have tried run a couple of loops of
> varying complexity to test efficiency. If I do a very simple loop such as
> add every item in a column I get an answer quickly.
>
> If I use a nested ifelse statement in a loop it takes me 13 minutes to get
> an answer on just 50,000 rows. I am aware of a few methods to speed up
> loops. Preallocating memory space and compute as much outside of the loop
> as possible (or use create functions and just loop over the function) but
> it seems that even with these speed ups I might have too much data to run
> loops. Here is the loop I ran that took 13 minutes. I realize I can
> accomplish the same goal using vectorization (and in fact did so).

You should describe what you want to do, and you should learn to use the vectorized capabilities of R and leave the for-loops for processes that really need them.

> > y<-numeric(length(x))
> > for(i in 1:length(x))
> > ifelse(!is.na(x[i]), y[i]<-x[i],

Instead:

y[!is.na(x)] <- x[!is.na(x)]  # No loop.

> > ifelse(strataID[i+1]==strataID[i], y<-x[i+1], y<-x[i-1]))

When you index outside the range of the length of x you get NA as a result. Furthermore, you are setting y to be only a single element. So I think 'y' will be a single NA at the end of all this.

> strataID <- sample(1:2, 10, repl=TRUE)
> strataID
 [1] 1 1 2 2 1 2 2 2 2 1
> for(i in 1:length(x)) {ifelse(strataID[i+1]==strataID[i], y<-x[i+1], y<-x[i-1])}
> y
[1] NA

There is no implicit indexing of the LHS of an assignment operation. How long is strataID? And why not do this inside a data frame?

> Presumably, complicated loops would be more intensive than the nested if
> statement above. If I write more efficient loops time will come down but I
> wonder if I will ever be able to write efficient enough code to perform a
> complicated loop over 2 million rows in a reasonable time.
>
> Is it useless for me to try to do any complicated loops on 2 million rows,
> or if I get much better at programming in R will it be manageable even for
> complicated situations?
>

You will gain efficiency when you learn vectorization, and when you learn to test your code for correct behavior.

> Jay
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
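
To make the vectorization and testing advice concrete, here is a minimal sketch. It assumes the loop was meant to keep every non-missing x[i] and to fill a missing x[i] from the next row when that row is in the same stratum, otherwise from the previous row. That reading is only a guess from the posted code, and the function names fill_from_neighbor and fill_loop, as well as the simulated data, are made up for illustration.

```r
# Vectorized version: build shifted copies of x once, then fill all
# missing positions in a single indexed assignment.
fill_from_neighbor <- function(x, strataID) {
  n <- length(x)
  stopifnot(length(strataID) == n)
  if (n == 0) return(x)
  next_x <- c(x[-1], NA)   # x[i + 1], NA for the last row
  prev_x <- c(NA, x[-n])   # x[i - 1], NA for the first row
  # TRUE where row i and row i + 1 share a stratum; the last row has no successor
  same_next <- c(strataID[-1] == strataID[-n], FALSE)
  y <- x
  miss <- is.na(x)
  y[miss] <- ifelse(same_next[miss], next_x[miss], prev_x[miss])
  y
}

# Explicit loop with the same assumed rule, written with if/else and an
# indexed assignment y[i] <- ..., used here only as a reference to test against.
fill_loop <- function(x, strataID) {
  n <- length(x)
  y <- x
  for (i in seq_len(n)) {
    if (is.na(x[i])) {
      if (i < n && strataID[i + 1] == strataID[i]) {
        y[i] <- x[i + 1]
      } else if (i > 1) {
        y[i] <- x[i - 1]
      }
    }
  }
  y
}

# Small hand-checkable case
x_small  <- c(1, NA, 3, NA, 5, NA)
id_small <- c(1, 1, 1, 2, 2, 2)
fill_from_neighbor(x_small, id_small)
# [1] 1 3 3 5 5 5

# Simulated data (hypothetical): check that both versions agree
set.seed(1)
n <- 1000
x <- rnorm(n)
x[sample(n, 100)] <- NA
strataID <- rep(1:50, each = 20)
stopifnot(identical(fill_from_neighbor(x, strataID), fill_loop(x, strataID)))
```

The vectorized function touches each vector a fixed number of times regardless of length, so it should scale to millions of rows far better than a per-element ifelse call. Comparing it against a slow but obviously correct loop on a small sample is one way to test for correct behavior before running it on the full data.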

David Winsemius, MD
Alameda, CA, USA