Good points, Rui.

On Mon, Jan 2, 2012 at 12:48 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> Hello again,
>
> I believe we are all missing something. Isn't it possible to have NAs as the
> first values of 'y'?
> And isn't it also possible to have x[1] > 3?
Theoretically, yes. In the OP's data, maybe? If the data are a time series (or
time-series-like), the zoo package is not a bad environment to be working in
anyway. It has all sorts of handy functions; based on the OP's small example
dataset, I had almost recommended na.approx(), which replaces NAs by linear
interpolation. I'm not sure, though, whether the +2 thing is just an attempt
at interpolation or something more general.

>
> Here is my point (I have changed function 'f2' to predict for such cases,
> 'f1' is rubbish)
>
> # Rui
> f3 <- function(x, y){
>     inx <- which(x > 3)
>     ynx <- which(is.na(y))
>     for(i in which(inx %in% ynx)) y[ynx[i]] <- y[ynx[i]-1] + 2L
>     y
> }
>
> # Jim's, as a function, 'na.rm' option added or else 'df3' would produce an
> # error
> require(zoo)
> f4 <- function(x, y){
>     y <- na.locf(y, na.rm=FALSE)
>     inc <- cumsum(x > 3) * 2
>     y + inc
> }
>
> df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))
> df
> df2 <- data.frame(x = c(1,2,3,4,5), y = c(10,20,NA,40,NA))
> df2
> df3 <- data.frame(x = c(1,2,3,4,5), y = rev(c(10,20,30,NA,NA)))
> df3
>
> # Joshua
> f(df$x, df$y)     # works
> f(df2$x, df2$y)   # infinite loop
> f(df3$x, df3$y)   # infinite loop
>
> # Rui
> f3(df$x, df$y)    # works
> f3(df2$x, df2$y)  # works as expected?
> f3(df3$x, df3$y)  # works as expected?
>
> # Jim
> f4(df$x, df$y)    # works
> f4(df2$x, df2$y)  # works as expected?
> f4(df3$x, df3$y)  # works as expected?
>
> If this makes sense, the performance tests are very much in favour of Jim's
> solution.
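For comparison with the +2 rule, here is a minimal sketch of linear interpolation over NAs, roughly what na.approx() does for interior gaps, written in base R with stats::approx() so it runs without zoo. The helper name interp_na is hypothetical. approx() needs at least two non-NA values, and with its default rule = 1 it returns NA outside the range of observed points, so trailing NAs such as those in the OP's df$y are not filled.

```r
# Hypothetical helper: fill interior NAs in y by linear interpolation,
# using base R's stats::approx() rather than zoo::na.approx().
# Requires at least two non-NA values in y.
interp_na <- function(y) {
  idx <- seq_along(y)
  ok <- !is.na(y)
  # rule = 1 (the default) returns NA outside the range of observed points,
  # so leading and trailing NAs stay NA
  approx(idx[ok], y[ok], xout = idx, rule = 1)$y
}

interp_na(c(10, 20, NA, 40, NA))  # df2$y -> [1] 10 20 30 40 NA
interp_na(c(10, 20, 30, NA, NA))  # df$y: the trailing NAs are not filled
```

On the OP's df, interpolation leaves both NAs alone, whereas the +2 rule fills them, which suggests the +2 is meant as something other than plain interpolation.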
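On the "works as expected?" question: in f3, which(inx %in% ynx) returns positions within inx, but those positions are then used to index ynx, so the right elements are picked only when the two vectors happen to line up (as they do for df and df2). Below is a sketch that applies the same rule (an NA with x > 3 becomes the previous value plus 2) by selecting the indices directly; f3_direct is a hypothetical name. For comparison, f4_base (also hypothetical) reproduces f4 without zoo, assuming na.locf(y, na.rm = FALSE) semantics, i.e. leading NAs are kept. Note that f4 adds the cumulative increment to every element once x exceeds 3, not only to the filled NAs, so the two functions disagree on df2.

```r
# Hypothetical fix of f3: pick positions where x > 3 and y is NA directly.
# which() returns them in increasing order, so a run of NAs is filled one
# step at a time, each from the value just filled.
f3_direct <- function(x, y) {
  for (i in which(x > 3 & is.na(y))) {
    if (i > 1) y[i] <- y[i - 1] + 2  # nothing before element 1 to build on
  }
  y
}

# Hypothetical base-R stand-in for f4 (no zoo): each NA takes the last
# non-NA value before it, leading NAs stay NA, then the increment is added.
f4_base <- function(x, y) {
  last <- cummax(seq_along(y) * !is.na(y))  # index of last non-NA so far; 0 = none yet
  last[last == 0] <- NA
  y[last] + cumsum(x > 3) * 2
}

x <- 1:5
f3_direct(x, c(10, 20, 30, NA, NA))  # 10 20 30 32 34
f3_direct(x, c(10, 20, NA, 40, NA))  # 10 20 NA 40 42
f4_base(x, c(10, 20, NA, 40, NA))    # 10 20 20 42 44

# Here inx = c(1, 3) and ynx = 3, so which(inx %in% ynx) is 2 and ynx[2]
# is NA: the original f3 does not fill this NA, f3_direct does.
f3_direct(c(5, 1, 5), c(1, 2, NA))   # 1 2 4
```

Which of the two rules the OP actually wants decides whether the timing comparison is between equivalent solutions.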
>
> # If this is what is asked for, test the performance
> # with large enough N
> N <- 1.e5
> dftest <- data.frame(x=1:N, y=c(sample(c(rep(NA, 5), 10*1:5), N,
>     replace=TRUE)))
>
> sum(is.na(dftest))/N  # proportion of NAs in 'dftest'
>
> t2 <- system.time(invisible(apply(dftest, 2, f2)))[c(1, 3)]
> t3 <- system.time(invisible(f3(dftest$x, dftest$y)))[c(1, 3)]
> t4 <- system.time(invisible(f4(dftest$x, dftest$y)))[c(1, 3)]
> rbind(t2=t2, t3=t3, t4=t4, t2.t3=t2/t3, t2.t4=t2/t4, t3.t4=t3/t4)
>
> Sample output
>
>        user.self   elapsed
> t2       2.93000   2.95000
> t3       0.22000   0.22000
> t4       0.01000   0.01000
> t2.t3   13.31818  13.40909
> t2.t4  293.00000 295.00000
> t3.t4   22.00000  22.00000
>
> A factor of 300 over the initial solution or 20+ over the other loop-based
> one.
>
> Downside, it needs an extra package loaded, but 'zoo' is rather
> commonplace.
>
> Rui Barradas
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Conditionally-adding-a-constant-tp4253049p4254470.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/