On Wed, Sep 8, 2010 at 12:02 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Sep 8, 2010, at 2:24 PM, Joshua Wiley wrote:
>
>> Hi Jakob,
>>
>> You can use is.na() to create an index of which rows in column 3 are
>> missing data, and then select these from column 1. Here is a simple
>> example:
>>
>> dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
>> dat$new <- dat$V3
>> my.na <- is.na(dat$V3)
>> dat$new[my.na] <- dat$V1[my.na]
>>
>> dat
>>
>> This should be quite fast. I broke the steps up to be explicit, but
>> you can readily simplify them.
>
> I was about to post something similar except I was going to avoid the "$"
> operator thinking, incorrectly as it turned out, that it would be faster. I
> also include the Holtman/Rizopoulos suggestion of ifelse(). I was also
> surprised that ifelse is the winning strategy:

That surprises me too. What I find really curious is the (relatively)
large difference between the dlr.sign and index methods. Some of the
difference is gained back if dat[, 4] <- dat[, 3] is used instead of
dat[4] <- dat[3] (the variant with the inventive name index2), but on
my old clunker it still lags noticeably behind dlr.sign:

# after failed attempts with benchmark::benchmark()
# I decided this is what you used
> library(rbenchmark)
> dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
> rbenchmark::benchmark(meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3),
+   dat$V1, dat$V3)},
+ meth.dlr.sign = {dat$new <- dat$V3
+   my.na <- is.na(dat$V3)
+   dat$new[my.na] <- dat$V1[my.na]},
+ meth.index = {dat[4] <- dat[3]; idx <-is.na(dat[, 3])
+   dat[idx, 4] <- dat[idx, 1]},
+ meth.index2 = {dat[, 4] <- dat[, 3]; idx <-is.na(dat[, 3])
+   dat[idx, 4] <- dat[idx, 1]},
+ meth.forloop = {for (i in 1:nrow(dat)){
+   if(is.na(dat[i,2])==TRUE){
+     dat[i, 3] <- dat[i, 1]
+   } else { dat[i,3] <- dat[i,2]}}
+ },
+ replications=5000, columns = c("test", "replications", "elapsed",
+   "relative", "user.self"))
           test replications elapsed  relative user.self
2 meth.dlr.sign         5000   1.337  1.206679     1.216
5  meth.forloop         5000  16.941 15.289711    14.997
1   meth.ifelse         5000   1.108  1.000000     1.061
3    meth.index         5000   8.868  8.003610     7.164
4   meth.index2         5000   6.099  5.504513     5.136

>
> dat[4] <- dat[3]; idx <-is.na(dat[, 3])
> dat[is.na(dat[, 3]), 4] <- dat[is.na(dat[, 3]), 1]
>
> > benchmark(meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
> + meth.dlr.sign={dat$new <- dat$V3
> +   my.na <- is.na(dat$V3)
> +   dat$new[my.na] <- dat$V1[my.na]},
> + meth.index ={dat[4] <- dat[3]; idx <-is.na(dat[, 3])
> +   dat[idx, 4] <- dat[idx, 1]},
> + meth.forloop ={for (i in 1:nrow(dat)){
> +   if (is.na(dat[i,3])==TRUE){
> +     dat[i,4]<- dat[i,1]}
> +   else{
> +     dat[i,4]<- dat[i,3]} }
> + },
> + replications=5000, columns = c("test", "replications", "elapsed",
> +   "relative", "user.self") )
>            test replications elapsed  relative user.self
> 2 meth.dlr.sign         5000   0.502  1.081897     0.501
> 4  meth.forloop         5000   6.419 13.834052     6.409
> 1   meth.ifelse         5000   0.464  1.000000     0.463
> 3    meth.index         5000   2.908  6.267241     2.904
>
> --
> David.
>>
>> HTH,
>>
>> Josh
>>
>> On Wed, Sep 8, 2010 at 11:17 AM, Jakob Hedegaard
>> <jakob.hedega...@agrsci.dk> wrote:
>>>
>>> Hi list,
>>>
>>> I have a data frame (m) with 169221 rows and 10 columns and would like to
>>> make a new column containing the content of column 3 but replace the NAs in
>>> column 3 with the data in column 1 (from the same row as the NA in column
>>> 3). Column 1 has data in all rows.
>>>
>>> My first attempt was:
>>>
>>> for (i in 1:169221){
>>>   if (is.na(m[i,3])==TRUE){
>>>     m[i,11] <- as.character(m[i,1])}
>>>   else{
>>>     m[i,11] <- as.character(m[i,3])}
>>> }
>>>
>>> Works - but takes too long.
>>> I would appreciate alternative solutions.
>>>
>>> Best regards, Jakob
>>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
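
The benchmarks in this thread all use a five-row toy data frame, so for
anyone who wants to try the two vectorized approaches at the size Jakob
described, here is a minimal sketch. It is not from the thread: the data
frame contents, the seed, and the roughly 20% NA rate are assumptions
invented for illustration, and only the row count (169221) comes from the
original question. It checks that ifelse() and logical indexing give the
same column and records system.time() for each so the timings can be
compared locally.

```r
# Hypothetical stand-in for Jakob's data: only the row count comes from the
# thread; the values, the seed, and the NA rate are invented for illustration.
set.seed(1)
n <- 169221
m <- data.frame(V1 = as.character(seq_len(n)),
                V2 = 0,
                V3 = as.character(seq_len(n) + n),
                stringsAsFactors = FALSE)
m$V3[sample(n, n %/% 5)] <- NA   # blank out about 20% of column 3

# Approach 1: ifelse() picks V1 wherever V3 is missing
t.ifelse <- system.time(new.ifelse <- ifelse(is.na(m$V3), m$V1, m$V3))

# Approach 2: copy V3, then overwrite the missing positions via a logical index
t.index <- system.time({
  new.index <- m$V3
  my.na <- is.na(m$V3)
  new.index[my.na] <- m$V1[my.na]
})

# Both approaches should agree and leave no NAs behind
stopifnot(identical(new.ifelse, new.index), !anyNA(new.index))

m$new <- new.index

# User, system, and elapsed seconds for each approach on this machine
print(rbind(ifelse = t.ifelse, index = t.index)[, 1:3])
```

Single system.time() calls are noisy, so for anything beyond a rough
comparison it is probably better to wrap each approach in
rbenchmark::benchmark() as above, with a smaller number of replications.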