Hi,

On Wed, Oct 26, 2011 at 11:25 AM, Schatzi <adele_thomp...@cargill.com> wrote:
> Sometimes I have NA values within specific columns of a dataframe (in this
> example, the first two columns can have NAs). If there are NA values, I
> would like them to be removed.
>
> I have been using the code:
>
> y<-c(NA,5,4,2,5,6,NA)
> z<-c(NA,3,4,NA,1,3,7)
> x<-1:7
> adata<-data.frame(y,z,x)
> adata<-adata[-which(apply(adata[,1:2],1,function(x)any(is.na(x)))),]
>
> This works well if there are NA values, but when a dataset doesn't have NA
> values, this code messes up the dataframe. I was trying to pick apart this
> code and could not understand why it didn't work when there were no NA
> values.

Thanks for the example. Your problem comes from the which() call.

If there are NA values, which() returns the row numbers where the NAs are:

> which(apply(adata[,1:2],1,function(x)any(is.na(x))))
[1] 1 4 7

But if there aren't any, which() returns integer(0), a zero-length vector:

> bdata <- data.frame(1:7, 1:7, 1:7)
> which(apply(bdata[,1:2],1,function(x)any(is.na(x))))
integer(0)

How does R subset on a negated empty row index? Unhelpfully: -integer(0)
is still empty, so no rows are selected and you get back a data frame
with zero rows instead of all of them.

Fortunately you don't need the which() at all: the logical vector returned
by your apply statement is entirely sufficient (with added negation):

> adata[apply(adata[,1:2],1,function(x)!any(is.na(x))), ]
  y z x
2 5 3 2
3 4 4 3
5 5 1 5
6 6 3 6

> bdata[apply(bdata[,1:2],1,function(x)!any(is.na(x))), ]
  X1.7 X1.7.1 X1.7.2
1    1      1      1
2    2      2      2
3    3      3      3
4    4      4      4
5    5      5      5
6    6      6      6
7    7      7      7

Sarah

>
> If there are no NA values and I run just the part:
> apply(adata[,1:2],1,function(x)any(is.na(x)))
> it results in:
>     2     3     5     6
> FALSE FALSE FALSE FALSE
>
> I was thinking that I can put in an if statement, but I think there has to
> be a better way.
>
> Any ideas/help? Thank you.
>

-- 
Sarah Goslee
http://www.functionaldiversity.org
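
P.S. In case it helps, here is a short self-contained sketch of both points: why the which() version drops every row when nothing is NA, and a logical-index version that behaves the same with or without NAs. Note that complete.cases() is not in the code above; it is a standard R function (stats package) that returns TRUE for rows with no NA in the columns you give it, and I'm suggesting it only as one possible alternative to the apply() call. The names bad, adata_clean and bdata_clean are just for illustration.

```r
# Same example data as in this thread
y <- c(NA, 5, 4, 2, 5, 6, NA)
z <- c(NA, 3, 4, NA, 1, 3, 7)
x <- 1:7
adata <- data.frame(y, z, x)
bdata <- data.frame(1:7, 1:7, 1:7)   # no NAs anywhere

# Why the original code fails: which() finds nothing, and a negated
# empty index is still empty, so every row is dropped.
bad <- which(apply(bdata[, 1:2], 1, function(r) any(is.na(r))))
nrow(bdata[-bad, ])   # 0 rows left

# Logical indexing keeps working whether or not NAs are present.
# complete.cases() is TRUE for rows with no NA in the given columns.
adata_clean <- adata[complete.cases(adata[, 1:2]), ]
bdata_clean <- bdata[complete.cases(bdata[, 1:2]), ]
```

With this data, adata_clean should keep rows 2, 3, 5 and 6, and bdata_clean should keep all seven rows, matching the apply() results above.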