On Mon, Feb 13, 2012 at 07:48:11AM -0800, syrvn wrote: > Hello, > > I use read.xls from the gdata package to read in xlsx files. Sometimes these > data.frames contain NA columns > and rows only. I know how to get rid of those ones but here is the R output > of a test data set read in with read.xls > > > t1 > A B X D X.1 X.2 > 1 test 1 NA NA > 2 <NA> asd NA asdasd NA > 3 NA asdasd NA > 4 NA NA NA > > t1[1,2], t1[4,5] and t1[4,6] are NA in text form in the excel sheet. I don't > understand why in the first column it is <NA> while in the last two is not. > I basically want to get rid of column 5 and 6 and row 4 as they do not > contain any relevant information. If i do a is.na.data.frame(t1): > > > is.na.data.frame(t1) > A B X D X.1 X.2 > [1,] FALSE FALSE TRUE FALSE TRUE FALSE > [2,] TRUE FALSE TRUE FALSE TRUE FALSE > [3,] FALSE FALSE TRUE FALSE TRUE FALSE > [4,] FALSE FALSE TRUE FALSE TRUE FALSE > > does not give me the result I hoped to get. > > It seems that <NA> and NA are treated as NA but in t1[4,6] it is treated as > FALSE because if I do > > > as.character(t1[4,6]) > [1] "NA "
Hi. I do not know, how "NA " appeared, however, it is possible to change them to real NA as follows. # some data frame df <- structure(list(a = c(NA, 2L, 3L, 4L), b = c("a", NA, "c", "NA "), c = structure(c(1L, 2L, NA, 4L), .Label = c("e", "f", "g", "h"), class = "factor")), .Names = c("a", "b", "c"), row.names = c(NA, -4L), class = "data.frame") df a b c 1 NA a e 2 2 <NA> f 3 3 c <NA> 4 4 NA h df[4, 2] # this is not NA, but "NA " [1] "NA " # replace all "NA " by NA in column 2 df[which(df[,2] == "NA "), 2] <- NA df a b c 1 NA a e 2 2 <NA> f 3 3 c <NA> 4 4 <NA> h Hope this helps. Petr Savicky. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.