On May 13, 2011, at 9:42 AM, Vickie S wrote:


Hi,
A naive question.
It is possible to find an R command for omitting rows or columns that contain missing values.

But
if I want to omit rows or columns with, e.g., >20% missing values, I
couldn't find any package-based command, probably because it is too
simple for anyone to do that manually, though not for me. Can anyone
please help me?

?is.na

> str(fil)
'data.frame':   8 obs. of  5 variables:
 $ X1  : int  2 3 4 5 6 NA NA 6
 $ X5  : int  6 7 NA NA NA NA NA NA
 $ X8  : int  9 NA NA NA NA NA NA NA
 $ X   : logi  NA NA NA NA NA NA ...
 $ X1.1: Factor w/ 6 levels "","2","3","5",..: 2 3 1 4 5 6 1 1
> is.na(fil)
        X1    X5    X8    X  X1.1
[1,] FALSE FALSE FALSE TRUE FALSE
[2,] FALSE FALSE  TRUE TRUE FALSE
[3,] FALSE  TRUE  TRUE TRUE FALSE
[4,] FALSE  TRUE  TRUE TRUE FALSE
[5,] FALSE  TRUE  TRUE TRUE FALSE
[6,]  TRUE  TRUE  TRUE TRUE FALSE
[7,]  TRUE  TRUE  TRUE TRUE FALSE
[8,] FALSE  TRUE  TRUE TRUE FALSE
> str(is.na(fil))
 logi [1:8, 1:5] FALSE FALSE FALSE FALSE FALSE TRUE ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "X1" "X5" "X8" "X" ...

So is.na() applied to a data frame returns a logical matrix. You can compute the proportion of missing values with apply(), using the appropriate MARGIN argument (1 for rows, 2 for columns), and compare that proportion to your threshold to get a logical index for selecting rows or columns.
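
A minimal sketch of that approach, for illustration only. The data frame below is rebuilt from the str() output above; the last two values of X1.1 ("6" and "7") are assumptions, since str() truncates the factor levels. The 0.20 cutoff and the names na_mat, col_frac, row_frac, keep_cols and keep_rows are made up for the example.

```r
# Data frame modelled on the str(fil) output; "6" and "7" in X1.1 are assumed
fil <- data.frame(
  X1   = c(2L, 3L, 4L, 5L, 6L, NA, NA, 6L),
  X5   = c(6L, 7L, NA, NA, NA, NA, NA, NA),
  X8   = c(9L, NA, NA, NA, NA, NA, NA, NA),
  X    = rep(NA, 8),
  X1.1 = factor(c("2", "3", "", "5", "6", "7", "", ""))
)

max_na <- 0.20            # assumed cutoff: drop anything with > 20% missing

na_mat <- is.na(fil)      # logical matrix, TRUE where a value is missing

# The mean of a logical vector is the proportion of TRUE values
col_frac <- apply(na_mat, 2, mean)   # MARGIN = 2: one value per column
row_frac <- apply(na_mat, 1, mean)   # MARGIN = 1: one value per row

keep_cols <- col_frac <= max_na
keep_rows <- row_frac <= max_na

# drop = FALSE keeps the result a data frame even if one column survives
fil_cols <- fil[, keep_cols, drop = FALSE]
fil_rows <- fil[keep_rows, , drop = FALSE]
```

Note that the empty strings in X1.1 are not NA, so is.na() does not count them as missing. Filtering rows and filtering columns are done independently here; if you want both, the order matters, because dropping columns first changes the row proportions. colMeans(na_mat) and rowMeans(na_mat) should give the same proportions as the two apply() calls.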

--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
