Re: [R] Identification of Outliners and Extraction of Samples

David Winsemius Mon, 09 Aug 2010 18:54:41 -0700


On Aug 9, 2010, at 6:27 PM, Alexander Eggel wrote:

Hello everybody,
I need to know which samples (S1-S6) contain a value that is bigger than the median + five standard deviations of the column it is in. This is just an example. The command should be applied to a data frame which is a lot bigger (over 100 columns). Any solutions? Thank you very much for your help!!!
s
  Samples A  B  C  E
1      S1 1  2  3  7
2      S2 4 NA  6  6
3      S3 7  8  9 NA
4      S4 4  5 NA  6
5      S5 2  5  6  7
6      S6 2  3  4  5

This loop works fine for a column without NA values. However, it doesn't work
for the other columns. Ideally I would have a loop that I could apply to all
columns in "one command".

o <- data.frame()
for (i in 1:nrow(s)) {
  dd <- s[i, ]
  if (dd$A >= median(s$A, na.rm=TRUE) + 5 * sd(s$A, na.rm=TRUE)) o <- rbind(o, dd)
}

Let's look at the more general problem of how to do column-wise calculations (since I suspect there is not much support in this neighborhood for the notion that you have a proper definition of "outlier", and furthermore you have not provided an example where any such outliers exist). Let's just calculate a set of logical vectors that signal whether a value is greater than one sd above the median:

apply(s[-1], 2, function(x) {x > median(x, na.rm=TRUE) + sd(x, na.rm=TRUE)})


      A     B     C     E
1 FALSE FALSE FALSE  TRUE
2 FALSE    NA FALSE FALSE
3  TRUE  TRUE  TRUE    NA
4 FALSE FALSE    NA FALSE
5 FALSE FALSE FALSE  TRUE
6 FALSE FALSE FALSE FALSE

Each column is passed in turn to the function (as a vector) and the function then calculates the median() and sd() with that vector as the first argument. The ">" operator has a vector on the lhs and a scalar on the rhs, but that is perfectly fine and we get the expected results in a logical matrix.
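
To get from that logical matrix back to the sample names the original question asked for, something along the following lines might work. This is only a sketch: the data frame is retyped from the example in the question, and the helper name above_cutoff and its multiplier argument k are made up for illustration; they are not from the thread. NA comparisons are treated as "not flagged", which is an assumption you may want to revisit.

```r
# Rebuild the example data frame from the original post.
s <- data.frame(
  Samples = c("S1", "S2", "S3", "S4", "S5", "S6"),
  A = c(1, 4, 7, 4, 2, 2),
  B = c(2, NA, 8, 5, 5, 3),
  C = c(3, 6, 9, NA, 6, 4),
  E = c(7, 6, NA, 6, 7, 5)
)

# Hypothetical helper (not from the thread): TRUE where a value exceeds
# median + k * sd of its own column; an NA value stays NA.
above_cutoff <- function(x, k = 1) {
  x > median(x, na.rm = TRUE) + k * sd(x, na.rm = TRUE)
}

# Logical matrix with one column per measurement column ("Samples" dropped).
flags <- apply(s[-1], 2, above_cutoff, k = 1)

# A row is flagged if at least one of its columns is TRUE. na.rm = TRUE
# makes NA entries count as "not flagged" (an assumption of this sketch).
hit <- rowSums(flags, na.rm = TRUE) > 0

s$Samples[hit]   # names of the flagged samples
s[hit, ]         # the full rows for those samples
```

With k = 1 this picks out S1, S3 and S5, matching the TRUE entries in the matrix above. Passing k = 5 instead gives the cutoff from the original question, and for this example no row is flagged.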


--
David.

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
