Re: [R] robust method to obtain a correlation coeff?

David Winsemius Mon, 24 Aug 2009 08:39:28 -0700


On Aug 24, 2009, at 11:26 AM, (Ted Harding) wrote:

On 24-Aug-09 14:47:02, Christian Meesters wrote:

Hi,
Being an R newbie, I am wondering how to calculate a correlation
coefficient (preferably with an associated p-value) for data like:

d[,1]

[1] 25.5 25.3 25.1   NA 23.3 21.5 23.8 23.2 24.2 22.7 27.6 24.2 ...

d[,2]

[1]  0.0 11.1  0.0   NA  0.0 10.1 10.6  9.5  0.0 57.9  0.0  0.0  ...

Apparently corr(d) from the boot-library fails with NAs in the data,


Yes, apparently corr() has no option for dealing with NAs.

also cor.test cannot cope with a different number of NAs.


On the other hand, cor.test() does have an option "na.action"
which, by default, is the same as what is in getOption("na.action").

In my R installation, this, by default, is "na.omit". This has the
effect that, for any pair in (x,y) where at least one of the pair
is NA, that pair will be omitted from the calculation. For example,
basing two vectors x,y on your data above, and a third z which is y
with an extra NA:

 x<-c(25.5,25.3,25.1,NA,23.3,21.5,23.8,23.2,24.2,22.7,27.6,24.2)
 y<-c( 0.0,11.1, 0.0,NA, 0.0,10.1,10.6, 9.5, 0.0,57.9, 0.0, 0.0)
 z<-y; z[8]<-NA

I get
 cor.test(x,y)
<snipped unneeded output; the estimate for x,y was -0.422542>
 cor.test(x,z)
 # sample estimates:
 #        cor
 # -0.4298726

So it has worked in both cases (see the difference in 'df'), despite
the different numbers of NAs in x and z.
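
For reference, the same pairwise deletion can also be requested explicitly. The following is only a sketch built on the x, y and z vectors above: cor() takes a "use" argument, and the object returned by cor.test() carries the estimate, the p-value and the df as components. The Spearman call at the end is an assumption about what "robust" might mean in the subject line; nobody in the thread proposed it.

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
z <- y; z[8] <- NA

# Plain coefficients, dropping any pair in which either value is NA
r_xy <- cor(x, y, use = "complete.obs")
r_xz <- cor(x, z, use = "complete.obs")

# cor.test() drops incomplete pairs itself and also gives a p-value
ct_xy <- cor.test(x, y)
ct_xz <- cor.test(x, z)
unname(ct_xy$estimate)    # Pearson r for the complete (x, y) pairs
ct_xy$p.value             # two-sided p-value
unname(ct_xy$parameter)   # df = number of complete pairs - 2
unname(ct_xz$parameter)   # one fewer, because z has an extra NA

# Rank-based alternative (assumption: one possible reading of "robust").
# exact = FALSE avoids the exact-p-value computation, which ties would prevent.
ct_rank <- cor.test(x, y, method = "spearman", exact = FALSE)
ct_rank$estimate
```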

You may not need to go through the material that follows. There is already a set of functions to handle such concerns:


?na.omit will bring a help page describing:

 na.fail(object, ...)
 na.omit(object, ...)
 na.exclude(object, ...)
 na.pass(object, ...)


It reminded me that:

na.action: the name of a function for treating missing values (NA's) for certain situations.


... but I do not know what those "certain situations" really are.
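
One situation where these functions come into play is formula-based fitting: as far as I know, lm(), glm() and the formula method of cor.test() build a model frame and apply the na.action function to it. A small sketch of what each function does, using a hypothetical data frame "dat" built from the x and y vectors in this thread:

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
dat <- data.frame(x = x, y = y)   # "dat" is an illustrative name

getOption("na.action")            # the session-wide default

n_kept    <- nrow(na.omit(dat))             # rows containing any NA are dropped
unchanged <- identical(na.pass(dat), dat)   # na.pass returns its input as is
failed    <- inherits(try(na.fail(dat), silent = TRUE), "try-error")  # na.fail stops on NA

# na.omit and na.exclude fit the same model but differ in how
# residuals and fitted values are reported afterwards
fit_omit    <- lm(y ~ x, data = dat, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = dat, na.action = na.exclude)
length(residuals(fit_omit))       # one residual per complete row
length(residuals(fit_exclude))    # padded with NA back to nrow(dat)

# The formula interface of cor.test() accepts na.action directly
ct <- cor.test(~ x + y, data = dat, na.action = na.omit)
ct$estimate
```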


For functions such as corr() which do not have provision for omitting
NAs, you can fix it up for yourself before calling the function.
In the case of your two series d[,1], d[,2] you could use an index
variable to select cases:

 ix <- (!is.na(d[,1]))&(!is.na(d[,2]))
 corr(d[ix,])

With my variables x,y,z I get

 ix.1 <- (!is.na(x))&(!is.na(y))
 ix.2 <- (!is.na(x))&(!is.na(z))
 d.1  <-cbind(x,y)
 corr(d.1[ix.1,])
 # [1] -0.422542  ## (and -0.422542 from cor.test above as well)
 d.2  <- cbind(x,z)
 corr(d.2[ix.2,])
 # [1] -0.4298726 ## (and -0.4298726 from cor.test above as well)
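
The index built by hand above can also be obtained with complete.cases(), which flags the rows that contain no NA. A sketch under the assumption that d is a two-column matrix; since the original d is not shown in full, it is rebuilt here from x and z. The boot call is guarded because the package may not be installed.

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
z <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  NA,  0.0, 57.9,  0.0,  0.0)
d <- cbind(x, z)

ix_manual <- (!is.na(d[, 1])) & (!is.na(d[, 2]))   # index as built by hand
ix        <- complete.cases(d)                     # same selection in one call

d_ok <- d[ix, ]                    # keep only the complete rows
nrow(d_ok)
r <- cor(d_ok[, 1], d_ok[, 2])     # Pearson r on the cleaned data

# corr() from boot can then be called on the cleaned matrix
if (requireNamespace("boot", quietly = TRUE)) {
  print(boot::corr(d_ok))
}
```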

Hoping this helps,
Ted.

Is there a
solution to this problem (calculating a correlation coefficient and
ignoring different number of NAs), e.g. Pearson's corr coeff?

If so, please point me to the relevant piece of documentation.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
