Re: [R] robust method to obtain a correlation coeff?

David Winsemius Mon, 24 Aug 2009 08:54:50 -0700


On Aug 24, 2009, at 11:38 AM, David Winsemius wrote:


On Aug 24, 2009, at 11:26 AM, (Ted Harding) wrote:

On 24-Aug-09 14:47:02, Christian Meesters wrote:

Hi,
Being an R newbie, I am wondering how to calculate a correlation
coefficient (preferably with an associated p-value) for data like:

d[,1]

[1] 25.5 25.3 25.1   NA 23.3 21.5 23.8 23.2 24.2 22.7 27.6 24.2 ...

d[,2]

[1]  0.0 11.1  0.0   NA  0.0 10.1 10.6  9.5  0.0 57.9  0.0  0.0  ...

Apparently corr(d) from the boot-library fails with NAs in the data,


Yes, apparently corr() has no option for dealing with NAs.

also cor.test cannot cope with a different number of NAs.


On the other hand, cor.test() does have an option "na.action"
which, by default, is the same as what is in getOption("na.action").

In my R installation, this, by default, is "na.omit". This has the
effect that, for any pair in (x,y) where at least one of the pair
is NA, that pair will be omitted from the calculation. For example,
basing two vectors x,y on your data above, and a third z which is y
with an extra NA:

x<-c(25.5,25.3,25.1,NA,23.3,21.5,23.8,23.2,24.2,22.7,27.6,24.2)
y<-c( 0.0,11.1, 0.0,NA, 0.0,10.1,10.6, 9.5, 0.0,57.9, 0.0, 0.0)
z<-y; z[8]<-NA

I get
cor.test(x,y)
<snipped unneeded output>
cor.test(x,z)
# sample estimates:
#        cor
# -0.4298726

So it has worked in both cases (see the difference in 'df'), despite
the different numbers of NAs in x and z.
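
For anyone who wants to check this without the snipped output, here is a minimal sketch using the same x, y and z as above. It relies only on base R (the stats package). The names res_xy and res_xz are mine, not from the original posts. As far as I can tell, the default (non-formula) method of cor.test() simply drops incomplete pairs, and the na.action argument belongs to the formula method.

```r
# Data from the thread; z is y with one extra NA at position 8.
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
z <- y
z[8] <- NA

# cor.test() uses only the pairs in which both values are present.
res_xy <- cor.test(x, y)   # 11 complete pairs
res_xz <- cor.test(x, z)   # 10 complete pairs

# For the Pearson test, df = (number of complete pairs) - 2.
res_xy$parameter   # df = 9
res_xz$parameter   # df = 8

# Estimate and p-value are stored in the returned "htest" object.
res_xy$estimate
res_xy$p.value

# cor() gives the same estimate once told to drop incomplete pairs.
cor(x, y, use = "complete.obs")
```

The difference in df (9 versus 8) is the visible sign that one more pair was dropped for x and z.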

You may not need to go through the material that follows. There are already a set of functions to handle such concerns:


?na.omit will bring a help page describing:

na.fail(object, ...)
na.omit(object, ...)
na.exclude(object, ...)
na.pass(object, ...)

Apologies; this was a bit hastily constructed. What I was quoting in what follows was from the Options help page and the "Options set in package stats" section of that help page.

na.action: the name of a function for treating missing values (NA's) for certain situations.
... but I do not know what those "certain situations" really are.

So there are some functions that may be affected by the setting of options("na.action"), but I cannot tell you where to find a list of such functions.
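
To make the four na.* functions a little more concrete, here is a rough sketch. The small data frame d is made up for illustration and is not the poster's data. lm() is used only because its help page states that its na.action default comes from options("na.action"); I am not claiming it is the only function that does so.

```r
# The session-wide setting; "na.omit" unless it has been changed.
getOption("na.action")

# Made-up data with NAs in different rows of each column.
d <- data.frame(x = c(1, 2, NA, 4, 5),
                y = c(2.1, NA, 3.2, 4.8, 5.1))

# na.omit() drops every row containing at least one NA (rows 2 and 3 here).
d_omit <- na.omit(d)
nrow(d_omit)   # 3

# na.fail() refuses to proceed when NAs are present.
failed <- inherits(try(na.fail(d), silent = TRUE), "try-error")
failed         # TRUE

# na.omit and na.exclude fit the same model, but na.exclude pads
# residuals (and fitted values) with NA so they line up with the input rows.
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)
length(residuals(fit_omit))   # 3
length(residuals(fit_excl))   # 5
```

na.pass() simply returns the object unchanged, which is useful only when the downstream function can cope with NAs itself.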


For functions such as corr() which do not have provision for omitting
NAs, you can fix it up for yourself before calling the function.
In the case of your two series d[,1], d[,2] you could use an index
variable to select cases:

ix <- (!is.na(d[,1]))&(!is.na(d[,2]))
corr(d[ix,])

With my variables x,y,z I get

ix.1 <- (!is.na(x))&(!is.na(y))
ix.2 <- (!is.na(x))&(!is.na(z))
d.1  <-cbind(x,y)
corr(d.1[ix.1,])
# [1] -0.422542  ## (and -0.422542 from cor.test above as well)
d.2  <- cbind(x,z)
corr(d.2[ix.2,])
# [1] -0.4298726 ## (and -0.4298726 from cor.test above as well)
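
The index variable above can also be built with complete.cases() from base R, which does the same row-wise check for any number of columns. This is only a sketch: the names ix.cc and d.clean are mine, and the call to corr() is guarded because it assumes the boot package is installed.

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
d.1 <- cbind(x, y)

# Manual index, as in the message above.
ix.1 <- (!is.na(x)) & (!is.na(y))

# Same thing in one call: TRUE for rows with no NA in any column.
ix.cc <- complete.cases(d.1)
all(ix.1 == ix.cc)

# Keep only the complete rows before handing the data to a function
# that has no NA handling of its own.
d.clean <- d.1[ix.cc, ]
nrow(d.clean)   # 11

# Base-R check of the coefficient on the cleaned data.
r_base <- cor(d.clean[, "x"], d.clean[, "y"])

# corr() from boot, only if the package is available.
if (requireNamespace("boot", quietly = TRUE)) {
  r_boot <- boot::corr(d.clean)
}
```

With the default equal weights, corr() should agree with cor() on the cleaned data.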

Hoping this helps,
Ted.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

