On Aug 24, 2009, at 11:38 AM, David Winsemius wrote:
On Aug 24, 2009, at 11:26 AM, (Ted Harding) wrote:

On 24-Aug-09 14:47:02, Christian Meesters wrote:

Hi,

Being a R-newbie I am wondering how to calculate a correlation coefficient (preferably with an associated p-value) for data like:

    d[,1]
    [1] 25.5 25.3 25.1 NA 23.3 21.5 23.8 23.2 24.2 22.7 27.6 24.2 ...
    d[,2]
    [1] 0.0 11.1 0.0 NA 0.0 10.1 10.6 9.5 0.0 57.9 0.0 0.0 ...

Apparently corr(d) from the boot-library fails with NAs in the data,

Yes, apparently corr() has no option for dealing with NAs.

also cor.test cannot cope with a different number of NAs.

On the other hand, cor.test() does have an option "na.action" which, by default, is the same as what is in getOption("na.action"). In my R installation, this, by default, is "na.omit". This has the effect that, for any pair in (x,y) where at least one of the pair is NA, that pair will be omitted from the calculation. For example, basing two vectors x,y on your data above, and a third z which is y with an extra NA:

    x<-c(25.5,25.3,25.1,NA,23.3,21.5,23.8,23.2,24.2,22.7,27.6,24.2)
    y<-c( 0.0,11.1, 0.0,NA, 0.0,10.1,10.6, 9.5, 0.0,57.9, 0.0, 0.0)
    z<-y; z[8]<-NA

I get

    cor.test(x,y)
    <snipped unneeded output>
    # sample estimates:
    #        cor
    # -0.4298726

So it has worked in both cases (see the difference in 'df'), despite the different numbers of NAs in x and z.

You may not need to go through the material that follows. There are already a set of functions to handle such concerns:

?na.omit will bring a help page describing:

    na.fail(object, ...)
    na.omit(object, ...)
    na.exclude(object, ...)
    na.pass(object, ...)
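For reference, a minimal sketch of the behaviour Ted describes, using only base R and the x, y, z vectors from his message. It assumes the default cor.test(x, y) method, which drops any pair containing an NA. The cor(..., use = "complete.obs") call is an addition for comparison and does not come from the thread.

```r
# Vectors as given in Ted's message; z is y with one extra NA.
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
z <- y
z[8] <- NA

# cor.test() silently drops incomplete pairs before testing.
ct_xy <- cor.test(x, y)
ct_xz <- cor.test(x, z)

ct_xy$estimate   # about -0.4225 (11 complete pairs)
ct_xy$parameter  # df = 9
ct_xz$estimate   # about -0.4299 (10 complete pairs)
ct_xz$parameter  # df = 8
ct_xy$p.value    # the p-value asked for in the original question

# Plain cor() needs to be told how to treat NAs; "complete.obs"
# gives the same estimate as cor.test() above.
r_xy <- cor(x, y, use = "complete.obs")
```

The difference in df (9 versus 8) is the visible sign that a different number of pairs was used in each test.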
Apologies; this was a bit hastily constructed. What I quote in what follows is from the options help page, specifically its "Options set in package stats" section.
na.action: the name of a function for treating missing values (NA's) for certain situations.

So there are some functions that may be affected by the setting of options("na.action"), but I cannot tell you where to find a list of such functions.

... but I do not know what those "certain situations" really are.
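As one illustration of where the na.action setting comes into play, here is a small sketch using lm(), whose na.action argument defaults to the value stored in options("na.action"). The data frame d and the variable names are made up for the example; x and y are the vectors from Ted's message. This is only one such "situation", not a list of them.

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
d <- data.frame(x = x, y = y)

# Current session-wide setting ("na.omit" in a factory-fresh R session).
getOption("na.action")

# na.omit() applied directly drops every row containing an NA.
d_complete <- na.omit(d)
nrow(d)           # 12 rows in total
nrow(d_complete)  # 11 complete rows

# Passing na.action explicitly makes the choice visible in the call.
fit <- lm(y ~ x, data = d, na.action = na.omit)
nobs(fit)         # 11 observations used in the fit

# With na.fail the same call stops with an error instead of dropping rows.
failed <- inherits(
  try(lm(y ~ x, data = d, na.action = na.fail), silent = TRUE),
  "try-error"
)
failed            # TRUE
```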
For functions such as corr() which do not have provision for omitting NAs, you can fix it up for yourself before calling the function. In the case of your two series d[,1], d[,2] you could use an index variable to select cases:

    ix <- (!is.na(d[,1]))&(!is.na(d[,2]))
    corr(d[ix,])

With my variables x,y,z I get

    ix.1 <- (!is.na(x))&(!is.na(y))
    ix.2 <- (!is.na(x))&(!is.na(z))

    d.1 <- cbind(x,y)
    corr(d.1[ix.1,])
    # [1] -0.422542
    ## (and -0.422542 from cor.test above as well)

    d.2 <- cbind(x,z)
    corr(d.2[ix.2,])
    # [1] -0.4298726
    ## (and -0.4298726 from cor.test above as well)

Hoping this helps,
Ted.
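A possible shortcut for building the index variable above is complete.cases() from the stats package, which flags the rows with no NA in any column. The sketch below uses Ted's x and z; the name ok is made up for the example, and cor() stands in for corr() so that the boot package does not need to be loaded.

```r
x <- c(25.5, 25.3, 25.1, NA, 23.3, 21.5, 23.8, 23.2, 24.2, 22.7, 27.6, 24.2)
y <- c( 0.0, 11.1,  0.0, NA,  0.0, 10.1, 10.6,  9.5,  0.0, 57.9,  0.0,  0.0)
z <- y
z[8] <- NA

d.2 <- cbind(x, z)

# Hand-built index, as in Ted's message.
ix.2 <- (!is.na(x)) & (!is.na(z))

# complete.cases() builds the same logical index in one call.
ok <- complete.cases(d.2)
identical(ok, ix.2)  # TRUE
sum(ok)              # 10 complete pairs

# The filtered matrix can be handed to any function without NA handling,
# e.g. corr(d.2[ok, ]) once library(boot) is loaded.
r_xz <- cor(d.2[ok, 1], d.2[ok, 2])
r_xz                 # about -0.4299
```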
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.