Re: [R] cor() on sets of vectors

William Dunlap Thu, 23 Feb 2012 17:16:23 -0800

> ## sapply is a good solution (the only one I could think of too), but
> not always worth it:


Also look at vapply().  It is like sapply() but you have to tell
it what type and size of output you expect FUN to return.  E.g.,
change
   sapply(1:n, FUN=function(i) cor(x[,i],y[,i]))
to
   vapply(1:n, FUN=function(i) cor(x[,i],y[,i]), FUN.VALUE=0.0)
It can save time and space in formatting the outputs of FUN because
it knows what it will be ahead of time, but its biggest advantage
is that it makes it clear (to the code writer and readers) what it
will return.  It throws an error if FUN does not return what you
say it will.

Also, vapply works nicely for zero-length inputs.  E.g.,
   vapply(list(), range, FUN.VALUE=numeric(2))
returns a 2 by 0 matrix while
   sapply(list(), range)
returns list() and you have to add if(length(x)==0) statements to your
code to make it work in the edge cases, obscuring the basic algorithm.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of ilai
> Sent: Thursday, February 23, 2012 2:52 PM
> To: Bert Gunter
> Cc: r-help@r-project.org; s...@gnu.org
> Subject: Re: [R] cor() on sets of vectors
> 
> On Thu, Feb 23, 2012 at 3:24 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
> > Use 1:n as an index.
> >
> > e.g.
> > sapply(1:n, function(i) cor(x[,i],y[,i]))
> 
> ## sapply is a good solution (the only one I could think of too), but
> not always worth it:
> 
> # for 100 x 1000
>  x <- data.frame(matrix(rnorm(100000),nc=1000))
>  y <- data.frame(matrix(rnorm(100000),nc=1000))
>  system.time(diag(cor(x,y)))
> #   user  system elapsed
> #  0.592   0.008   0.623
> system.time(sapply(1:1000,function(i) cor(x[,i],y[,i])))
> #   user  system elapsed
> #  0.384   0.000   0.412
> 
> # Great. but for 10 x 1000
> x <- data.frame(matrix(rnorm(10000),nc=1000))
> y <- data.frame(matrix(rnorm(10000),nc=1000))
> system.time(diag(cor(x,y)))
> #   user  system elapsed
> #  0.256   0.008   0.279
> system.time(sapply(1:1000,function(i) cor(x[,i],y[,i])))
> #   user  system elapsed
> #  0.376   0.000   0.388
> 
> # or 100 x 100
>  system.time(diag(cor(x,y)))
> #   user  system elapsed
> #  0.016   0.000   0.014
>  system.time(sapply(1:100,function(i) cor(x[,i],y[,i])))
> #   user  system elapsed
> #  0.036   0.000   0.036
> 
> # Not so great.
> 
> Bottom line, as always, it depends.
> 
> Cheers
> Elai
> 
> 
> 
> 
> >
> > -- Bert
> >
> >
> >
> > On Thu, Feb 23, 2012 at 2:10 PM, Sam Steingold <s...@gnu.org> wrote:
> >> suppose I have two sets of vectors: x1,x2,...,xN and y1,y2,...,yN.
> >> I want N correlations: cor(x1,y1), cor(x2,y2), ..., cor(xN,yN).
> >> my sets of vectors are arranged as data frames x & y (vector=column):
> >>
> >>  x <- data.frame(a=rnorm(10),b=rnorm(10),c=rnorm(10))
> >>  y <- data.frame(d=rnorm(10),e=rnorm(10),f=rnorm(10))
> >>
> >> cor(x,y) returns a _matrix_ of all pairwise correlations:
> >>
> >>  cor(x,y)
> >>          d          e            f
> >> a 0.2763696 -0.3523757 -0.373518870
> >> b 0.5892742 -0.1969161 -0.007159589
> >> c 0.3094301  0.1111997 -0.094970748
> >>
> >> which is _not_ what I want.
> >>
> >> I want diag(cor(x,y)) but without the N^2 calculations.
> >>
> >> thanks.
> >>
> >> --
> >> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 
> >> 11.0.11004000
> >> http://www.childpsy.net/ http://iris.org.il http://americancensorship.org
> >> http://dhimmi.com http://www.PetitionOnline.com/tap12009/ 
> >> http://jihadwatch.org
> >> Never argue with an idiot: he has more experience with idiotic arguments.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> >
> > Internal Contact Info:
> > Phone: 467-7374
> > Website:
> > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-
> home.htm
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] cor() on sets of vectors

Reply via email to