Re: [R] a question about "by" and "ddply"

David Winsemius Tue, 29 May 2012 22:01:29 -0700


On May 29, 2012, at 6:32 PM, jacaranda tree wrote:

Hi all,
I have a data set (df, n=10 for the sake of simplicity here) where I have two continuous variables (age and weight) and I also have a grouping variable (group, with two levels). I want to run correlations for each group separately (kind of similar to "split file" in SPSS). I've been experimenting with different functions, and I was able to do this correctly using ddply function, but output is a little bit difficult to read when I do the cor.test to get all the data with p values, df, and pearson r (see below). I also tried to do it with by function. Although, with by, it shows the data for two groups separately, it seems like it calculates the same r for both groups. Here is my code for both ddply and by, and the output as well. I was wondering if there is a way to display the output better with ddply or run the correlations correctly for each group using by.
Thanks in advance,


I would have imagined something along the lines of

lapply( split(df, df$group), function(x) cor.test(x[["age"]], x[["weight"]]) )


... but without an example it's only a guess.

--
David
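
For anyone who wants to try the split/lapply idea, here is a minimal sketch. The thread never shows the real data, so the data frame below is made up purely for illustration (two groups of five, columns named group, age and weight as in the question); only the split/lapply/cor.test pattern comes from the reply above.

```r
# Hypothetical data: NOT the poster's data, just something with the same shape.
df <- data.frame(
  group  = rep(c(1, 2), each = 5),
  age    = c(20, 25, 30, 35, 40, 22, 27, 33, 38, 45),
  weight = c(55, 61, 64, 72, 75, 80, 70, 85, 78, 90)
)

# split() gives one data frame per level of group;
# lapply() then runs cor.test() inside each piece separately.
results <- lapply(split(df, df$group), function(x) {
  cor.test(x[["age"]], x[["weight"]], method = "pearson")
})

results[["1"]]                             # full cor.test printout for group 1
sapply(results, function(ct) ct$estimate)  # just the Pearson r for each group
```

The result is an ordinary named list, one "htest" object per group, so individual pieces such as results[["2"]]$p.value can be pulled out as needed.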

1. with "ddply"

r<-ddply(df, .(group), summarise, "corr" = cor.test(age, weight,method = "pearson"))


Output:
   Group                                 corr
1      1                                  Inf
2      1                                    3
3      1                                    0
4      1                                    1
5      1                                    0
6      1                            two.sided
7      1 Pearson's product-moment correlation
8      1                       age and weight
9      1                                 1, 1
10     2                             9.722211
11     2                                    3
12     2                          0.002311412
13     2                            0.9844986
14     2                                    0
15     2                            two.sided
16     2 Pearson's product-moment correlation
17     2                       age and weight
18     2                 0.7779640, 0.9990233
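
The nine rows per group in this output appear to be the nine components of the object cor.test returns (statistic, parameter, p.value, estimate, null.value, alternative, method, data.name, conf.int), stacked one under another. A more readable table can be had by picking out only the wanted components and returning one row per group. The sketch below does this in base R on made-up data (the real df is not shown in the thread); cor_row and cor_table are hypothetical names chosen here.

```r
# Hypothetical data: NOT the poster's data, just something with the same shape.
df <- data.frame(
  group  = rep(c(1, 2), each = 5),
  age    = c(20, 25, 30, 35, 40, 22, 27, 33, 38, 45),
  weight = c(55, 61, 64, 72, 75, 80, 70, 85, 78, 90)
)

# Run cor.test() on one group's rows and keep only the numbers of interest.
cor_row <- function(x) {
  ct <- cor.test(x$age, x$weight, method = "pearson")
  data.frame(
    group     = x$group[1],
    r         = unname(ct$estimate),   # Pearson correlation
    df        = unname(ct$parameter),  # degrees of freedom (n - 2)
    p.value   = ct$p.value,
    conf.low  = ct$conf.int[1],        # 95 percent confidence limits
    conf.high = ct$conf.int[2]
  )
}

# One row per group, bound into a single data frame.
cor_table <- do.call(rbind, lapply(split(df, df$group), cor_row))
print(cor_table, row.names = FALSE)
```

With plyr loaded, the same cor_row function should also work when passed to ddply directly, as in ddply(df, .(group), cor_row), in place of summarise.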

2. with "by"

r <- by(df, group, FUN = function(x) cor.test(age, weight, method ="pearson"))


Output:
Group: 1

        Pearson's product-moment correlation

data:  age and weight
t = 6.4475, df = 8, p-value = 0.0001988
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6757758 0.9802100
sample estimates:
      cor
0.9157592

------------------------------------------------------------
Group: 2

        Pearson's product-moment correlation

data:  age and weight
t = 6.4475, df = 8, p-value = 0.0001988
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6757758 0.9802100
sample estimates:
      cor
0.9157592
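
Both panels above report df = 8, which is what a test on all 10 rows would give, so the function is most likely not using the subset it receives: function(x) never refers to x, and age and weight are looked up elsewhere (presumably df was attached, or the columns also exist as standalone vectors; that is a guess). A minimal sketch of a by call that uses each subset, again on made-up data since the real df is not shown:

```r
# Hypothetical data: NOT the poster's data, just something with the same shape.
df <- data.frame(
  group  = rep(c(1, 2), each = 5),
  age    = c(20, 25, 30, 35, 40, 22, 27, 33, 38, 45),
  weight = c(55, 61, 64, 72, 75, 80, 70, 85, 78, 90)
)

# by() hands each group's rows to the function as x,
# so the columns must be taken from x, not from the global workspace.
r_by <- by(df, df$group, function(x) {
  cor.test(x$age, x$weight, method = "pearson")
})

r_by  # prints one cor.test result per group
```

With five rows per group, each test now reports df = 3, matching the ddply output in part 1.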

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

