Re: [R] How to test a difference in ratios of count data in R

David Winsemius Wed, 28 Sep 2016 13:56:32 -0700

> On Sep 28, 2016, at 9:49 AM, Greg Snow <538...@gmail.com> wrote:
> 
> There are multiple ways of doing this, but here are a couple.
> 
> To just test the fixed effect of treatment you can use the glm function:
> 
> test <- read.table(text="
> replicate treatment n X
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ", header=TRUE)
> 
> fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> summary(fit1)
> 
> Note that the default baseline value may differ between R and SAS,
> which would result in a reversed sign on the slope coefficient (and
> different intercept).
> 
> To include replicate as a random effect you need an additional
> package, here I use lme4 and the glmer function:
> 
> library(lme4)
> fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> family=binomial)
> summary(fit2)
> 
> 
> 
> On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <sz...@uoguelph.ca> wrote:
>> Hello R-experts,
>> I am interested to determine if the ratio of counts from two groups differ 
>> across two distinct treatments. For example, we have three replicates of 
>> treatment A, and three replicates of treatment B. For each treatment, we 
>> have counts X from one group and counts Y from another group. My 
>> understanding is that that GLIMMIX procedure in SAS can calculate whether 
>> the ratio of counts in one group (X/X+Y) significantly differs between 
>> treatments.
>> 
>> I think this is the way you do it in SAS. The replicate and treatment 
>> variables are self-explanatory. The first number (n) refers to the total 
>> counts X + Y; the second number (X) refers to the counts X. Is there a way 
>> to do this in R? Since we have 20,000 datasets to be tested, it may be 
>> easier to retrive the significant test as the given dataset below and its 
>> p>F value and mean ratios of treatments in R than SAS.
>> 
>> 
>> data test;
>> input replicate treatment$ n X;
>> datalines;
>> 1 A 32 4
>> 1 B 33 18
>> 2 A 20 6
>> 2 B 21 18
>> 3 A 7 0
>> 3 B 8 4
>> ;
>>


Greg has already shown you how that is done in R and how to do logistic 
regression:

#  I usually think of Poisson regression when I hear a desire is to model 
ratios of counts that have a denominator. The log(sample_size) is supplied as 
an offset to correct for the variation in size of subsamples.


fit1 <- glm( X ~ treatment+offset(log(n)), data=test, family=poisson)
summary(fit1)

#  And the lme4 analogue with replication:

library(lme4)
fit2 <- glmer( X ~ treatment + offset(log(n))+ (1|replicate), data=test,
family=poisson)
summary(fit2)
#----output----
Generalized linear mixed model fit by maximum likelihood (Laplace  
Approximation)
 [glmerMod]
 Family: poisson  ( log )
Formula: X ~ treatment + offset(log(n)) + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid 
    31.9     31.3    -13.0     25.9        3 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.0504 -0.4146 -0.3487  0.3956  1.0791 

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.03159  0.1777  
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***
treatmentB    1.3365     0.3529   3.787 0.000152 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.838

Compare with the binomial model:
#============


 fitBin <- glmer( cbind(X,n-X) ~ treatment +  (1|replicate), data=test,
 family=binomial)
 coef(fitBin)
#----
$replicate
  (Intercept) treatmentB
1  -2.0487694   2.364695
2  -0.9908556   2.364695
3  -2.1844435   2.364695

attr(,"class")
[1] "coef.mer"
#-----
 summary(fitBin)
#---------
Generalized linear mixed model fit by maximum likelihood (Laplace  
Approximation)
 [glmerMod]
 Family: binomial  ( logit )
Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid 
    30.1     29.4    -12.0     24.1        3 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-0.88757 -0.35065 -0.03137  0.26897  0.67505 

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.4123   0.6421  
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.7442     0.5438  -3.208  0.00134 ** 
treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.568

The binomial model has a logit link. Your glimmix procedure appears to have a 
gaussian/normal distributional assumption and an identity link by default. If 
we run this using those assumptions in lme4::glmer we get these results (with a 
warning that in this case we can overlook since the results with lmer turned 
out to be identical)
#--------
 fitNorm <- glmer( I(X/n) ~ treatment +  (1|replicate), data=test,
                  family=gaussian)

#-------
Warning message:
In glmer(I(X/n) ~ treatment + (1 | replicate), data = test, family = gaussian) :
  calling glmer() with family=gaussian (identity link) as a shortcut to lmer() 
is deprecated; please call lmer() directly
> coef(fitNorm); summary(fitNorm)
$replicate
  (Intercept) treatmentB
1 0.091096925  0.4925325
2 0.324579602  0.4925325
3 0.009323473  0.4925325

attr(,"class")
[1] "coef.mer"
Linear mixed model fit by REML ['lmerMod']
Formula: I(X/n) ~ treatment + (1 | replicate)
   Data: test

REML criterion at convergence: -4.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.7864 -0.4278 -0.1152  0.5143  0.8246 

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.027895 0.16702 
 Residual              0.002356 0.04854 
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.14167    0.10042   1.411
treatmentB   0.49253    0.03963  12.427

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.197

That's (probably) the model to compare to your SAS results if my reading of the 
SAS Proc GLIMMIX manual page is correct.

-- 
David.

>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> run;
>> 
>> ods select lsmeans;
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> lsmeans treatment / cl ilink;
>> run;
>> 
>> I appreciate your help in advance!
>> Joshua
>> 
>> 
>>        [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to test a difference in ratios of count data in R

Reply via email to