Re: [R] comparing proportions

Robert A LaBudde Thu, 10 Feb 2011 22:18:20 -0800

You need to change models 2 and 3 to use ~ group+ subject. You left subject out as a fixed factor.


At 05:17 PM 2/10/2011, array chip wrote:

Robert, thank you!
I tried all 3 models you suggested. Since eachsubject only has one line of data in my dataset,would including Subject as a factor in glm() orlm() lead to 0 df for resaiduals?
Attached is my dataset and here is my version of the 3 models:

test<-read.table("test.txt",sep='\t',header=T)
library(lme4)

## model 1: generalized mixed model
fit1<-glmer(cbind(success,failure)~group+(1|subject),data=test,family=binomial)
## model 2: generalized model
fit2<-glm(cbind(success,failure)~group,data=test,family=binomial)
## model 3: linear model
fit3<-lm(prop.success~group,weights=success+failure,data=test)
> fit1
Generalized linear mixed model fit by the Laplace approximation
Formula: cbind(success, failure) ~ group + (1 | subject)
   Data: test
   AIC   BIC logLik deviance
 54.75 57.89 -24.38    48.75
Random effects:
 Groups  Name        Variance Std.Dev.
 subject (Intercept) 1.8256   1.3511
Number of obs: 21, groups: subject, 21
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.6785     0.4950  -1.371    0.170
groupgroup2  -0.7030     0.6974  -1.008    0.313


> summary(fit2)

Call:
glm(formula = cbind(success, failure) ~ group, family = binomial,
    data = test)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8204  -2.0789  -0.5407   1.0403   4.0539

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4400     0.2080  -2.115   0.0344 *
groupgroup2  -0.6587     0.3108  -2.119   0.0341 *
---
Signif. codes: 0 â***â 0.001 â**â 0.01â*â 0.05 â.â 0.1 â â 1
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 78.723  on 20  degrees of freedom
Residual deviance: 74.152  on 19  degrees of freedom

> summary(fit3)

Call:
lm(formula = prop.success ~ group, data = test, weights = success +
    failure)

Residuals:
    Min      1Q  Median      3Q     Max
-1.1080 -0.7071 -0.2261  0.4754  1.9157

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.39175    0.08809   4.447 0.000276 ***
groupgroup2 -0.14175    0.12364  -1.146 0.265838
---
Signif. codes: 0 â***â 0.001 â**â 0.01â*â 0.05 â.â 0.1 â â 1
Residual standard error: 0.8676 on 19 degrees of freedom
Multiple R-squared: 0.0647,     Adjusted R-squared: 0.01548
F-statistic: 1.314 on 1 and 19 DF,  p-value: 0.2658
As you can see, all 3 models gave quitedifferent results, the model 2 (generalizedlinear model) gave significant p value whilemodel 1 (generalized mixed model) and model 3(regular linear model) did not. Also as you canfrom the data, prop.success has some valueoutside the range 0.15 to 0.85, so maybe regularlinear model may not be appropriate?
Thank you,



John






From: Robert A LaBudde <r...@lcfltd.com>
To: array chip <arrayprof...@yahoo.com>
Cc: Bert Gunter <gunter.ber...@gene.com>; r-h...@stat.math.ethz.ch
Sent: Thu, February 10, 2011 12:54:44 PM
Subject: Re: [R] comparing proportions
1. If you use a random effects model, you shouldmake Subject the random factor. I.e., a randomintercepts model with 1|Subject. Group is afixed effect: You have only 2 groups. Even ifyou had more than 2 groups, treating Group asrandom would return a standard deviation, not aP-value as you wanted. Finally, I doubt youbelieve the groups used are meaningless, andonly the population of groups is of interest.Instead you consider them special, so Group is a fixed effect.
2. The number of observations for each Subjectis the number of trials, which you previouslyindicated were 7 to 10 in the cases listed.
3. If you have no interest in the Subjecteffect, you can use a fixed Subject factorinstead with glm() instead of glmer() or othermixed model function. This is a good idea solong as the number of subjects is, say, lessthan 10. Otherwise a mixed model would be a better idea.
I suggest you fit all three models to learnabout what you're doing: 1) glmer() orequivalent, with cbind(successes, failures) ~1|Subject + Group; 2) glm() withcbind(successes, failures) ~ Subject + Group;and 3) lm(p ~ Subject + Group), where p is theproportion success for a particular subject and group.
Then compare the results. They will probably all3 give the same conclusion to the hypothesisquestion about Group. I would guess the glmer()P-value will be larger, then the glm() andfinally the lm(), but the last two may reverse.The lm() model may actually perform fairly well,as the Edgeworth series converges rapidly tonormal for binomial distributions with p within0.15 to 0.85 and 10+ replicates, as I stated before.
I'd be interested in seeing the results of these3 fits myself just for curiosity.
At 01:21 PM 2/10/2011, array chip wrote:
> Robert and Bert, thank you both very much forthe response, really appreciated. I agree thatusing regular ANOVA (or regular t test) may notbe wise during the normality issue. So I amdebating between generalized linear model usingglm(.., family=binomial) or generalized linearmixed effect model using glmer(...,family=binomial). I will forward to Robert anoffline list email I sent to Bert about whetherusing (1|subject) versus (1|group) in mixedmodel specification. If using (1|group), bothmodels will give me the same testing for fixedeffects, which is what I am mainly interestedin. So do I really need a mixed model here?
>
> Thanks again
>
> John
>
>
> From: Bert Gunter <<mailto:gunter.ber...@gene.com>gunter.ber...@gene.com>
> To: Robert A LaBudde <<mailto:r...@lcfltd.com>r...@lcfltd.com>
> Cc: array chip <<mailto:arrayprof...@yahoo.com>arrayprof...@yahoo.com>
> Sent: Thu, February 10, 2011 10:04:06 AM
> Subject: Re: [R] comparing proportions
>
> Robert:
>
> Yes, exactly. In an offlist email exchange, he clarified this for me,
> and I suggested exactly what you did, also with the cautions that his
> initial ad hoc suggestions were unwise. His subsequent post to R-help
> and the sig-mixed-models lists were the result, although he appears to
> have specified the model incorrectly in his glmer function (as
> (1|Group) instead of (1|subject).
>
> Cheers,
> Bert
>
> On Thu, Feb 10, 2011 at 9:55 AM, Robert ALaBudde <<mailto:r...@lcfltd.com><mailto:r...@lcfltd.com>r...@lcfltd.com> wrote:> > prop.test() is applicable to a binomialexperiment in each of two classes.
> >
> > Your experiment is binomial only at the subject level. You then have
> > multiple subjects in each of your groups.
> >
> > You have a random factor "Subjects" that must be accounted for.
> >
> > The best way to analyze is a generalizedlinear mixed model with a binomial
> > distribution family and a logit or probit link. You will probably have to
> > investigate overdispersion. If you have a small number of subjects, and
> > don't care about the among-subject effect, you can model them as fixed
> > effects and use glm() instead.
> >
> > Your original question, I believe, related to doing an ANOVA assuming
> > normality. In order for this to work withthis kind of proportion problem,
> > you generally won't get good results unless the number of replicates per
> > subject is 12 or more, and the proportionsinvolved are within 0.15 to 0.85.> > Otherwise you will have biased confidenceintervals and significance tests.
> >
> >
> >
> > At 07:51 PM 2/9/2011, array chip wrote:
> >>
> >> Content-type: text/plain
> >> Content-disposition: inline
> >> Content-length: 2969
> >>
> >> Hi Bert,
> >>
> >> Thanks for your reply. If I understand correctly, prop.test() is not
> >> suitable to
> >> my situation. The input to prop.test() is 2 numbers for each group (# of
> >> success
> >> and # of trials, for example, groups 1 has 5 success out of 10 trials;
> >> group 2
> >> has 3 success out of 7 trials; etc. prop.test() tests whether the
> >> probability of
> >> success is the same across groups.
> >>
> >> In my case, each group has severalsubjects and each subject has 2 numbers
> >> (#
> >> success and # trials). So
> >>
> >> for group 1:
> >> subject 1: 5 success, 10 trials
> >> subject 2: 3 success, 8 trials
> >> :
> >> :
> >>
> >> for group 2:
> >> subject a: 7 success, 9 trials
> >> subject b: 6 success, 7 trials
> >> :
> >> :
> >>
> >> I want to test whether the probability of success in group 1 is the same
> >> as in
> >> group 2. It's like comparing 2 groups of samples using t test, what I am
> >> uncertain about is that whether regular ttest (or non-pamametric test) is
> >> still
> >> appropriate here when the response variable is actually proportions.
> >>
> >> I guess prop.test() can not be used with my dataset, or I may be wrong?
> >>
> >> Thanks
> >>
> >> John
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: Bert Gunter<<mailto:gunter.ber...@gene.com><mailto:gunter.ber...@gene.com>gunter.ber...@gene.com>
> >>
> >> Sent: Wed, February 9, 2011 3:58:05 PM
> >> Subject: Re: [R] comparing proportions
> >>
> >> 1. Is this a homework problem?
> >>
> >> 2. ?prop.test
> >>
> >> 3. If you haven't done so already, get and consult a basic statistical
> >> methods book to help you with questions such as this.
> >>
> >> -- Bert
> >>
> >>
> >> > Hi, I have a dataset that has 2 groupsof samples. For each sample, then
> >> > response measured is the number of success (no.success) obatined with
> >> > the
> >> >number
> >> > of trials (no.trials). So a porportionof success (prpop.success) can be> >> > computed as no.success/no.trials. Nowthe objective is to test if there
> >> > is a
> >> > statistical significant difference inthe proportion of success between
> >> > the 2
> >> > groups of samples (say n1=20, n2=30).
> >> >
> >> > I can think of 2 ways to do the test:
> >> >
> >> > 1. regular t test based on the variable prop.success
> >> > 2. Mann-Whitney test based on the variable prop.success
> >> > 2. do a binomial regression as:
> >> >    fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
> >> >          family=binomial)
> >> >    anova(fit, test='Chisq')
> >> >
> >> > My questions is:
> >> > 1. Is t test appropriate for comparing 2 groups of proportions?
> >> > 2. how about Mann-Whitney non-parametric test?
> >> > 3. Among the 3, which technique is more appropriate?
> >> > 4. any other technique you can suggest?
> >> >
> >> > Thank you,
> >> >
> >> > John
> >> >
> >> >
> >> >
> >> >        [[alternative HTML version deleted]]
> >> >
> >> >
> >> > ______________________________________________
> >> ><mailto:R-help@r-project.org><mailto:R-help@r-project.org>R-help@r-project.orgmailing list> >> ><<https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> ><http://www.R-project.org/posting-guide.html><http://www.r-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Bert Gunter
> >> Genentech Nonclinical Biostatistics
> >>
> >>
> >>
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >>
> >> ______________________________________________
> >><mailto:R-help@r-project.org><mailto:R-help@r-project.org>R-help@r-project.orgmailing list> >><<https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help>https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >><<http://www.r-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html>http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > ================================================================
> > Robert A. LaBudde, PhD, PAS, Dpl.ACAFS e-mail: <mailto:r...@lcfltd.com><mailto:r...@lcfltd.com>r...@lcfltd.com> > Least Cost Formulations,Ltd. URL: <http://lcfltd.com/><http://lcfltd.com/>http://lcfltd.com/
> > 824 Timberlake Drive                    Tel: 757-467-0954
> > Virginia Beach, VA 23464-3239            Fax: 757-467-2947
> >
> > "Vere scire est per causas scire"
> > ================================================================
> >
> >
>
>
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
><http://devo.gene.com/groups/devo/depts/ncb/home.shtml><http://devo.gene.com/groups/devo/depts/ncb/home.shtml>http://devo.gene.com/groups/devo/depts/ncb/home.shtml
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail:<mailto:r...@lcfltd.com>r...@lcfltd.comLeast Cost Formulations, Ltd. URL:<http://lcfltd.com/>http://lcfltd.com/
824 Timberlake Drive                    Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================


================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] comparing proportions

Reply via email to