Re: [R] Sample size Determination to Compare Three Independent Proportions

Marc Schwartz via R-help Tue, 10 Aug 2021 06:29:25 -0700

Hi,

A search would suggest that there may not be an R function/package that provides power/sample size calculations for the specific scenarios that you are describing. There may be something that I am missing, and there is also other dedicated software such as PASS (https://www.ncss.com/software/pass/) which is not free, but provides a large library of possibly relevant functions and support.

That being said, you can run Monte Carlo simulations in R to achieve the results you want, while providing yourself with options relative to study design, intended tests, and adjustments for multiple comparisons as apropos. Many prefer this approach, since it gives you specific control over this process.

Taking the simple case, where you are going to run a 3 x 2 chi-square as your primary endpoint, and want to power for that, here is a possible function, with the same sample size in each group:


ThreeGroups <- function(n, p1, p2, p3, R = 10000, power = 0.8) {

  MCSim <- function(n, p1, p2, p3) {
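    ## Simulate n binary (0/1) outcomes for each group
    G1 <- rbinom(n, 1, p1)
    G2 <- rbinom(n, 1, p2)
    G3 <- rbinom(n, 1, p3)

    ## Create a 2 x 3 matrix of the No/Yes counts by group; fixing the
    ## factor levels keeps both rows even if a group is all 0s or all 1s
    Tab <- function(x) table(factor(x, levels = 0:1))
    MAT <- cbind(Tab(G1), Tab(G2), Tab(G3))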

    ## Perform a chi-square and just return the p value
    chisq.test(MAT)$p.value
  }

  ## Replicate the above R times, and get
  ## a distribution of p values
  MC <- replicate(R, MCSim(n, p1, p2, p3))

  ## Get the p value at the desired "power" quantile
  quantile(MC, power)
}

Essentially, the above internal MCSim() function generates 3 random samples of size 'n' from the binomial distribution, at the 3 proportions desired. For each run, it will perform a chi-square test of the 3 x 2 matrix of counts, returning the p value for each run. The main function will then return the p value at the quantile (power) within the generated distribution of p values.

You can look at the help pages for the various functions that I use above, to get a sense for how they work.

You increase the sample size ('n') until you get a p value returned <= 0.05, if that is your desired alpha level.
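An equivalent way to read the same simulation is to estimate power directly, as the share of simulated p values at or below alpha. The following is only a minimal sketch of that view: `sim_pvalues()` is a hypothetical helper (not part of the function above) that draws the "Yes" counts per group directly rather than individual 0/1 outcomes, and the seed and `R = 2000` are arbitrary choices.

```r
## Hypothetical helper: simulate R chi-square p values for k groups of
## equal size n with "Yes" probabilities given in the vector p
sim_pvalues <- function(n, p, R = 2000) {
  replicate(R, {
    yes <- rbinom(length(p), n, p)   # one "Yes" count per group
    MAT <- rbind(yes, n - yes)       # 2 x k table of Yes/No counts
    suppressWarnings(chisq.test(MAT)$p.value)
  })
}

set.seed(42)                         # arbitrary seed, for reproducibility only
pv <- sim_pvalues(300, c(0.25, 0.25, 0.35))

## Estimated power: proportion of runs that reject at alpha = 0.05
est_power <- mean(pv <= 0.05)
est_power

## The quantile view: this is <= 0.05 essentially when est_power >= 0.8
quantile(pv, 0.8)
```

Either summary can be used to judge a candidate 'n'; the proportion has the advantage that it is reported on the power scale.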

You also want 'R', the number of replications within each run, to be large enough so that the returned p value quantile is relatively stable. Values for 'R', once you get "close to" the desired p value, should be on the order of 1,000,000 or higher. Stay with lower values for 'R' until you get in the ballpark of your target, since larger values take much longer to run.
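To get a rough feel for how 'R' drives that stability, one related measure is the binomial standard error of a simulated power estimate (the proportion of runs with p <= alpha). This is a sketch under that assumption; it describes the power estimate rather than the p value quantile itself, and `mc_se()` is a hypothetical helper name.

```r
## Hypothetical helper: Monte Carlo standard error of a simulated
## power estimate based on R independent replications
mc_se <- function(power, R) sqrt(power * (1 - power) / R)

## Standard error near 80% power for increasing numbers of replications
R_values <- c(1e3, 1e4, 1e5, 1e6)
se_table <- data.frame(R = R_values, se = mc_se(0.8, R_values))
se_table
```

Each tenfold increase in 'R' shrinks the standard error by a factor of about 3.16 (the square root of 10), which is why the last digits only settle at very large 'R'.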


Thus, using your example proportions of 0.25, 0.25, and 0.35:

## 250 per group, 750 total - Not enough
> ThreeGroups(250, 0.25, 0.25, 0.35, R = 10000)
       80%
0.08884723

## 350 per group, 1050 total - Too high
> ThreeGroups(350, 0.25, 0.25, 0.35, R = 10000)
      80%
0.0270829

## 300 per group, 900 total - Close!
> ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
       80%
0.04818842

So, keep tweaking the sample size until you get a returned p value at your target alpha level, with a large enough 'R', so that you get consistent sample sizes for multiple runs.
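The manual tweaking can also be scripted as a simple grid search. The sketch below is one possible way to do that, not the only one: `power_at()` and `find_n()` are hypothetical helpers, the grid of candidate sizes and `R = 2000` are arbitrary, and the result will move by a grid step or so between seeds.

```r
## Hypothetical helper: simulated power of the k x 2 chi-square test
## for equal group sizes n and "Yes" probabilities p
power_at <- function(n, p, alpha = 0.05, R = 2000) {
  pv <- replicate(R, {
    yes <- rbinom(length(p), n, p)
    suppressWarnings(chisq.test(rbind(yes, n - yes))$p.value)
  })
  mean(pv <= alpha)
}

## Hypothetical helper: first per-group n on the grid reaching the target
find_n <- function(p, target = 0.8, n_grid = seq(200, 400, by = 25), ...) {
  for (n in n_grid) {
    if (power_at(n, p, ...) >= target) return(n)
  }
  NA_integer_   # target not reached anywhere on this grid
}

set.seed(1)     # arbitrary seed
n_found <- find_n(c(0.25, 0.25, 0.35))
n_found
```

Once the search lands in the right neighborhood, a finer grid and a larger 'R' can be used around that value.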


If I run 300 per group again, with 10,000 replicates:

> ThreeGroups(300, 0.25, 0.25, 0.35, R = 10000)
       80%
0.05033933

the returned p value is slightly higher. So, again, increase R to improve the stability of the returned p value and run it multiple times to be comfortable that the p value change is less than an acceptable threshold.
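As a cross-check that does not depend on 'R' at all, the power of the chi-square test can be approximated analytically with the noncentral chi-square distribution. This is a large-sample approximation and a swapped-in technique, not what the simulation above does; `analytic_power()` is a hypothetical helper that assumes equal group sizes.

```r
## Hypothetical helper: approximate power of the k x 2 chi-square test
## with n subjects per group, via the noncentral chi-square distribution
analytic_power <- function(n, p, alpha = 0.05) {
  k    <- length(p)
  pbar <- mean(p)                        # pooled proportion (equal n)
  ## Squared effect size (Cohen's w^2) for the k x 2 table
  w2   <- sum((p - pbar)^2) / (k * pbar * (1 - pbar))
  ncp  <- n * k * w2                     # total sample size times w^2
  crit <- qchisq(1 - alpha, df = k - 1)  # critical value under H0
  pchisq(crit, df = k - 1, ncp = ncp, lower.tail = FALSE)
}

ap_250 <- analytic_power(250, c(0.25, 0.25, 0.35))
ap_300 <- analytic_power(300, c(0.25, 0.25, 0.35))
ap_350 <- analytic_power(350, c(0.25, 0.25, 0.35))
c(ap_250, ap_300, ap_350)
```

If the approximation and the simulation disagree noticeably near the target, that is a hint to increase 'R' before trusting either number.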

Now, the tricky part is to decide if the 3 x 2 is your primary endpoint, and you want to power only for that, or if you also want to power for the other two-group comparisons, possibly having to account for p value adjustments for the multiple comparisons, resulting in the need to power for a lower alpha level for those tests. In that scenario, you would end up taking the largest sample size that you identify across the various hypotheses, recognizing that while you are powering for one hypothesis, you may be overpowering for others.
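To illustrate the lower-alpha scenario, here is a sketch that simulates the power of each 2 x 2 pairwise comparison at a Bonferroni-adjusted alpha. Bonferroni is only one possible adjustment and is an assumption here, as is the use of prop.test() with its default continuity correction; `pairwise_power()` is a hypothetical helper.

```r
## Hypothetical helper: simulated power of all pairwise two-group tests
## at a Bonferroni-adjusted alpha, with n subjects per group
pairwise_power <- function(n, p, alpha = 0.05, R = 2000) {
  pairs     <- combn(length(p), 2)   # columns: (1,2), (1,3), (2,3)
  alpha_adj <- alpha / ncol(pairs)   # Bonferroni-adjusted level
  hits <- replicate(R, {
    yes <- rbinom(length(p), n, p)   # one "Yes" count per group
    apply(pairs, 2, function(ij) {
      pv <- suppressWarnings(prop.test(yes[ij], c(n, n))$p.value)
      pv <= alpha_adj
    })
  })
  ## hits is a (number of pairs) x R logical matrix; average by row
  setNames(rowMeans(hits), apply(pairs, 2, paste, collapse = " vs "))
}

set.seed(7)                          # arbitrary seed
pw <- pairwise_power(300, c(0.25, 0.25, 0.35))
pw
```

With the example proportions, groups 1 and 2 share the same true proportion, so the "1 vs 2" entry is a false positive rate rather than a power.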

That is something that you need to decide, and perhaps consider consulting with other local statistical expertise, as may be apropos, in the prospective study design, possibly influenced by other relevant/similar research in your domain.

You can easily modify the above function for the two-group scenario as well, and I will leave that to you.
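For the two-group case specifically, base R's stats package already provides power.prop.test(), which can serve as a starting value before any simulation. This is a sketch using the example proportions; the division of alpha by 3 assumes a Bonferroni adjustment over the three pairwise tests. power.prop.test() uses a normal approximation without continuity correction, so a simulation based on chisq.test() or prop.test() defaults may call for a somewhat larger 'n'.

```r
## Per-group n for comparing 0.25 versus 0.35 at the unadjusted alpha
two <- power.prop.test(p1 = 0.25, p2 = 0.35,
                       sig.level = 0.05, power = 0.80)
ceiling(two$n)

## Same comparison at a Bonferroni-adjusted alpha (assumed: 3 pairwise tests)
two_adj <- power.prop.test(p1 = 0.25, p2 = 0.35,
                           sig.level = 0.05 / 3, power = 0.80)
ceiling(two_adj$n)
```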


Regards,

Marc


AbouEl-Makarim Aboueissa wrote on 8/10/21 6:34 AM:

Hi Marc:

First, thank you very much for your help in this matter.

We will perform an initial omnibus test of all three groups (e.g. 3 x 2 chi-square), possibly followed by all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2, 1 versus 3, 2 versus 3).

We can assume _either_ the desired sample size in each group is the same _or_ proportional to the population size.


We can set p = 0.25 and set p1 = p2 = p3 = p so that the H0 is true.

We can assume that the expected proportion of "Yes" values in each group is 0.25.

For the alternative hypotheses, for example, we can set p1 = 0.25, p2 = 0.25, p3 = 0.35.
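These settings can be plugged into a simulation with unequal group sizes. The sketch below is only an illustration: it allocates a total sample proportionally to the population sizes given in the original question (N1 = 1500, N2 = 900, N3 = 1350), then estimates the rejection rate under H0 (all 0.25) and under the alternative (0.25, 0.25, 0.35). The total of 900, the seed, `alloc()` and `reject_rate()` are all assumptions chosen for the example.

```r
## Population sizes from the original question
N_pop <- c(1500, 900, 1350)

## Hypothetical helper: split a total sample proportionally to N_pop
alloc <- function(total, N_pop) round(total * N_pop / sum(N_pop))

## Hypothetical helper: share of simulated chi-square tests rejecting
## at alpha, for group sizes n_vec and "Yes" probabilities p
reject_rate <- function(n_vec, p, alpha = 0.05, R = 2000) {
  pv <- replicate(R, {
    yes <- rbinom(length(p), n_vec, p)
    suppressWarnings(chisq.test(rbind(yes, n_vec - yes))$p.value)
  })
  mean(pv <= alpha)
}

set.seed(123)                      # arbitrary seed
n_vec <- alloc(900, N_pop)         # group sizes 360, 216, 324
type1 <- reject_rate(n_vec, rep(0.25, 3))          # H0 true
pow   <- reject_rate(n_vec, c(0.25, 0.25, 0.35))   # alternative
c(type1 = type1, power = pow)
```

Under H0 the rejection rate should sit near alpha, which is a useful sanity check on the simulation before reading anything into the power estimate.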



Again thank you very much in advance.

abou

______________________

*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*

On Mon, Aug 9, 2021 at 10:53 AM Marc Schwartz <marc_schwa...@me.com> wrote:


    Hi,

    You are going to need to provide more information than what you have
    below and I may be mis-interpreting what you have provided.

    Presuming you are designing a prospective, three-group, randomized
    allocation study, there is typically an a priori specification of the
    ratios of the sample sizes for each group such as 1:1:1, indicating
    that the desired sample size in each group is the same.

    You would also need to specify the expected proportions of "Yes" values
    in each group.

    Further, you need to specify how you are going to compare the
    proportions in each group. Are you going to perform an initial omnibus
    test of all three groups (e.g. 3 x 2 chi-square), possibly followed by
    all possible 2 x 2 pairwise comparisons (e.g. 1 versus 2, 1 versus 3, 2
    versus 3), or are you just going to compare 2 versus 1, and 3 versus 1,
    where 1 is a control group?

    Depending upon your testing plan, you may also need to account for p
    value adjustments for multiple comparisons, in which case, you also
    need to specify what adjustment method you plan to use, to know what
    the target alpha level will be.

    On the other hand, if you already have the data collected, thus have
    fixed sample sizes available per your wording below, simply go ahead
    and perform your planned analyses, as the notion of "power" is largely an a
    priori consideration, which reflects the probability of finding a
    "statistically significant" result at a given alpha level, given that
    your a priori assumptions are valid.

    Regards,

    Marc Schwartz


    AbouEl-Makarim Aboueissa wrote on 8/9/21 9:41 AM:
     > Dear All: good morning
     >
     > *Re:* Sample Size Determination to Compare Three Independent Proportions
     >
     > *Situation:*
     >
     > Three Binary variables (Yes, No)
     >
     > Three independent populations with fixed sizes (*say:* N1 = 1500,
     > N2 = 900, N3 = 1350).
     >
     > Power = 0.80
     >
     > How to choose the sample sizes to compare the three proportions of “Yes”
     > among the three variables.
     >
     > If you know a reference to this topic, it will be very helpful too.
     >
     > with many thanks in advance
     >
     > abou
     > ______________________
     >
     >
     > *AbouEl-Makarim Aboueissa, PhD*
     >
     > *Professor, Statistics and Data Science*
     > *Graduate Coordinator*
     >
     > *Department of Mathematics and Statistics*
     > *University of Southern Maine*
     >


