On Oct 5, 2011, at 4:08 PM, Sam wrote:

Dear list,

I have very little experience in dealing with proportions; I am sure this is a very simple question,

Sometimes making that claim in a group of statisticians can provoke an extended discussion that reaches no eventual agreement. Discussions about testing proportions in groups with small numbers are one excellent example.

but I could find no suitable answer beyond doing a chi-squared test and then using the Marascuilo procedure as a post-hoc analysis.

I simply want to know whether the proportions (i.e., the Yes counts out of the totals) differ significantly between the cases and, if so, which cases are significantly high or low.


proportion <- structure(list(Case = structure(1:11, .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K"), class = "factor"), Yes = c(18L,
2L, 1L, 2L, 44L, 27L, 2L, 15L, 13L, 3L, 34L), No = c(171L, 11L,
5L, 8L, 146L, 80L, 5L, 30L, 22L, 5L, 42L), Num = c(189L, 13L,
6L, 10L, 190L, 107L, 7L, 45L, 35L, 8L, 76L)), .Names = c("Case",
"Yes", "No", "Num"), class = "data.frame", row.names = c(NA,
-11L))


I would probably have turned to logistic regression and used p.adjust to deal with the multiple-comparisons issue. By default, in a simple model with a single factor as the predictor, the p-value for the Intercept tests whether the reference level's log(odds) = 0, i.e., whether its proportion is 0.50 (since that is the proportion at which log(odds) = 0). The p-values for the other levels then test equality of their odds to the reference level's odds (the same as the odds ratios being 1, or the coefficients being 0). (The first way I set this up, your "A" level ended up as the reference, and the hypotheses being tested didn't seem reasonable or "interesting": namely, that the lowest proportion was equal to 50% and that all the rest of the proportions were equal to that lowest proportion.)
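
The logit-scale facts relied on here are easy to check: qlogis() is the log-odds transform and plogis() is its inverse.

```r
# log(odds) is 0 exactly when the proportion is 0.50
qlogis(0.5)  # returns [1] 0
plogis(0)    # returns [1] 0.5
# Under R's default treatment contrasts the first factor level is the
# reference, so the Intercept estimates that level's log(odds) and each
# other coefficient estimates a log odds ratio against it.
```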

as.matrix(proportion$Yes/proportion$Num)
           [,1]
 [1,] 0.0952381
 [2,] 0.1538462
 [3,] 0.1666667
 [4,] 0.2000000
 [5,] 0.2315789
 [6,] 0.2523364
 [7,] 0.2857143
 [8,] 0.3333333
 [9,] 0.3714286
[10,] 0.3750000
[11,] 0.4473684

So I took your data and made your "K" level the reference value, since its proportion (0.447) is the one closest to 50%, and then looked at the p-values for the others as information about their similarity to the K level.

proportion$Case <- relevel(proportion$Case, "K")
# glm's binomial response is cbind(successes, failures), i.e. Yes and No
casefit <- glm(cbind(Yes, No) ~ Case, data = proportion, family = binomial)
summary(casefit)
summary(casefit)$coefficients
p.adjust(summary(casefit)$coefficients[, 4])  # Holm correction by default

That set of hypotheses gives very few "significant" results, but most of your groups are pretty sparse, so that should be expected. Another (and I think better) approach is to use logistic regression with an offset term equal to the log(odds) for the entire group:

log(  with(proportion, sum(Yes)/sum(No) ) )   # returns [1] -1.181994
This corresponds to an expected proportion of 0.2346939
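
That conversion from the overall log(odds) back to a proportion is just plogis() (161 and 525 are sum(Yes) and sum(No) for these data):

```r
overall_lodds <- log(161/525)  # -1.181994, the offset value above
plogis(overall_lodds)          # returns [1] 0.2346939, i.e. 161/686
```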

casefit <- glm(cbind(Yes, No) ~ 0 + Case, data = proportion,
               offset = rep(log(with(proportion, sum(Yes)/sum(No))), 11), # each level gets the same offset
               family = binomial)

I used the 0+ on the RHS of the formula to force a labeled estimate for each level; with the offset in place, each p-value now tests that level's proportion against the overall proportion. Only two levels (A and K) reach "significant" p-values, and both are highly significant (due to the combination of larger numbers of "events" and greater "distance" from the null estimate), so adjusting them for multiple comparisons does not change the picture.

cbind( round( summary(casefit)$coefficients, 3),
       Prpns = round( (proportion$Yes/proportion$Num)[c(11, 1:10)], 3),
       N_Yes = proportion$Yes[c(11, 1:10)] )

      Estimate Std. Error z value Pr(>|z|) Prpns N_Yes
CaseK    0.971      0.231   4.208    0.000 0.447    34
CaseA   -1.069      0.248  -4.315    0.000 0.095    18
CaseB   -0.523      0.769  -0.680    0.496 0.154     2
CaseC   -0.427      1.095  -0.390    0.696 0.167     1
CaseD   -0.204      0.791  -0.258    0.796 0.200     2
CaseE   -0.017      0.172  -0.101    0.919 0.232    44
CaseF    0.096      0.223   0.430    0.667 0.252    27
CaseG    0.266      0.837   0.318    0.751 0.286     2
CaseH    0.489      0.316   1.546    0.122 0.333    15
CaseI    0.656      0.350   1.875    0.061 0.371    13
CaseJ    0.671      0.730   0.919    0.358 0.375     3

Comparing the counts and the observed proportions to the overall proportion (0.2346939) shows only two groups (A and K) with strong evidence of "being different." I think you need larger numbers.
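
To see how little power the sparse groups have, an exact test on one of them is instructive. A sketch using binom.test() against the pooled proportion (treating 161/686 as a known, fixed null value is itself a simplification):

```r
# Case G: 2 "Yes" out of 7 trials, tested against the pooled proportion.
# With n = 7 the exact test has essentially no power to detect moderate
# departures from 0.2347, so a non-significant p-value is expected here.
binom.test(x = 2, n = 7, p = 161/686)
```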

--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
