Hi,

I have a seemingly simple test of proportions. Here is the question I am trying to answer:

A test is run each day in the lab, and the result is either positive or negative. At the end of each month we can calculate that month's positive rate as the proportion of positive test results. The data look like:

Month      # positive   # total tests   positive rate
January        24            205            11.7%
February       31            234            13.2%
March          26            227            11.5%
:               :             :               :
August         42            241            17.4%

The total # of positives before August is 182, and the total # of tests before August is 1526.

It appears that from January to July the positive rate is between about 11% and 13%, while the rate in August is up around 17%. So the question is whether the increase in August is statistically significant.

I can think of 3 ways to do this test:

1. Use binom.test(), setting "p" to the average positive rate between January and July (= 182/1526):

> binom.test(42,241,182/1526)

        Exact binomial test

data:  42 and 241
number of successes = 42, number of trials = 241, p-value = 0.0125
alternative hypothesis: true probability of success is not equal to 0.1192661
95 percent confidence interval:
 0.1285821 0.2281769
sample estimates:
probability of success
             0.1742739

2. Use prop.test() to compare the average positive rate between January and July with the positive rate in August:

> prop.test(c(182,42),c(1526,241))

        2-sample test for equality of proportions with continuity correction

data:  c(182, 42) out of c(1526, 241)
X-squared = 5.203, df = 1, p-value = 0.02255
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.107988625 -0.002026982
sample estimates:
   prop 1    prop 2
0.1192661 0.1742739

3. Use prop.test() to compare the average monthly positive rate between January and July with the positive rate in August. The average monthly # of positives is 182/7 = 26, and the average monthly # of total tests is 1526/7 = 218:

> prop.test(c(26,42),c(218,241))

        2-sample test for equality of proportions with continuity correction

data:  c(26, 42) out of c(218, 241)
X-squared = 2.3258, df = 1, p-value = 0.1272
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12375569  0.01374008
sample estimates:
   prop 1    prop 2
0.1192661 0.1742739

As you can see, method 3 gave a non-significant p-value, unlike methods 1 and 2. I understand that each method tests a different hypothesis, but for the question I am trying to answer (does August have a higher positive rate than the earlier months?), which method is more relevant?

Thanks for any suggestions,
John
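
For anyone who wants to re-run the three calls above, here is a minimal sketch that rebuilds their inputs from the quoted totals. The variable names are illustrative only and do not come from any package. It assumes seven months of data (January to July) before August, as in the post. It is meant to make visible that methods 2 and 3 compare the same two proportions and differ only in the sample size assigned to the January to July group.

```r
# Counts quoted in the post
pos_jan_jul <- 182    # positives, January to July
n_jan_jul   <- 1526   # total tests, January to July
pos_aug     <- 42     # positives, August
n_aug       <- 241    # total tests, August

# Pooled January-July rate: "p" in method 1, "prop 1" in method 2
rate_pooled <- pos_jan_jul / n_jan_jul

# Monthly averages used in method 3 (assumes 7 months)
avg_pos <- pos_jan_jul / 7   # 26
avg_n   <- n_jan_jul / 7     # 218, the value passed to prop.test() above

# Averaging numerator and denominator by the same factor leaves the rate unchanged
rate_avg <- avg_pos / avg_n
all.equal(rate_avg, rate_pooled)

# The same three calls, keeping only the p-values for a side-by-side look
p1 <- binom.test(pos_aug, n_aug, p = rate_pooled)$p.value
p2 <- prop.test(c(pos_jan_jul, pos_aug), c(n_jan_jul, n_aug))$p.value
p3 <- prop.test(c(avg_pos, pos_aug), c(avg_n, n_aug))$p.value
round(c(method1 = p1, method2 = p2, method3 = p3), 4)
```

Since the estimated proportions are identical in methods 2 and 3, the gap between their p-values comes from treating the January to July group as 218 tests rather than 1526.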
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.