At 19:19 on 29/03/2025, Ebert,Timothy Aaron wrote:
How about calculating a 95% confidence interval for the estimated proportion 
in favor? The PooledInfRate package will do this for you. If the confidence 
intervals overlap, that suggests (though it does not prove) that there is no 
significant difference.

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Kevin Zembower via 
R-help
Sent: Saturday, March 29, 2025 12:10 PM
To: R-help email list <r-help@r-project.org>
Subject: [R] Setting up hypothesis tests with the infer library?

Hello, all,

We're now starting to cover hypothesis tests in my Stats 101 course. As usual 
in courses using the Lock5 textbook, 3rd ed., the homework answers are 
calculated using their StatKey application. In addition (and for no extra 
credit), I'm trying to solve the problems using R. In the case of hypothesis 
tests, in addition to manually setting up randomized null hypothesis 
distributions and graphing them, I'm using the infer library. I've been really 
impressed with this library and enjoy solving this type of problem with it.

One of the first steps in solving a hypothesis test with infer is to set up the 
initial sampling dataset. Often, in Lock5 problems, this is a dataset that can 
be downloaded with library(Lock5Data). However, other problems are worded like 
this:

===========================
In 1980 and again in 2010, a Gallup poll asked a random sample of 1000 US citizens 
"Are you in favor of the death penalty for a person convicted of murder?" In 
1980, the proportion saying yes was 0.66. In 2010, it was 0.64. Does this data provide 
evidence that the proportion of US citizens favoring the death penalty was higher in 1980 
than it was in 2010? Use p1 for the proportion in 1980 and p2 for the proportion in 2010.
============================

I've been setting up problems like this with code similar to:
===========================
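library(dplyr)   # provides %>%
library(infer)

# One row per respondent: 1000 per survey, split by the reported proportions.
# round() guards against floating-point results such as 659.9999 being truncated.
df <- data.frame(
     survey = c(rep("1980", 1000), rep("2010", 1000)),
     DP = c(rep("Y", round(0.66*1000)), rep("N", 1000 - round(0.66*1000)),
            rep("Y", round(0.64*1000)), rep("N", 1000 - round(0.64*1000))))

# Observed difference in proportions, p1 (1980) minus p2 (2010)
(d_hat <- df %>%
      specify(response = DP, explanatory = survey, success = "Y") %>%
      calculate(stat = "diff in props", order = c("1980", "2010")))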
============================

My question is, is this the way I should be setting up datasets for problems of 
this type? Is there a more efficient way, that doesn't require the construction 
of the whole sample dataset?

It seems like I should be able to do something like this:
=================
(df <- data.frame(group1count = 660, #Or, group1prop = 0.66
                  group1samplesize = 1000,
                  group2count = 640, #Or, group2prop = 0.64
                  group2samplesize = 1000))
=================

Am I overlooking a way to set up these sample dataframes for infer?
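
One possible way to avoid typing out the rep() calls by hand, sketched under the assumption that specify() needs one row per respondent: keep the summary counts in a small table and expand it with base R. The names `summary_counts` and `expand_counts` are made up for illustration and are not part of infer.

```r
# Hypothetical summary table: one row per survey/answer combination
summary_counts <- data.frame(
  survey = c("1980", "1980", "2010", "2010"),
  DP     = c("Y", "N", "Y", "N"),
  n      = c(660, 340, 640, 360)
)

# Hypothetical helper: repeat each row n times, giving one row per respondent
expand_counts <- function(counts, n_col = "n") {
  idx <- rep(seq_len(nrow(counts)), times = counts[[n_col]])
  out <- counts[idx, setdiff(names(counts), n_col), drop = FALSE]
  rownames(out) <- NULL
  out
}

df <- expand_counts(summary_counts)
table(df$survey, df$DP)   # check that the counts survived the expansion
```

The resulting `df` has the same shape as the hand-built data frame, so it can be piped into specify() in the same way.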

Thanks for your advice and guidance.

-Kevin



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Package PooledInfRate seems promising.


library(PooledInfRate)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

year <- c(1980, 2010)
p <- c(0.66, 0.64)
n <- c(1000, 1000)
df1 <- data.frame(year, p, n)

df1 %>%
  mutate(yes = p * n, no = n - yes) %>%
  select(-p, -n) %>%
  tidyr::pivot_longer(-year, names_to = "answer", values_to = "counts") %>%
  mutate(answer = as.integer(answer == "yes")) %>%
  pooledBin(answer ~ counts | year, data = .)
#>   year           P        Lower       Upper
#> 1 1980 0.001113165 6.831431e-05 0.009052399
#> 2 2010 0.001102918 6.796757e-05 0.008755895


The CIs overlap, so these intervals give no evidence that support for the 
death penalty changed between 1980 and 2010.
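
One caveat on the output above: the estimates in column P are about 0.0011, not the sample proportions 0.66 and 0.64, so the counts may have been read as pool sizes rather than as numbers of respondents. As a cross-check, here is a minimal sketch with base R's prop.test(), which works directly from the counts. It assumes the one-sided alternative in the problem statement (p1 > p2) and turns off the continuity correction; both are choices made here, not something required by the problem.

```r
# Counts of "yes" answers and sample sizes, from the problem statement
yes <- c(660, 640)     # 1980, 2010
n   <- c(1000, 1000)

# One-sided two-sample test of H0: p1 = p2 against Ha: p1 > p2
res <- prop.test(x = yes, n = n, alternative = "greater", correct = FALSE)

res$estimate   # sample proportions for 1980 and 2010
res$p.value    # one-sided p-value

# Two-sided 95% confidence interval for the difference p1 - p2
ci <- prop.test(x = yes, n = n, correct = FALSE)$conf.int
ci
```

A confidence interval for the difference p1 - p2 that contains 0 is a more direct check than comparing two separate intervals for overlap.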

Hope this helps,

Rui Barradas



