Hi Rolf, Let's say we have a course called Corgiology 101, with a single moderated exam. And let's say the moderators transform initial exam scores, such that there are fixed percentages of pass rates and A grades.
Rather than count the number of passes, we can count the number of "jumps". That is, the number of people that pass the corgiology exam after moderation, that would not have passed without moderation. I've created a function to test for underdispersion, based on your expression. (I hope I got it right). Then I've gone on to create simulations, using both constant and nonconstant class sizes. The nonconstant simulations apply an (approx) discrete scaling transformation, referred to previously. We can see from the examples that there are a lot of these jumps. And more importantly, they appear to be underdispersed. ----code---- PASS.SCORE <- 0.5 A.SCORE <- 0.8 #target parameters PASS.RATE <- 0.8 A.RATE <- 0.2 #unmoderated parameters UNMOD.MEAN.SCORE <- 0.65 UNMOD.SD.SCORE <- 0.075 NCLASSES <- 2000 NSTUD.CONST <- 200 NSTUD.NONCONST.LIMS <- c (50, 800) sim.njump <- function (nstud, mean0=UNMOD.MEAN.SCORE, sd0=UNMOD.SD.SCORE, pass.score=PASS.SCORE, a.score=A.SCORE, pass.rate=PASS.RATE, a.rate=A.RATE) { x <- rnorm (nstud, mean0, sd0) q <- quantile (x, 1 - c (pass.rate, a.rate), names=FALSE) dq <- diff (q) q <- (a.score - pass.score) / dq * q y <- pass.score - q [1] + (a.score - pass.score) / dq * x sum (x < a.score & y >= a.score) } sim.nclasses <- function (nclasses, nstud, nstud.std) { nstud <- rep_len (nstud, nclasses) njump <- integer (nclasses) for (i in 1:nclasses) njump [i] <- sim.njump (nstud [i]) if (missing (nstud.std) ) njump else round (nstud.std / nstud * njump) } is.under <- function (x, n) var (x) < mean (x) * (1 - mean (x) / n) njump.hom <- sim.nclasses (NCLASSES, NSTUD.CONST) nstud <- round (runif (NCLASSES, NSTUD.NONCONST.LIMS [1], NSTUD.NONCONST.LIMS [2]) ) njump.het <- sim.nclasses (NCLASSES, nstud, NSTUD.CONST) under.hom <- is.under (njump.hom, NSTUD.CONST) under.het <- is.under (njump.het, NSTUD.CONST) main.hom <- paste0 ("const class size (under=", under.hom, ")") main.het <- paste0 ("diff class sizes (under=", under.het, ")") p0 <- par (mfrow = c (2, 1) ) hist (njump.hom, main=main.hom) hist (njump.het, main=main.het) par (p0) ----code---- best, B. On Thu, Mar 25, 2021 at 2:33 PM Rolf Turner <r.tur...@auckland.ac.nz> wrote: > > > I would like a real-life example of a data set which one might think to > model by a binomial distribution, but which is substantially > underdispersed. I.e. a sample X = {X_1, X_2, ..., X_N} where each X_i > is an integer between 0 and n (n known a priori) such that var(X) << > mean(X)*(1 - mean(X)/n). > > Does anyone know of any such examples? Do any exist? I've done > a perfunctory web search, and had a look at "A Handbook of Small > Data Sets" by Hand, Daly, Lunn, et al., and drawn a blank. > > I've seen on the web some references to underdispersed "pseudo-Poisson" > data, but not to underdispersed "pseudo-binomial" data. And of course > there's lots of *over* dispersed stuff. But that's not what I want. > > I can *simulate* data sets of the sor that I am looking for (so far the > only ideas I've had for doing this are pretty simplistic and > artificial) but I'd like to get my hands on a *real* example, if > possible. > > Grateful for any pointers/suggestions. > > cheers, > > Rolf Turner > > -- > Honorary Research Fellow > Department of Statistics > University of Auckland > Phone: +64-9-373-7599 ext. 88276 > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.