Hello, I have a question regarding bootstrap confidence intervals. Suppose we have a data set consisting of single measurements, and that the measurements are independent but the distribution is unknown. If we want a confidence interval for the population mean, when should a bootstrap confidence interval be preferred over the elementary t interval?
I was hoping the answer would be "always", but some simple simulations suggest that this is incorrect. I simulated some data and calculated 95% elementary t intervals and 95% bootstrap BCA intervals (with the boot package). I calculated the proportion of confidence intervals lying entirely above the true mean, the proportion entirely below the true mean, and the proportion containing the true mean. I used a normal distribution and a t distribution with 3 df.

library(boot)

samplemean <- function(x, ind) mean(x[ind])

ci.norm <- function(sample.size, n.samples, mu=0, sigma=1, boot.reps) {
  t.under <- 0; t.over <- 0
  bca.under <- 0; bca.over <- 0
  for (k in 1:n.samples) {
    x <- rnorm(sample.size, mu, sigma)
    b <- boot(x, samplemean, R = boot.reps)
    bci <- boot.ci(b, type="bca")
    if (mu < mean(x) - qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
      t.under <- t.under + 1
    if (mu > mean(x) + qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
      t.over <- t.over + 1
    if (mu < bci$bca[4]) bca.under <- bca.under + 1
    if (mu > bci$bca[5]) bca.over <- bca.over + 1
  }
  return(list(t = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
              bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples))
}

ci.t <- function(sample.size, n.samples, df, boot.reps) {
  t.under <- 0; t.over <- 0
  bca.under <- 0; bca.over <- 0
  for (k in 1:n.samples) {
    x <- rt(sample.size, df)
    b <- boot(x, samplemean, R = boot.reps)
    bci <- boot.ci(b, type="bca")
    if (0 < mean(x) - qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
      t.under <- t.under + 1
    if (0 > mean(x) + qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
      t.over <- t.over + 1
    if (0 < bci$bca[4]) bca.under <- bca.under + 1
    if (0 > bci$bca[5]) bca.over <- bca.over + 1
  }
  return(list(t = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
              bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples))
}

set.seed(1)

ci.norm(sample.size = 10, n.samples = 1000, boot.reps = 1000)
$t
[1] 0.019 0.026 0.955

$bca
[1] 0.049 0.059 0.892

ci.norm(sample.size = 20, n.samples = 1000, boot.reps = 1000)
$t
[1] 0.030 0.024 0.946

$bca
[1] 0.035 0.037 0.928

ci.t(sample.size = 10, n.samples = 1000, df = 3, boot.reps = 1000)
$t
[1] 0.018 0.022 0.960

$bca
[1] 0.055 0.076 0.869

Warning message:
In norm.inter(t, adj.alpha) :
  extreme order statistics used as endpoints

ci.t(sample.size = 20, n.samples = 1000, df = 3, boot.reps = 1000)
$t
[1] 0.027 0.014 0.959

$bca
[1] 0.054 0.047 0.899

I don't understand the warning message, but for these examples, the ordinary t interval appears to be better than the bootstrap BCA interval. I would really appreciate any recommendations anyone can give on when bootstrap confidence intervals should be used.

Thanks,
Mark

--
Mark Seeto
National Acoustic Laboratories, Australian Hearing

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.