On Fri, Nov 5, 2010 at 3:51 AM, Tim Hesterberg <timhesterb...@gmail.com> wrote:
> Faye wrote:
>>Our survey is structured as follows: the area to be investigated is
>>divided into 6 regions; within each region, one urban community and
>>one rural community are randomly selected; then samples are randomly
>>drawn from each selected urban and rural community.
>>
>>The problem is that in each urban/rural stratum, we only have one sample.
>>In this case, how to do bootstrap?
>
> You are lucky that your sample size is 1.  If it were 2 you would
> probably have proceeded without realizing that the answers were wrong.
>
> Suppose you had two samples in each stratum.  If you proceed naturally,
> drawing bootstrap samples of size 2 from each stratum, this would
> underestimate variability by a factor of 2.
>
> In general the ordinary nonparametric bootstrap estimates of variability
> are biased downward by a factor of (n-1)/n -- exactly for the mean,
> approximately for other statistics.  In multiple-sample and stratified
> situations, the bias depends on the stratum sizes.
>
> Three remedies are:
> * draw bootstrap samples of size n-1
> * "bootknife" sampling - omit one observation (a jackknife sample), then
>   draw a bootstrap sample of size n from that
> * bootstrap from a kernel density estimate, with kernel covariance equal
>   to empirical covariance (with divisor n-1) / n.
> The latter two are described in
>   Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs.
>   Smoothing, Proceedings of the Section on Statistics and the Environment,
>   American Statistical Association, 2924-2930.
>   http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf
>
> All three are undefined for samples of size 1.  You need to go to some
> other bootstrap, e.g. a parametric bootstrap with variability estimated
> from other data.
>
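
To make the (n-1)/n factor concrete, here is a minimal Python sketch for the
variance of a stratum mean. It enumerates every possible resample, so the
numbers are exact rather than simulated. The two data values and the helper
names (bootstrap_var_of_mean, bootknife_var_of_mean) are hypothetical, chosen
only for illustration; the kernel-smoothing remedy is not covered.

```python
from itertools import product
from statistics import mean, pvariance, variance


def bootstrap_var_of_mean(data, m):
    """Exact variance of the resample mean over all len(data)**m
    equally likely with-replacement resamples of size m."""
    if m < 1:
        raise ValueError("resample size must be at least 1")
    return pvariance([mean(r) for r in product(data, repeat=m)])


def bootknife_var_of_mean(data):
    """Exact variance of the resample mean under bootknife sampling:
    drop one observation, then resample n values from the remaining n-1."""
    n = len(data)
    if n < 2:
        raise ValueError("bootknife needs at least 2 observations")
    means = []
    for i in range(n):
        rest = data[:i] + data[i + 1:]          # jackknife sample
        means.extend(mean(r) for r in product(rest, repeat=n))
    return pvariance(means)


stratum = [3.0, 7.0]                 # hypothetical stratum with n = 2
n = len(stratum)

usual = variance(stratum) / n                     # s^2 / n, divisor n-1 in s^2
ordinary = bootstrap_var_of_mean(stratum, n)      # resamples of size n
reduced = bootstrap_var_of_mean(stratum, n - 1)   # resamples of size n-1
knife = bootknife_var_of_mean(stratum)            # bootknife

print(usual, ordinary, reduced, knife)            # 4.0 2.0 4.0 4.0
```

For n = 2 the ordinary bootstrap gives half the usual variance estimate,
while resampling n-1 values and the bootknife both reproduce it. With n = 1
the reduced resample size is 0 and the jackknife step leaves nothing to
resample, so both helpers raise an error.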

And the 'survey' package supplies the first option.  (It also supplies
a bootstrap sample of size n that allows finite population
corrections, designed for situations with a large n and a high
sampling fraction, such as some business surveys.)

With a sample size of 1 per stratum there are no design-unbiased
estimators of the standard error, so as others have said you need
external data.

    -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.