On 17/09/2014 3:46 PM, Giovanni Petris wrote:
Hi Duncan,
You are right. The idea of the derivation is to 'throw' k placeholders
("*" in the example below) into the list of individuals in the population.
For example, if the population is letters[1:6] and the sample size is 4, the following
code generates a 'sample' uniformly.
> n <- 6; k <- 4
> set.seed(2)
> xxx <- rep("*", n + k)
> ind <- sort(sample(2 : (n+k), k))
> xxx[setdiff(1 : (n+k), ind)] <- letters[seq.int(n)]
> noquote(xxx)
[1] a b * c d * * e f *
This represents the sample (b, d, d, f): each "*" selects the letter immediately
before it. I am still missing the "all" I need to do that you mention, that is, how I
can transform the vector xxx into something more readily usable, like c(b, d, d, f),
or even a summary of counts. I guess I am looking for a bit of R trickery here...
I think this works, but you'd better check!
Sample the positions of the n-1 dividers that separate the n individuals:
ind <- sort( sample(n + k - 1, n - 1) ) # sort() is needed: diff() below assumes increasing positions
Add sentinel positions at the start and end:
ind <- c(0, ind, n+k)
Take the diffs, and subtract one:
diff(ind) - 1
I think this gives the counts you want.
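Putting the steps together, here is a minimal sketch, not checked beyond the small case in this thread. The names sample_counts, pop and is_star are made up for illustration. It returns n non-negative counts that sum to k, expands them into the sample itself, and also decodes the "*" vector from the quoted example by counting how many letters precede each "*".

```r
# Hypothetical helper (not from the thread): draw the counts for one
# multiset of size k from a population of size n.
sample_counts <- function(n, k) {
  # Positions of the n - 1 dividers among n + k - 1 slots.
  # sort() matters: diff() needs increasing positions.
  ind <- sort(sample.int(n + k - 1, n - 1))
  # Sentinels at both ends; each gap length minus one is a count.
  diff(c(0, ind, n + k)) - 1
}

set.seed(2)
n <- 6; k <- 4
pop <- letters[seq_len(n)]

counts <- sample_counts(n, k)
names(counts) <- pop
counts                      # n counts, summing to k

# Expand the counts into the sample itself.
rep(pop, times = counts)

# Decode the placeholder vector from the quoted example:
# each "*" picks the letter that most recently preceded it.
xxx <- c("a", "b", "*", "c", "d", "*", "*", "e", "f", "*")
is_star <- xxx == "*"
pop[cumsum(!is_star)[is_star]]   # "b" "d" "d" "f"
```

table() on the decoded vector would give the same kind of count summary, although it drops the letters that were never selected unless the vector is first converted to a factor with levels = pop.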
Duncan Murdoch