I apologize if my response was unclear. My example mentioned only x1 and x2, and I did not make clear that I understood the original request also requires P(x6 = 1 | x1..5 = 1) = 0. I also see that if he meant to sample with replacement from the set of such sequences, then sample(rep(1:20, 5), 20) is fine for generating them. My interpretation was that each sequence should itself be built by sampling with replacement until a value's frequency reaches 5, after which that value is no longer available. Hence my suggestion of:

bigsamp <- sample(1:20, 100, replace = TRUE)
idx <- sort(unlist(sapply(1:20, function(x) which(bigsamp == x)[1:5])))[1:20]
samp <- bigsamp[idx]

I apologize for my lack of clarity, though after reading the original post I'm not sure which solution the OP was looking for.

Cheers,
Jon

--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly

Bert Gunter <gunter.ber...@gene.com> wrote on 03/02/2011 02:42:40 PM:

> Re: [R] bootstrap resampling - simplified
>
> Bert Gunter
> to: Jonathan P Daily
> 03/02/2011 02:42 PM
> Cc: "Vokey, John", r-help, r-help-bounces
>
> Folks:
>
> On Wed, Mar 2, 2011 at 10:32 AM, Jonathan P Daily <jda...@usgs.gov> wrote:
> > I will point out again that sampling a five-fold replicate of 1:20 is not
> > the same as resampling with replacement,
>
> -- Correct. In sampling with replacement from 1:20 there is positive
> probability of getting all 1's or all 2's, etc. The poster
> specifically said that he wanted 0 probability of such results. So,
> obviously, the poster does NOT want to "sample with replacement from
> 1:20." What he does want (I think) is a re-sample of size n from the
> set of all **vectors** of length 20, each element of which is an
> integer from 1 to 20, and for which no individual values occur more
> than 5 times in the vector. Of course I'm just
> interpreting/paraphrasing the original post (if I got it right), but I
> think doing so makes the nature of the task clearer: one needs to find
> some way to sample with replacement from the space of all such
> **sequences**.
>
> I think it is now clear that one may do so by rejection sampling: i.e.
> sample with replacement from 1:20 and throw away any sequences that
> fail the at most 5 criterion.
> The sequences that remain are samples of size 1 from the population of
> sequences that satisfy the poster's criteria (in theory, anyway; this
> might tax a pseudo RNG in practice). A collection of n such sequences
> is a bootstrap sample from this population. I **think** that's what
> the poster wants -- and what others have already provided. However,
> maybe this clarifies why it works.
>
> If I have made any error in this, **Please** post a message pointing
> out my error. I sometimes get confused about this stuff, too.
>
> Cheers,
> Bert
>
> > although I made an error in
> > reporting probabilities - the P(x2 = 1 | x1 = 1) = 4/99 and not 4/100.
> > When sampling with replacement, P(x2 = 1 | x1 = 1) = P(x2 = 1 | x1 != 1) =
> > 1/20.
> >
> > r-help-boun...@r-project.org wrote on 03/02/2011 01:05:01 PM:
> >
> >> Re: [R] bootstrap resampling - simplified
> >>
> >> Vokey, John
> >> to: r-help
> >> 03/02/2011 01:07 PM
> >> Sent by: r-help-boun...@r-project.org
> >>
> >> On 2011-03-02, at 4:00 AM, r-help-requ...@r-project.org wrote:
> >>
> >> > Hello there,
> >> >
> >> > I have a problem concerning bootstrapping in R - especially
> >> > focusing on the resampling part of it. I try to sum it up in a
> >> > simplified way so that I would not confuse anybody.
> >> >
> >> > I have a small database consisting of 20 observations (basically
> >> > numbers from 1 to 20, I mean: 1, 2, 3, 4, 5, ... 18, 19, 20).
> >> >
> >> > I would like to resample this database many times for the
> >> > bootstrap process with the following conditions.
> >> > Firstly, every resampled database should also include 20
> >> > observations. Secondly, when selecting a number from the
> >> > above-mentioned 20 numbers, you can do this selection with
> >> > replacement. The difficult part comes now: one number can be
> >> > selected only maximum 5 times. In order to make this clear I show
> >> > you a couple of examples. So the resampled databases might be like
> >> > the following ones:
> >> >
> >> > (1st database) 1,2,1,2,1,2,1,2,1,2,3,3,3,3,3,4,4,4,4,4
> >> > 4 different numbers are chosen (1, 2, 3, 4), each selected - for
> >> > the maximum possible - 5 times.
> >> >
> >> > (2nd database) 1,8,8,6,8,8,8,2,3,4,5,6,6,6,6,7,19,1,1,1
> >> > Two numbers - 8 and 6 - selected 5 times (the maximum possible
> >> > times), number 1 selected 4 times, the others selected less than 4
> >> > times.
> >> >
> >> > (3rd database) 1,1,2,2,3,3,4,4,9,9,9,10,10,13,10,9,3,9,2,1
> >> > Number 9 chosen for the maximum possible 5 times, number 10, 3, 2,
> >> > 1 chosen for 3 times, number 4 selected twice and number 13
> >> > selected only once.
> >> >
> >> > ...
> >> >
> >> > Anybody knows how to implement my "tricky" condition into one of
> >> > the R functions - that one number can be selected only 5 times at
> >> > most? Are 'boot' and 'bootstrap' packages capable of managing this?
> >> > I guess they are, I just couldn't figure it out yet...
> >> >
> >> > Thanks very much! Best regards,
> >> > Laszlo Bodnar
> >>
> >> Laszlo,
> >> Create a vector consisting of 5 of each number. Then, for each
> >> sample, scramble the order of the items in the vector, and select
> >> the first 20.
> >>
> >> --
> >> Please avoid sending me Word or PowerPoint attachments.
> >> See <http://www.gnu.org/philosophy/no-word-attachments.html>
> >>
> >> -Dr. John R. Vokey
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 467-7374
> http://devo.gene.com/groups/devo/depts/ncb/home.shtml
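
For anyone following along, here is a rough sketch of how the two readings discussed in this thread might be coded in R. It is not taken from any of the posts: the helper names cap_ok(), rejection_sample() and shuffle_sample() are hypothetical, and the cap of 5 and the pool 1:20 are simply the values from the original question. The first sampler is the rejection approach Bert describes (draw with replacement from 1:20, discard any draw that breaks the cap), which should be uniform over all admissible sequences. The second is John's shuffle of five copies of 1:20, which also respects the cap but does not weight the admissible sequences equally.

```r
# Sketch only: all function names below are hypothetical helpers,
# not part of base R or of the 'boot' / 'bootstrap' packages.

# TRUE if no value occurs more than `cap` times in x.
cap_ok <- function(x, cap = 5) all(table(x) <= cap)

# Rejection sampling: draw with replacement from 1:20 and keep the
# first draw that satisfies the cap.
rejection_sample <- function(n = 20, cap = 5) {
  repeat {
    x <- sample(1:20, n, replace = TRUE)
    if (cap_ok(x, cap)) return(x)
  }
}

# Shuffle approach: the first n items of a random permutation of
# `cap` copies of 1:20 (sampling without replacement from that pool).
shuffle_sample <- function(n = 20, cap = 5) sample(rep(1:20, cap), n)

set.seed(1)
# 1000 bootstrap resamples, one per column (a 20 x 1000 matrix).
boot <- replicate(1000, rejection_sample())
# Every column should respect the at-most-5 condition.
all(apply(boot, 2, cap_ok))
```

With 20 draws from 20 values the cap is rarely violated, so the repeat loop should seldom need a second pass. Swapping shuffle_sample() into the replicate() call gives the other interpretation for comparison.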