OOPS! The result of a calculation below somehow got omitted! (325820+326140+325289+325098+325475+325916)/ (174873+175398+174196+174445+173240+174110) # [1] 1.867351
to be compared (as at the end) with the ratio 1.867471 of the expected number of "weight=2" to expected number of "weight=1". Sorry. Ted. On 23-Jul-09 20:05:09, Ted Harding wrote: > On 23-Jul-09 17:59:56, Jim Bouldin wrote: >> Dan Nordlund wrote: >> "It would be necessary to see the code for your 'brief test' >> before anyone could meaningfully comment on your results. >> But your results for a single test could have been a valid >> "random" result." >> >> I've re-created what I did below. The problem appears to be >> with the weighting process: the unweighted sample came out much >> closer to the actual than the weighted sample (>1% error) did. >> Comments? >> Jim >> >>> x >> [1] 1 2 3 4 5 6 7 8 9 10 11 12 >>> weights >> [1] 1 1 1 1 1 1 2 2 2 2 2 2 >> >>> a = mean(replicate(1000000,(sample(x, 3, prob = weights))));a # (1 >> million samples from x, of size 3, weighted by "weights"; the mean >> should >> be 7.50) >> [1] 7.406977 >>> 7.406977/7.5 >> [1] 0.987597 >> >>> b = mean(replicate(1000000,(sample(x, 3))));b # (1 million samples >>> from >> x, of size 3, not weighted this time; the mean should be 6.50) >> [1] 6.501477 >>> 6.501477/6.5 >> [1] 1.000227 >> >> Jim Bouldin, PhD > > To obtain the result you expected, you would need to explicitly > specify "replace=TRUE", since the default for "replace" is FALSE. > (though probably what you really intended was sampling without > replacement). > > Read carefully what is said about "prob" in '?sample' -- when > replace=FALSE, the probability of inclusion of an element is not > proportional to its weight in 'prob'. > > The reason is that elements with higher weights are more likely > to be chosen early on. This then knocks that higher weight out > of the contest, making it more likely that elements with smaller > weights will be sampled subsequently. Hence the mean of the sample > will be biased slightly downwards, relative to the weighted mean > of the values in x. > > table(replicate(1000000,(sample(x, 3)))) > # 1 2 3 4 5 6 > # 250235 250743 249603 250561 249828 249777 > # 7 8 9 10 11 12 > # 249780 250478 249591 249182 249625 250597 > > (so all nice equal frequencies) > > table(replicate(1000000,(sample(x, 3,prob=weights)))) > # 1 2 3 4 5 6 > # 174873 175398 174196 174445 173240 174110 > # 7 8 9 10 11 12 > # 325820 326140 325289 325098 325475 325916 > > Note that the frequencies of the values with weight=2 are a bit > less than twice the frequencies of the values with weight=1: > > (325820+326140+325289+325098+325475+325916)/ > (174873+175398+174196+174445+173240+174110) > # [1] > > > In fact this is fairly easily caluclated. The possible combinations > (in order of sampling) of the two weights, with their probabilities, > are: > 1s 2s > ------------------------------------------- > 1 1 1 P = 6/18 * 5/17 * 4/16 3 0 > 1 1 2 P = 6/18 * 5/17 * 12/16 2 1 > 1 2 1 P = 6/18 * 12/17 * 5/15 2 1 > 1 2 2 P = 6/18 * 12/17 * 10/15 1 2 > 2 1 1 P = 12/18 * 6/16 * 5/15 2 1 > 2 1 2 P = 12/18 * 6/16 * 10/15 1 2 > 2 2 1 P = 12/18 * 10/16 * 6/14 1 2 > 2 2 2 P = 12/18 * 10/16 * 8/14 0 3 > > So the expected number of "weight=1" in the sample is > > 3*(6/18 * 5/17 * 4/16) + 2*(6/18 * 5/17 * 12/16) + > 2*(6/18 * 12/17 * 5/15) + 1*(6/18 * 12/17 * 10/15) + > 2*(12/18 * 6/16 * 5/15) + 1*(12/18 * 6/16 * 10/15) + > 1*(12/18 * 10/16 * 6/14) + 0 > = 1.046218 > > Hence the expected number of "weight=2" in the sample is > > 3 - 1.046218 = 1.953782 > > and their ratio 1.953782/1.046218 = 1.867471 > > Compare this with the value 1.867351 (above) obtained by simulation! > > Ted. > > -------------------------------------------------------------------- > E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk> > Fax-to-email: +44 (0)870 094 0861 > Date: 23-Jul-09 Time: 21:05:07 > ------------------------------ XFMail ------------------------------ -------------------------------------------------------------------- E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 23-Jul-09 Time: 21:15:52 ------------------------------ XFMail ------------------------------ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.