the "something similar" is return a different result in two situations where one might expect the same result, ie when a probability vector with equal probabilities is supplied versus the default of equal probabilities.
And, assuming that by "concerns me" you mean "worries me", I have no clue why you think it does! It is a curiosity. albyn On Thu, May 23, 2013 at 04:38:18PM +0000, Nordlund, Dan (DSHS/RDA) wrote: > > -----Original Message----- > > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- > > project.org] On Behalf Of Albyn Jones > > Sent: Thursday, May 23, 2013 8:30 AM > > To: r-help@r-project.org > > Subject: Re: [R] sample(c(0, 1)...) vs. rbinom > > > > After a bit of playing around, I discovered that > > sample() does something similar in other situations: > > > > > set.seed(105021) > > > sample(1:5,1,prob=c(1,1,1,1,1)) > > [1] 3 > > > set.seed(105021) > > > sample(1:5,1) > > [1] 2 > > > > > > > set.seed(105021) > > > sample(1:5,5,prob=c(1,1,1,1,1)) > > [1] 3 4 2 1 5 > > > set.seed(105021) > > > sample(1:5,5) > > [1] 2 5 1 4 3 > > > > albyn > > > What is the "something similar" you are referring to? And I guess I still > don't understand what it is that concerns you about the sample function. > > > Dan > > Daniel J. Nordlund > Washington State Department of Social and Health Services > Planning, Performance, and Accountability > Research and Data Analysis Division > Olympia, WA 98504-5204 > > > > > > > > > On 2013-05-22 22:24, peter dalgaard wrote: > > > On May 23, 2013, at 07:01 , Jeff Newmiller wrote: > > > > > >> You seem to be building an elaborate structure for testing the > > >> reproducibility of the random number generator. I suspect that > > rbinom > > >> is calling the random number generator a different number of times > > >> when you pass prob=0.5 than otherwise. > > > > > > Nope. It's switching 0 and 1: > > > > > >> set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp)); > > >> set.seed(1); rbinom(10,1,pp) > > > [1] 1 1 0 0 1 0 0 0 0 1 > > > [1] 0 0 1 1 0 1 1 1 1 0 > > > > > > which is curious, but of course has no implication for the > > > distributional properties. Curiouser, if you drop the prob= in > > > sample. > > > > > >> set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1); > > >> rbinom(10,1,pp) > > > [1] 0 0 1 1 0 1 1 1 1 0 > > > [1] 0 0 1 1 0 1 1 1 1 0 > > > > > > However, it was never a design goal that two different random > > > functions (or even two code paths within the same function) should > > > give exactly the same values, even if they simulate the same > > > distribution, so this is nothing more than a curiosity. > > > > > > > > >>> > > >>> Appendix A: some R code that exhibits the problem > > >>> ================================================= > > >>> > > >>> ppp <- seq(0, 1, by = 0.01) > > >>> > > >>> result <- do.call(rbind, lapply(ppp, function(p) { > > >>> set.seed(1) > > >>> sampleRes <- sample(c(0, 1), size = 1, replace = TRUE, > > >>> prob=c(1-p, p)) > > >>> > > >>> set.seed(1) > > >>> rbinomRes <- rbinom(1, size = 1, prob = p) > > >>> > > >>> data.frame(prob = p, equivalent = all(sampleRes == rbinomRes)) > > >>> > > >>> })) > > >>> > > >>> result > > >>> > > >>> > > >>> Appendix B: the output from the R code > > >>> ====================================== > > >>> > > >>> prob equivalent > > >>> 1 0.00 TRUE > > >>> 2 0.01 TRUE > > >>> 3 0.02 TRUE > > >>> 4 0.03 TRUE > > >>> 5 0.04 TRUE > > >>> 6 0.05 TRUE > > >>> 7 0.06 TRUE > > >>> 8 0.07 TRUE > > >>> 9 0.08 TRUE > > >>> 10 0.09 TRUE > > >>> 11 0.10 TRUE > > >>> 12 0.11 TRUE > > >>> 13 0.12 TRUE > > >>> 14 0.13 TRUE > > >>> 15 0.14 TRUE > > >>> 16 0.15 TRUE > > >>> 17 0.16 TRUE > > >>> 18 0.17 TRUE > > >>> 19 0.18 TRUE > > >>> 20 0.19 TRUE > > >>> 21 0.20 TRUE > > >>> 22 0.21 TRUE > > >>> 23 0.22 TRUE > > >>> 24 0.23 TRUE > > >>> 25 0.24 TRUE > > >>> 26 0.25 TRUE > > >>> 27 0.26 TRUE > > >>> 28 0.27 TRUE > > >>> 29 0.28 TRUE > > >>> 30 0.29 TRUE > > >>> 31 0.30 TRUE > > >>> 32 0.31 TRUE > > >>> 33 0.32 TRUE > > >>> 34 0.33 TRUE > > >>> 35 0.34 TRUE > > >>> 36 0.35 TRUE > > >>> 37 0.36 TRUE > > >>> 38 0.37 TRUE > > >>> 39 0.38 TRUE > > >>> 40 0.39 TRUE > > >>> 41 0.40 TRUE > > >>> 42 0.41 TRUE > > >>> 43 0.42 TRUE > > >>> 44 0.43 TRUE > > >>> 45 0.44 TRUE > > >>> 46 0.45 TRUE > > >>> 47 0.46 TRUE > > >>> 48 0.47 TRUE > > >>> 49 0.48 TRUE > > >>> 50 0.49 TRUE > > >>> 51 0.50 FALSE > > >>> 52 0.51 TRUE > > >>> 53 0.52 TRUE > > >>> 54 0.53 TRUE > > >>> 55 0.54 TRUE > > >>> 56 0.55 TRUE > > >>> 57 0.56 TRUE > > >>> 58 0.57 TRUE > > >>> 59 0.58 TRUE > > >>> 60 0.59 TRUE > > >>> 61 0.60 TRUE > > >>> 62 0.61 TRUE > > >>> 63 0.62 TRUE > > >>> 64 0.63 TRUE > > >>> 65 0.64 TRUE > > >>> 66 0.65 TRUE > > >>> 67 0.66 TRUE > > >>> 68 0.67 TRUE > > >>> 69 0.68 TRUE > > >>> 70 0.69 TRUE > > >>> 71 0.70 TRUE > > >>> 72 0.71 TRUE > > >>> 73 0.72 TRUE > > >>> 74 0.73 TRUE > > >>> 75 0.74 TRUE > > >>> 76 0.75 TRUE > > >>> 77 0.76 TRUE > > >>> 78 0.77 TRUE > > >>> 79 0.78 TRUE > > >>> 80 0.79 TRUE > > >>> 81 0.80 TRUE > > >>> 82 0.81 TRUE > > >>> 83 0.82 TRUE > > >>> 84 0.83 TRUE > > >>> 85 0.84 TRUE > > >>> 86 0.85 TRUE > > >>> 87 0.86 TRUE > > >>> 88 0.87 TRUE > > >>> 89 0.88 TRUE > > >>> 90 0.89 TRUE > > >>> 91 0.90 TRUE > > >>> 92 0.91 TRUE > > >>> 93 0.92 TRUE > > >>> 94 0.93 TRUE > > >>> 95 0.94 TRUE > > >>> 96 0.95 TRUE > > >>> 97 0.96 TRUE > > >>> 98 0.97 TRUE > > >>> 99 0.98 TRUE > > >>> 100 0.99 TRUE > > >>> 101 1.00 TRUE > > >>> > > >>> Appendix C: Session information > > >>> =============================== > > >>> > > >>>> sessionInfo() > > >>> R version 3.0.0 (2013-04-03) > > >>> Platform: x86_64-redhat-linux-gnu (64-bit) > > >>> > > >>> locale: > > >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > > >>> [7] LC_PAPER=C LC_NAME=C > > >>> [9] LC_ADDRESS=C LC_TELEPHONE=C > > >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > >>> > > >>> attached base packages: > > >>> [1] stats graphics grDevices utils datasets methods > > >>> base > > >>> > > >>>> > > >>> > > >>> ______________________________________________ > > >>> R-help@r-project.org mailing list > > >>> https://stat.ethz.ch/mailman/listinfo/r-help > > >>> PLEASE do read the posting guide > > >>> http://www.R-project.org/posting-guide.html > > >>> and provide commented, minimal, self-contained, reproducible code. > > >> > > >> ______________________________________________ > > >> R-help@r-project.org mailing list > > >> https://stat.ethz.ch/mailman/listinfo/r-help > > >> PLEASE do read the posting guide > > >> http://www.R-project.org/posting-guide.html > > >> and provide commented, minimal, self-contained, reproducible code. > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > > guide.html > > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Albyn Jones Reed College jo...@reed.edu ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.