In fact, that is what I saw in your RNG help in R 2.7.1:
'"Wichmann-Hill"' The seed, '.Random.seed[-1] == r[1:3]' is an
integer vector of length 3, where each 'r[i]' is in '1:(p[i]
- 1)', where 'p' is the length 3 vector of primes, 'p =
(30269, 30307, 30323)'. The Wichmann-Hill generator has a
cycle length of 6.9536e12 (= 'prod(p-1)/4', see _Applied
Statistics_ (1984) *33*, 123 which corrects the original
article).
You have to admit that Wichmann-Hill's correction appeared here! If it's correct, then 2.8e13,
which is claimed by Duncan Murdoch in previous email, is wrong!
Shengqiao
On Tue, 19 Aug 2008, Prof Brian Ripley wrote:
Please don't lie! You falsely claimed:
In R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG
document.
The period in the help has been unchanged since at least Jan 2000 (since
Random.Rd has not had any changes to those lines since then).
Your coincidence calculations may be correct for _independent_ draws from a
discrete distribution on M values, but independence is not satisfied.
Yet again, you are trying to do things that any good text on simulation would
warn you against, and which (in a thread on R-devel) you have already been
told a good way to do.
On Sun, 17 Aug 2008, Shengqiao Li wrote:
On Sun, 17 Aug 2008, Duncan Murdoch wrote:
Shengqiao Li wrote:
Dear all,
Recently I am generating large random samples (10M) and any duplicated
numbers are not desired.
We tried several RNGs in R and found Wichmann-Hill did not produce
duplications.
The duplication problem is the interesting birthday problem. If there are
M possible numbers, randomly draw N numbers from them,
the average number of dupilcations D = N(N-1)/2/M.
For Knuth-TAOCP and Knuth-TAOCP-2002, M=2^30, since this modulus is used.
D = 46566.12 for N=10M samples.
For Marsaglia-Multicarry, Super-Duper and Mersene-Twister, M=2^32. D =
11641.53 for N = 10M samples.
My testing results (see below) agree with above analysis. But for
Wichmann-Hill, it wasn't. Wichmann-Hill's cycle is 6.9536e12 (refer to
RNG help by ?RNG and Whichmann's correction in 1984). Thus M <=
6.9536e12. D
= 7.19052 when N=1e7 and D>= 179.763 for N=5e7.
But in my tests, duplications were surprisingly not observed.
It seems that Wichmann-Hill has a much larger cycle than the one
documented!
Anybody can solve this puzzle?
As I told you, look at the code. The cycle length of Wichmann-Hill in R
appears to be 2.8e13, and that also appears to be M in your notation.
Your birthday problem calculation does not apply here.
You could probably get a good approximation to it by rounding the
Wichmann-Hill output (e.g. look at round(2^30 * runif()), or maybe more
severe rounding).
M is larger than what was previously documented, but Brian Ripley has
corrected the documentation.
Wichmann claimed 2.78X10^13 in his 1982 original paper. He made correction
in 1984, "The period of the generator was incorrectly given as 2.78 X
10^13. In fact its period is only a quarter of that value: 6.95X10^12." In
R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG
document. That is the currently documented cycle is 6.95X10^12 not
2.78X10^13 nor your approxmation 2.8e13.
So, is the cycle claimed in 1982 correct or the one claimed in 1984?
Even if the larger one 2.78e13 claimed in 1982 is correct, around 45
duplications were expected for 50M samples. But I got 0. My testing method
is based on the example code in the last third line in RNG help.
Shengqiao Li
Duncan Murdoch
Regards,
Shengqiao Li
Department of Statistics
West Virgina Unversity
==============Testing===================
RNGkind(kind="Knuth-TAOCP");
RNGkind();
[1] "Knuth-TAOCP" "Inversion"
sum(duplicated(runif(1e7)));
[1] 46216
RNGkind(kind="Knuth-TAOCP-2002");
RNGkind();
[1] "Knuth-TAOCP-2002" "Inversion"
sum(duplicated(runif(1e7)));
[1] 46373
RNGkind(kind="Marsaglia-Multicarry");
RNGkind();
[1] "Marsaglia-Multicarry" "Inversion"
sum(duplicated(runif(1e7)));
[1] 11671
RNGkind(kind="Super-Duper");
RNGkind();
[1] "Super-Duper" "Inversion"
sum(duplicated(runif(1e7)));
[1] 11704
RNGkind(kind="Mersenne-Twister");
RNGkind();
[1] "Mersenne-Twister" "Inversion"
sum(duplicated(runif(1e7)));
[1] 11508
RNGkind(kind="Wichmann-Hill");
RNGkind();
[1] "Wichmann-Hill" "Inversion"
sum(duplicated(runif(1e7)));
[1] 0
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199157 5.4 646480 17.3 10149018 271.1
Vcells 4497151 34.4 108714629 829.5 169585390 1293.9
sum(duplicated(runif(5e7)))
[1] 0
sessionInfo()
R version 2.7.1 (2008-06-23)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.7.1
==============End of Testing===================
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.