In fact, that is what I saw in your RNG help in R 2.7.1:

 '"Wichmann-Hill"' The seed, '.Random.seed[-1] == r[1:3]' is an
          integer vector of length 3, where each 'r[i]' is in '1:(p[i]
          - 1)', where 'p' is the length 3 vector of primes, 'p =
          (30269, 30307, 30323)'. The Wichmann-Hill generator has a
          cycle length of 6.9536e12 (= 'prod(p-1)/4', see _Applied
          Statistics_ (1984) *33*, 123 which corrects the original
          article).

You have to admit that Wichmann-Hill's correction appeared here! If it's correct, then 2.8e13, which is claimed by Duncan Murdoch in previous email, is wrong!


Shengqiao

On Tue, 19 Aug 2008, Prof Brian Ripley wrote:

Please don't lie!  You falsely claimed:

In R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG document.

The period in the help has been unchanged since at least Jan 2000 (since Random.Rd has not had any changes to those lines since then).

Your coincidence calculations may be correct for _independent_ draws from a discrete distribution on M values, but independence is not satisfied. Yet again, you are trying to do things that any good text on simulation would warn you against, and which (in a thread on R-devel) you have already been told a good way to do.

On Sun, 17 Aug 2008, Shengqiao Li wrote:


On Sun, 17 Aug 2008, Duncan Murdoch wrote:

Shengqiao Li wrote:
Dear all,

Recently I am generating large random samples (10M) and any duplicated numbers are not desired. We tried several RNGs in R and found Wichmann-Hill did not produce duplications.

The duplication problem is the interesting birthday problem. If there are M possible numbers, randomly draw N numbers from them,
the average number of dupilcations D = N(N-1)/2/M.

For Knuth-TAOCP and Knuth-TAOCP-2002, M=2^30, since this modulus is used. D = 46566.12 for N=10M samples.

For Marsaglia-Multicarry, Super-Duper and Mersene-Twister, M=2^32. D = 11641.53 for N = 10M samples.

My testing results (see below) agree with above analysis. But for Wichmann-Hill, it wasn't. Wichmann-Hill's cycle is 6.9536e12 (refer to RNG help by ?RNG and Whichmann's correction in 1984). Thus M <= 6.9536e12. D
= 7.19052 when N=1e7 and D>= 179.763 for N=5e7.
But in my tests, duplications were surprisingly not observed.

It seems that Wichmann-Hill has a much larger cycle than the one documented!

Anybody can solve this puzzle?


As I told you, look at the code. The cycle length of Wichmann-Hill in R appears to be 2.8e13, and that also appears to be M in your notation. Your birthday problem calculation does not apply here. You could probably get a good approximation to it by rounding the Wichmann-Hill output (e.g. look at round(2^30 * runif()), or maybe more severe rounding).

M is larger than what was previously documented, but Brian Ripley has corrected the documentation.

Wichmann claimed 2.78X10^13 in his 1982 original paper. He made correction in 1984, "The period of the generator was incorrectly given as 2.78 X 10^13. In fact its period is only a quarter of that value: 6.95X10^12." In R version 2.7.1, Brian Ripley adopted Wichmann's 1984 correction in RNG document. That is the currently documented cycle is 6.95X10^12 not 2.78X10^13 nor your approxmation 2.8e13.

So, is the cycle claimed in 1982 correct or the one claimed in 1984?

Even if the larger one 2.78e13 claimed in 1982 is correct, around 45 duplications were expected for 50M samples. But I got 0. My testing method is based on the example code in the last third line in RNG help.

Shengqiao Li


Duncan Murdoch
Regards,

Shengqiao Li

Department of Statistics
West Virgina Unversity

==============Testing===================


RNGkind(kind="Knuth-TAOCP");
RNGkind();

[1] "Knuth-TAOCP" "Inversion"

sum(duplicated(runif(1e7)));

[1] 46216

RNGkind(kind="Knuth-TAOCP-2002");
RNGkind();

[1] "Knuth-TAOCP-2002" "Inversion"

sum(duplicated(runif(1e7)));

[1] 46373


RNGkind(kind="Marsaglia-Multicarry");
RNGkind();

[1] "Marsaglia-Multicarry" "Inversion"

sum(duplicated(runif(1e7)));

[1] 11671

RNGkind(kind="Super-Duper");
RNGkind();

[1] "Super-Duper" "Inversion"

sum(duplicated(runif(1e7)));

[1] 11704

RNGkind(kind="Mersenne-Twister");
RNGkind();

[1] "Mersenne-Twister" "Inversion"

sum(duplicated(runif(1e7)));

[1] 11508

RNGkind(kind="Wichmann-Hill");
RNGkind();

[1] "Wichmann-Hill" "Inversion"

sum(duplicated(runif(1e7)));

[1] 0


gc()

           used (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells  199157  5.4     646480  17.3  10149018  271.1
Vcells 4497151 34.4  108714629 829.5 169585390 1293.9


sum(duplicated(runif(5e7)))

[1] 0



sessionInfo()

R version 2.7.1 (2008-06-23)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.7.1

==============End of Testing===================

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to