On Fri, 5 Feb 2016 09:43:28 -0700, Phil Steitz wrote:
On 2/5/16 8:29 AM, Gilles wrote:
Hi.

What can I expect from drawing N "int" values from a (supposedly)
uniformly random number generator?
In fact, I'm wondering about a valid test condition that all the
samples are different.
Alternatively, how many times one should have to draw a set of N
samples so that at least one set contains samples that are all
different?

If there are p values (i.e, the sampling is from  S = {0, ...,
p-1}), then the probability that n values in a sequence are all
distinct is p* = (p!/(p - n)!) / p^n.

Hmm...  Since I would like to test "nextInt()"; it looks like
I'll have some trouble with p == 2^32 ...
Not to speak of "nextLong()".

A way out with "Gamma" (?), perhaps?

[Context: a feature (syntactic sugar actually) of the new code is
to provide a "Factory" for creating default seeds (for users who
have no clue, like me).  The test is supposed to show that it is
*highly* unlikely that calling the factory method N times will
result in obtaining a duplicate seed.]

Assuming independence, which the null hypothesis of homogeneity
supports, you can view successive n-sequence draws as Bernoulli
trials, so the probability you are asking for is the expected value
of a geometric distribution with probability of success = p*.  You
can use the quantiles of the Geometric distribution to select a
critical value to use in a test based on observing this.  I am not
sure how powerful or stable such a test would be, though.

See [1] for some standard PRNG tests.  There are a lot more.  NIST
has a test suite for crypto-grade PRNGs (which we do not claim to
have) [2]

Testing PRNGs is tricky business. I personally put a lot of stock in
the reference tests, since they validate (unless I am wrong about
their source) that we have a faithful implementation of the standard
algorithm, which itself has been validated by experts in PRNG
development and evaluation.

There is one (1) "reference" test for each implementation. That test
shows that for one (1) given seed, the CM implementation reproduces
the same "int" values as the reference (assuming that the table was
produced by another code, of course).  I totally agree that this is
the most important test, and perhaps the only useful one: as I
pointed out elsewhere, the many other tests seem to me to only
validate the conversion from "int" to the various other primitive
types (double, byte, etc.) and are IIUC utterly redundant among the
various RNG implementations.  It doesn't hurt to have them unless
it makes people overly confident that the RNGs are thoroughly tested
simply because of the existence of hundreds of unit tests.

In the code in preparation, I have seriously lowered that number
(although much redundancy is still there, and is still necessary
to minimally trust implementations for which we wouldn't have a
reference sequence).

Thanks, Phil, for this answer, which I much prefer to those you
gave to my other post, as you'll figure out from my upcoming reply...
:-}

Gilles


Phil

[1] http://cacr.uwaterloo.ca/hac/about/chap5.pdf
[2]

http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html



Thanks,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to