On Thu, 07 Jun 2018 20:43:10 +0000, Peter Pearson wrote:
> On Thu, 07 Jun 2018 19:02:42 +1200, Gregory Ewing wrote:
>> Steven D'Aprano wrote:
>>> But if it were (let's say) 1 ULP greater or less than one half, would
>>> we even know?
>>
>> In practice it's probably somewhat bigger than 1 ULP. A typical PRNG
>> will first generate a 32-bit integer and then map it to a float, giving
>> a resolution coarser than the 52 bits of an IEEE double.
>>
>> But even then, the probability of getting exactly 0.5 is only 1/2^32,
>> which you're not likely to notice.
>
> But gosh, if there are only 2**32 different "random" floats, then you'd
> have about a 50% chance of finding a collision among any set of 2**16
> samples. Is that really tolerable?
Why wouldn't it be? It would be shocking if a sufficiently large sequence of
numbers contained no collisions at all: that would imply the values were very
much NON random.

If you truly were limited to 2**32 different values (we're not), then it would
be exactly right and proper to expect collisions among 2**16 samples: the
chance of at least one repeat is already about 40%, and roughly 77,000 samples
push it past 50%.

https://en.wikipedia.org/wiki/Birthday_problem

(For 2**60 values, we'd need about 1.2 billion samples.)

Greg's statement about "a typical PRNG" may or may not be true, depending on
what counts as "typical". The standard C language PRNG is notoriously awful,
with a tiny period and lots and lots of correlations between values. It's
barely random-ish.

But like many other languages, Python uses a more modern random number
generator, the Mersenne Twister, which passes a battery of statistical tests
for randomness (including Diehard) and has a very long period of 2**19937 - 1.
I understand that Python's Mersenne Twister implementation is based on 64-bit
ints.

https://en.wikipedia.org/wiki/Mersenne_Twister


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it
everywhere." -- Jon Ronson
-- 
https://mail.python.org/mailman/listinfo/python-list
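For anyone who wants to check the birthday arithmetic above, here's a quick
back-of-the-envelope sketch using the standard approximation
p ≈ 1 - exp(-n*(n-1)/(2*N)); the function name is just my own, not anything
from the stdlib:

```python
import math

def collision_probability(n, N):
    """Approximate chance of at least one repeat among n uniform draws
    from N possible values (birthday-problem approximation)."""
    return 1 - math.exp(-n * (n - 1) / (2 * N))

# With only 2**32 distinct values, 2**16 samples already make a
# collision quite likely (~40%), and ~77,000 samples pass the 50% mark.
print(collision_probability(2**16, 2**32))   # roughly 0.39
print(collision_probability(77_163, 2**32))  # roughly 0.50
```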
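And a small empirical check (assuming CPython here) that random.random() is
not limited to 2**32 outputs: every value it returns is an exact multiple of
2**-53, so the output space is 2**53 values. Scaling by 2**53 is an exact
power-of-two operation on a double, so the test below involves no rounding:

```python
import random

random.seed(12345)  # arbitrary seed, just for reproducibility
samples = [random.random() for _ in range(10_000)]

# Every value is k / 2**53 for some integer k, so scaling by 2**53
# must yield an exact integer.
assert all((x * 2**53).is_integer() for x in samples)
print("all samples are multiples of 2**-53")
```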