On Fri, 8 Jun 2018 02:15:02 +0000 (UTC), Steven D'Aprano wrote:
> On Thu, 07 Jun 2018 20:43:10 +0000, Peter Pearson wrote:
[snip]
>>
>> But gosh, if there are only 2**32 different "random" floats, then you'd
>> have about a 50% chance of finding a collision among any set of 2**16
>> samples. Is that really tolerable?
>
> Why wouldn't it be? It would be shocking if a sufficiently large sequence
> of numbers contained no collisions at all: that would imply the values
> were very much NON random.
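
The figure quoted above is a birthday-bound estimate, and it is easy to check. What follows is a minimal sketch, not anything taken from the thread: the function names are made up for illustration, and it assumes every value in the population is equally likely. It uses the standard approximation P(collision) ~ 1 - exp(-n(n-1)/(2N)). For 2**16 samples from a population of 2**32 this gives roughly 39%, so "about 50%" is the right ballpark; an even chance takes closer to 77,000 samples. The 2**53 case is included only for comparison, on the assumption that floats carry 53 bits of precision, as the Python documentation states for random.random().

```python
import math


def collision_probability(samples: int, population: int) -> float:
    """Approximate chance of at least one repeated value.

    Birthday approximation, assuming all values are equally likely:
    P ~ 1 - exp(-n(n-1) / (2N)).
    """
    exponent = samples * (samples - 1) / (2 * population)
    return -math.expm1(-exponent)  # equals 1 - exp(-exponent)


def samples_for_probability(p: float, population: int) -> float:
    """Approximate sample size at which the collision chance reaches p."""
    return math.sqrt(2 * population * math.log(1 / (1 - p)))


# 32 bits is the population from the quoted question; 53 bits is an
# assumed alternative (the precision of a Python float's significand).
for bits in (32, 53):
    population = 2 ** bits
    p = collision_probability(2 ** 16, population)
    n_half = samples_for_probability(0.5, population)
    print(f"{bits}-bit population: P(collision in 2**16 samples) = {p:.3g}, "
          f"even odds at about {n_half:,.0f} samples")
```

Under these assumptions, widening the population from 2**32 to 2**53 moves the even-odds sample size from tens of thousands to roughly a hundred million.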
[snip]
> . . . I understand that Python's Mersenne Twister implementation
> is based on 64-bit ints.

OK, I'll relax, particularly since Michael Lamparski's experiment
strongly indicates that random floats are drawn from a population
much larger than 2**16.

You're completely correct, of course, in noting that an absence of
collisions would condemn the random-number generator just as badly
as an excess. What bothered me was my feeling that a "reasonable
observer" would expect the random-float population to be much larger
than 2**32, and the probably-collision-free sample size to be
accordingly much larger than 2**16, which is, after all, small enough
to appear in many applications.

Exactly what the "reasonable observer" would expect that population
to be, I don't know. To a mathematician, there's zero chance of
collision in any finite sample of real numbers, or even just rational
numbers; but I don't think anybody would expect that from a computer.
When I picture the diligent software engineer asking himself, "Wait,
how large can I make this sample before I'll start seeing collisions,"
I imagine his first guess is going to be the size of a float's mantissa.

What applications would have to worry about colliding floats? I
don't know. I'm coming from cryptology, where worrying about such
things becomes a reflex.

-- 
To email me, substitute nowhere->runbox, invalid->com.
-- 
https://mail.python.org/mailman/listinfo/python-list