Tim Peters <t...@python.org> added the comment:
I've been involved - usually very lightly, but sometimes deeply - with PRNGs for over 40 years. I've never seen anyone raise the kinds of concerns you're raising here, and, frankly, they don't make sense to me.

But on the chance that I've missed recent fundamental developments, I did some searching. The state of the art for uniform selection from a range appears to be what the Go language recently adopted, shown as Algorithm 5 in this paper (which also surveys what other languages do):

https://arxiv.org/pdf/1805.10941.pdf

Nobody gives a rip about "correlations" when reusing an internal state. What they're aiming at is a combination of "no bias" and "peak speed".

Python's job is harder, because we couldn't care less what the native machine word size is. For example, randrange(10**500) works fine. Approaches based on chopping back multiplication with a "random" C double can't get more than 53 bits (on most machines), the limit of the IEEE-754 double format. Python dropped that approach not only because it was slightly biased, but also because it didn't scale to unbounded ranges.

The Go approach works like this Python code, where NBITS would be hard-coded in C to 32 or 64, depending on the machine word size. Its advantage is that it _usually_ needs no division (implied by "%") at all, just masking and shifting. At worst, it _may_ need one for-real division. But it's built to work with native machine integer precision, and requires a full double-width integer multiply:

from random import randrange

NBITS = 64  # need NBITS x NBITS full-precision multiply
POWER = 1 << NBITS
MASK = POWER - 1

def rr(s):  # uniform random int in range(s)
    assert 0 < s <= POWER
    x = randrange(POWER)  # i.e., a random bitstring with NBITS bits
    m = x * s             # full double-width product
    r = m & MASK
    if r < s:
        t = (POWER - s) % s  # the expensive line
        while r < t:
            x = randrange(POWER)
            m = x * s
            r = m & MASK
    return m >> NBITS

In a nutshell, in each [i * POWER, (i+1) * POWER) clopen interval, it rejects the first POWER % s values, leaving exactly floor(POWER / s) multiples of s in the interval.

Of course that suffers "correlations" too if you reuse the seed - although not of the exact microscopic kind you're talking about.

You haven't responded to questions about why that specific test is thought to be especially important, or to why you believe you can't do an obvious thing instead (use different seeds, or don't reset the seed at all).

As to who's making raw assertions here, the situation isn't symmetric ;-) If you want a change, you have to _make a case_ for it. And a good one. Every core Python developer who has knowledge of this code has now told you they personally don't see any case for it. So, sorry, but the status quo is always favored in the absence of compelling reason to change. There were compelling reasons to switch to the current scheme.

As you said yourself:

"""
And I understand the puzzlement about my test file setting the same random seed and then complaining about correlations.
"""

That was insightful. The difference is we never got over that puzzlement ;-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39867>
_______________________________________
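For contrast with the multiply-and-mask scheme above, a minimal sketch of the two alternatives mentioned: the old float-scaling idiom, which is capped at 53 bits and slightly biased, and a getrandbits()-based rejection loop in the spirit of what randrange() uses today. The function names are illustrative only, not CPython's actual source:

from random import random, getrandbits

def old_float_style(n):
    # Scale a 53-bit float: at most 2**53 distinct outcomes, and slightly
    # biased whenever n doesn't divide 2**53 evenly.
    return int(random() * n)

def randbelow_sketch(n):
    # Rejection sampling on raw bits: draw n.bit_length() random bits and
    # retry until the draw lands in range(n).  No bias, and it works just
    # as well for n = 10**500 as for n = 10.
    k = n.bit_length()
    r = getrandbits(k)
    while r >= n:
        r = getrandbits(k)
    return r

The rejection loop discards fewer than half of its draws on average (the worst case is n just above a power of two), so the expected number of getrandbits() calls is under 2.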
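The "rejects the first POWER % s values" claim is easy to check exhaustively with a small NBITS, since every possible draw can be enumerated. This harness is a sketch for illustration, not part of the Go or CPython code:

from collections import Counter

NBITS = 8            # small enough to enumerate every possible draw
POWER = 1 << NBITS
MASK = POWER - 1

def exhaustive_counts(s):
    # Push every possible NBITS-bit x through the multiply-and-mask step,
    # keeping only the draws the rejection test accepts.
    threshold = POWER % s        # same value as (POWER - s) % s
    counts = Counter()
    for x in range(POWER):
        m = x * s
        if (m & MASK) >= threshold:   # accepted
            counts[m >> NBITS] += 1
    return counts

for s in (3, 6, 10, 100):
    counts = exhaustive_counts(s)
    # Exactly POWER % s draws are rejected, and every result in range(s)
    # is produced exactly POWER // s times - i.e., no bias.
    assert sum(counts.values()) == POWER - POWER % s
    assert set(counts) == set(range(s))
    assert set(counts.values()) == {POWER // s}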