Mark Dickinson <dicki...@gmail.com> added the comment:

This is similar to #9025 (for randrange), and in principle the random.choices 
case without explicit weights could be fixed similarly. But I'm not sure it's 
worth it, for a couple of reasons:

- The problem only really affects "virtual" population sequences like 
``range``, where the actual population isn't realised in memory. For 
populations in list form, in practice the population size will be limited to a 
few billion, and I think you'd be hard pushed to demonstrate statistically 
detectable bias at that size. (That is, my claim is that it would be 
essentially impossible to distinguish the existing "random.choices" 
implementation from a "fixed" unbiased version based on inputs and outputs 
alone.)

- The speed of random.choices is a particular selling point - we don't have 
anything else in the random module that lets you generate millions of variates 
at once. I'd expect replacing `floor(random() * n)` with `self._randbelow(n)` 
to cause a significant speed regression.

- For explicit weights, there isn't a practical route to handling those that 
doesn't involve use of floating-point in general, so for those we have to 
accept some tiny bias anyway. (Yes, we could technically do something unbiased 
for the case of integer-only weights, but that seems like significant work for 
a special case that I'd expect just doesn't matter that much in practice.) And 
again in this case, the weights list will usually be a "real" list, so we're 
again limited to populations of a few billion and it'll be hard to demonstrate 
statistically detectable bias.

@Dennis: for interests sake, do you have bandwidth to do some timings of the 
current "[population[floor(random() * n)] for i in _repeat(None, k)]" versus 
something like "[population[_randbelow(n)] for i in _repeat(None, k)]" (after 
putting "_randbelow = self._randbelow" to save an attribute lookup on each 
iteration)?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44080>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to