Tim Peters <t...@python.org> added the comment:

I'm not clear on that the alias method is valuable here. Because of the 
preprocessing expense, it cries out for a class instead: an object that can 
retain the preprocessed tables, be saved to disk (pickled), restored later, and 
used repeatedly to make O(1) selections. I have such a class, which uses exact 
integer arithmetic (e.g., during preprocessing, there's at least one "too 
small" bin if and only if there's at least one "too large" bin), and is as 
unbiased as choice() and randrange(). Reasoning about float vagaries is much 
harder, and I see no error analysis here.

Most troubling: that due to rounding, reducing a "too large" bin results in a 
weight that spuriously compares <= bin_size. Then a "too small" bin with a 
genuinely tiny weight can be left with nothing to pair with, and so ends up 
aliasing itself, giving it a massively too-high probability of being picked.

So if you're determined to stick to "a function", I'd stick to the simple 
inverse CDF sampling method.  Unless the number of choices is "very" large, the 
log factor doesn't much matter.

That said, the alias numerics here could be improved some by working with the 
weights directly, "normalizing" them only at the end.  Instead of comparing 
weights to bin_size, compare them instead to the mean:

mean = total / n

(note: the next step to getting rid of floats entirely is to pre-multiply the 
weights by lcm(total, n) // total  - then their mean is exactly the integer 
lcm(total, n) // n).

The mean (instead of bin_size) suffers rounding errors then, but the original 
weights don't during the "hard part" of preprocessing.  At the end,

fn = n + 0.0
U = [weights[i] / total + i / fn for i in range(n)]

instead.  The "i / fn" also suffers just one rounding error as opposed to the 
two in the current "bin_width * i".  That can make a difference for n as small 
as 5:

>>> 3/5
0.6
>>> (1/5)*3
0.6000000000000001

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41131>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to