[issue35094] Improved algorithms for random.sample

2018-10-28 Thread Paul Crowley
New submission from Paul Crowley : random.sample currently uses either a Fisher-Yates shuffle, or rejection sampling, to achieve sampling without replacement. I propose using reservoir sampling or "cardchoose"; these are similar performance or sometimes faster, and don't n

[issue35094] Improved algorithms for random.sample

2018-10-28 Thread Paul Crowley
Change by Paul Crowley : -- keywords: +patch pull_requests: +9510 stage: -> patch review ___ Python tracker <https://bugs.python.org/issue35094> ___ ___ Py

[issue35094] Improved algorithms for random.sample

2018-10-28 Thread Paul Crowley
Paul Crowley added the comment: I would be very grateful for your help finding those dicussions! I've tried this search: https://www.google.com/search?q=python+%22Reservoir+sampling%22+rhettinger and found this discussion: https://mail.python.org/pipermail/python-ideas/2016-April/0

[issue35094] Improved algorithms for random.sample

2018-10-28 Thread Paul Crowley
Paul Crowley added the comment: Thank you for a very comprehensive and helpful answer! Yep, reservoir sampling makes n calls not k calls, and so should only be used when k is a large fraction of n; in my patch it's k/n >= 1/2. Because modern CPRNGs are so fast, I had been assum