[issue45976] Random._randbelow() loses time by caching an attribute lookup
New submission from Andrew Lin:

This PR obtains a performance improvement in the random module by removing a cached attribute lookup in Random._randbelow_with_getrandbits() that costs time on average. In the best cases (on my machine) I get a 10% improvement for randrange(), 7% for shuffle(), and 11% for sample(), with no change in the worst cases.

To elaborate: if n is slightly less than a power of 2, _randbelow_with_getrandbits(n) almost always makes just one call to getrandbits(), so caching the attribute lookup never pays for itself. Even in the worst case, when n is exactly a power of 2, the expected number of calls to getrandbits() is still only 2, and on my machine caching just breaks even. (A friend ran it on ARM and got even more favorable results.)

I included a similar change to _randbelow_without_getrandbits() on similar grounds, although I didn't benchmark it.

(Brand new here; let me know if I've left anything out!)

--
components: Library (Lib)
messages: 407625
nosy: onethreeseven
priority: normal
severity: normal
status: open
title: Random._randbelow() loses time by caching an attribute lookup
type: performance
versions: Python 3.11

___
Python tracker <https://bugs.python.org/issue45976>
___
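A minimal sketch of the pattern under discussion, assuming the usual rejection-sampling shape of Random._randbelow_with_getrandbits(); the class names below are made up for illustration and this is not the exact PR diff:

from random import Random


class CachedLookup(Random):
    """Status quo: cache the bound method before the rejection loop."""

    def _randbelow(self, n):
        getrandbits = self.getrandbits   # attribute lookup paid up front
        k = n.bit_length()
        r = getrandbits(k)
        while r >= n:                    # rejects with probability (2**k - n) / 2**k
            r = getrandbits(k)
        return r


class DirectLookup(Random):
    """Proposed change: look the attribute up on each call."""

    def _randbelow(self, n):
        k = n.bit_length()
        r = self.getrandbits(k)
        while r >= n:
            # For n just below 2**k this body almost never runs, so a
            # cached lookup rarely gets a chance to pay for itself.
            r = self.getrandbits(k)
        return r

Either subclass can be exercised through randrange(), shuffle(), or sample(), all of which funnel through _randbelow() in the stdlib Random class.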
[issue45976] Random._randbelow() loses time by caching an attribute lookup
Andrew Lin added the comment:

Timings are unaffected by rewriting the patch without the walrus operator, so I'll do that in the PR.

Using _randbelow_without_getrandbits() I get a 5% improvement to sample([None] * 2047, 50); 6% to shuffle([None] * 2000); and approximately 6% to randrange() for any bound significantly less than 2**32. It's not surprising that the absolute speedup is smaller, because _randbelow_without_getrandbits() does a lot more work that is untouched by the change. On the other hand, the speedup is more consistent, since _randbelow_without_getrandbits(n) almost never calls random() more than once for any n significantly less than 2**32.

I will clean up my benchmark script and post it. Thanks for the feedback!

--

___
Python tracker <https://bugs.python.org/issue45976>
___
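For context, a simplified sketch of the corresponding loop in _randbelow_without_getrandbits(), with the range check and warning path omitted; this illustrates the caching pattern rather than reproducing the upstream source:

from math import floor
from random import Random

BPF = 53   # bits of precision in a C double, matching random.BPF


class WithoutGetrandbits(Random):
    """Illustrative subclass; the real method lives on random.Random."""

    def _randbelow(self, n, maxsize=1 << BPF):
        random = self.random          # the cached lookup in question
        rem = maxsize % n
        limit = (maxsize - rem) / maxsize
        r = random()
        while r >= limit:             # for n << 2**53 this almost never repeats
            r = random()
        return floor(r * maxsize) % n

The surrounding arithmetic (rem, limit, the final floor and modulo) is the extra work left untouched by the change, which is why the relative speedup is smaller here.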
[issue45976] Random._randbelow() loses time by caching an attribute lookup
Andrew Lin added the comment:

I finally cleaned up my benchmark script, which feels like a big hack. I should really learn to use pyperf or something.

Inspired by your comment about 3.8 on the Mac, I tried my Ubuntu installation's stock 3.8. I get a small degradation (~0.5%) for powers of two and 4-5% speedups just below powers of two and for shuffle. (This is under the slightly strange situation where _randbelow is using the main-branch code but everything else in the class is imported from 3.8.) _randbelow_without_getrandbits() is consistently (if only slightly) faster, as you would expect for cases that are basically guaranteed to call self.random() only once.

I am impressed (if not necessarily surprised) by the variation between builds, especially my friend's ARM builds, which reached nearly 20% for one case. So I completely understand if the safer course is simply to pass on the change. Thanks for your attention, in any event!

--

___
Python tracker <https://bugs.python.org/issue45976>
___
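A hypothetical pyperf harness for the same cases (benchmark names, seed, and sizes here are illustrative, not taken from the attached script):

import pyperf

runner = pyperf.Runner()
setup = "from random import Random; rng = Random(12345)"

runner.timeit("randrange just below a power of two",
              stmt="rng.randrange(2047)", setup=setup)
runner.timeit("randrange at a power of two",
              stmt="rng.randrange(2048)", setup=setup)
runner.timeit("shuffle 2000 elements",
              stmt="rng.shuffle(data)",
              setup=setup + "; data = [None] * 2000")
runner.timeit("sample 50 of 2047",
              stmt="rng.sample(data, 50)",
              setup=setup + "; data = [None] * 2047")

pyperf spawns worker processes and reports a mean and standard deviation per benchmark, which helps separate real differences from the build-to-build noise described above.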
[issue45976] Random._randbelow() loses time by caching an attribute lookup
Change by Andrew Lin:

Added file: https://bugs.python.org/file50473/randbelow_timing.py

___
Python tracker <https://bugs.python.org/issue45976>
___
[issue45976] Random._randbelow() loses time by caching an attribute lookup
Andrew Lin added the comment:

E[calls] reduces to 2**k / n. I only just realized from working that out that it therefore doesn't vary quite linearly over [2**k, 2**(k+1)), although by what weight one would want to average over that interval I couldn't say.

Personally, if the change adversely impacts performance for any n on any major platform, I will retract the PR. I was under the impression, based on my own machine, that this would not happen, but it is only my one machine.

--

___
Python tracker <https://bugs.python.org/issue45976>
___
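A quick empirical check of that expectation: with k = n.bit_length(), each getrandbits(k) draw is accepted with probability n / 2**k, so the call count is geometric with mean 2**k / n. The bounds and trial count below are just illustrative:

from random import getrandbits

def mean_calls(n, trials=100_000):
    """Average number of getrandbits() calls until a draw below n."""
    k = n.bit_length()
    total = 0
    for _ in range(trials):
        calls = 1
        while getrandbits(k) >= n:
            calls += 1
        total += calls
    return total / trials

for n in (1024, 1025, 1536, 2047, 2048):
    predicted = 2 ** n.bit_length() / n
    print(f"n={n:5d}  predicted={predicted:.3f}  observed={mean_calls(n):.3f}")

The expectation runs from 2 at an exact power of two down toward 1 just below the next power of two, following the curve 2**k / n rather than a straight line.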