New submission from Raymond Hettinger <raymond.hettin...@gmail.com>:

The implementation of math.comb() is nice in that it limits the size of
intermediate values as much as possible, and it avoids unnecessary steps
when min(k, n-k) is small relative to n.

There are some ways it can be made better or faster:

1) For small values of n, there is no need to use PyLong objects at every
step.  Plain C integers would suffice, and we would no longer need to make
repeated allocations for all the intermediate values.  For 32-bit builds,
this could be done for n<=30; for 64-bit builds, it applies to n<=62
(roughly the largest n for which the result and every intermediate product
fit in the native unsigned integer type).  Adding this fast path would
likely give a 10x to 50x speed-up.  A sketch is below.
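
A minimal sketch of such a fast path, assuming a 64-bit build; the helper
name comb_small_n is hypothetical, and a real patch would convert the
PyLong arguments up front and build the final result with a single
PyLong_FromUnsignedLongLong() call:

    #include <stdio.h>
    #include <stdint.h>

    /* comb(n, k) entirely in C arithmetic.  Valid for n <= 62 on a
       64-bit build: the result and every intermediate product fit
       in an unsigned 64-bit integer. */
    static uint64_t
    comb_small_n(uint64_t n, uint64_t k)
    {
        uint64_t result = 1;
        if (k > n - k)
            k = n - k;                  /* exploit symmetry */
        for (uint64_t i = 1; i <= k; i++) {
            /* result holds comb(n-k+i-1, i-1), and
               comb(n-k+i, i) = comb(n-k+i-1, i-1) * (n-k+i) / i,
               so the division below is always exact. */
            result = result * (n - k + i) / i;
        }
        return result;
    }

    int
    main(void)
    {
        /* largest case handled: comb(62, 31) == 465428353255261088 */
        printf("%llu\n", (unsigned long long)comb_small_n(62, 31));
        return 0;
    }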

2) For slightly larger values of n, the computation could be sped up by
precomputing one or more rows of binomial coefficients (perhaps for n=75,
n=150, and n=225).  The calculation could then start from that row instead
of starting higher up in Pascal's triangle.

For example, comb(100, 55) is currently computed (as comb(100, 45), after
exploiting symmetry) as:
    comb(56,1) * 57 // 2 * 58 // 3 ... * 100 // 45    <== 45 steps

Instead, it could be computed as:
    comb(75,20) * 76 // 21 * 77 // 22 ... * 100 // 45  <== 25 steps
           ^-- found by table lookup

This gives a nice speed-up in exchange for a little memory in an array of
constants (for n=75, we would need an array of about 75//2 entries after
exploiting symmetry).  Almost all cases should show some benefit, and in
favorable cases like comb(76, 20) the speed-up would be nearly 75x.  A
sketch of the idea is below.
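
Here is a self-contained sketch of the idea.  To keep everything in 64-bit
C integers it uses a precomputed row for n=30 rather than n=75 (comb(75, 37)
overflows 64 bits, so a real patch would store the larger rows as PyLong
constants), but the step-count saving works the same way:

    #include <stdio.h>
    #include <stdint.h>

    /* comb(30, j) for j = 0..15; the rest of the row follows by
       symmetry and is never needed as a starting point. */
    static const uint64_t comb30[16] = {
        1, 30, 435, 4060, 27405, 142506, 593775, 2035800,
        5852925, 14307150, 30045015, 54627300, 86493225,
        119759850, 145422675, 155117520
    };

    /* comb(n, k) for n >= 30 and n-30 <= k <= n/2, starting from the
       precomputed row: n-30 loop steps instead of k.  (With these
       preconditions and n <= 60, nothing overflows uint64_t.) */
    static uint64_t
    comb_from_row30(unsigned n, unsigned k)
    {
        unsigned j = k - (n - 30);  /* start column; j <= 15 since k <= n/2 */
        uint64_t result = comb30[j];
        for (unsigned m = 30; m < n; m++, j++) {
            /* comb(m+1, j+1) = comb(m, j) * (m+1) / (j+1), exactly */
            result = result * (m + 1) / (j + 1);
        }
        return result;
    }

    int
    main(void)
    {
        /* 10 steps instead of 15: prints 40225345056 == comb(40, 15) */
        printf("%llu\n", (unsigned long long)comb_from_row30(40, 15));
        return 0;
    }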

3) When k is close to n/2, the current algorithm is slower than just
computing n! // (k! * (n-k)!).  However, the factorial method comes at the
cost of more memory usage for large n.  The factorial method consumes
memory proportional to n*log2(n) bits, while the current early-cancellation
method needs only about n+log2(n) bits for its largest intermediate value
(for n=100_000, that is a few hundred kilobytes versus roughly 12 KB).
Above some threshold for memory pain, the current method should always be
preferred; a dispatch sketch follows the timings below.  I'm not sure the
factorial method should be used at all, but it is embarrassing that
factorial calls sometimes beat the current C implementation:

    $ python3.8 -m timeit -r 11 -s 'from math import comb, factorial as fact' -s 'n=100_000' -s 'k = n//2' 'comb(n, k)'
    1 loop, best of 11: 1.52 sec per loop

    $ python3.8 -m timeit -r 11 -s 'from math import comb, factorial as fact' -s 'n=100_000' -s 'k = n//2' 'fact(n) // (fact(k) * fact(n-k))'
    1 loop, best of 11: 518 msec per loop
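
If both methods are kept, the dispatch could be as simple as the following
fragment (the helper names and both cutoffs are made up for illustration
and would need benchmarking to pin down; k is assumed to have already been
replaced by min(k, n-k)):

    /* Illustrative only: below some memory-pain threshold, prefer the
       factorial formula when k is close to n/2. */
    if (n <= MEMORY_PAIN_CUTOFF && 3 * k >= n) {
        return comb_via_factorials(n, k);     /* hypothetical helper */
    }
    return comb_early_cancellation(n, k);     /* current method */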

4) For values such as n=1_000_000 and k=500_000, the running time is very
long and the routine doesn't respond to SIGINT.  We could add periodic
checks for keyboard interrupts when n is large.  Also consider releasing
the GIL.  A sketch of the interrupt check is below.
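
Inside the main loop, the check could look something like this (the loop
shape and variable names are illustrative rather than the actual
mathmodule.c code; PyErr_CheckSignals() is the standard C-API call for
this):

    for (i = 1; i < iterations; i++) {
        /* Poll for pending signals only occasionally so the check
           stays cheap.  PyErr_CheckSignals() returns -1 and sets
           KeyboardInterrupt when Ctrl-C is pending. */
        if (i % 1024 == 0 && PyErr_CheckSignals() < 0) {
            goto error;
        }
        /* ... existing multiply/divide step on the PyLongs ... */
    }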

5) The inner loop currently does a pure-Python subtraction that could in
many cases be done with plain C integers.  When n is smaller than maxsize,
we could have a code path that replaces "PyNumber_Subtract(factor,
_PyLong_One)" with something like
"PyLong_FromUnsignedLongLong((unsigned long long)n - i)", as sketched
below.
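
In context, that might look like the following fragment (variable names
are illustrative; n_c is assumed to be a C copy of n made once before the
loop, and i the C loop counter):

    /* One C subtraction plus one PyLong allocation, instead of a
       generic PyNumber_Subtract() call on PyLong objects. */
    Py_DECREF(factor);
    factor = PyLong_FromUnsignedLongLong((unsigned long long)n_c - i);
    if (factor == NULL) {
        goto error;
    }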

----------
components: Library (Lib)
messages: 345711
nosy: mark.dickinson, pablogsal, rhettinger, serhiy.storchaka, tim.peters
priority: normal
severity: normal
status: open
title: Possible optimizations for math.comb()
type: performance
versions: Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37295>
_______________________________________