Mark Dickinson <dicki...@gmail.com> added the comment:

Some quick comments:

(1) Agree about the extra bound checks:  let's treat those as a separate issue 
and just concentrate on performance here.

(2) log2 isn't portable:  it's not part of C89 (though it is in C99) and I 
don't think it exists on Windows.  Which is a shame, since it probably *does* 
reliably work well for boundary cases on most platforms.  I'm embarrassed to 
read that snippet that Alexander found, but it's true that an alternative like 
log(n)/log(2) has problems in boundary cases, thanks to the usual 
floating-point issues.  There's a bit-counting method in the int.bit_length() 
implementation (in Objects/longobject.c) that could possibly be re-used here.  
Alternatively, if a simple for-loop to count bits doesn't have any noticeable 
impact on speed, then we could use that.

(3) Is the 'count set bits' code a bottleneck?  If not, then it looks fine to 
me as it is.  Doesn't it just get called once per factorial computation?

(4) I wonder whether the recursion in factorial_part_product slows things down; 
 it might be interesting to compare with an iterative version (but still one 
that clumps together small pieces rather than doing lots of small*big 
multiplies).  It seems possible that the cost of the recursive calls is 
insignificant compared to the cost of the various Py* calls, though.

(5) Was there a reason for using long rather than unsigned long for the 
computations?  Using unsigned long would give you an easily computable 
multiply_cutoff, and save the need for that extra static variable (it could be 
a macro instead).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8692>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to