Mark Dickinson <dicki...@gmail.com> added the comment: Some quick comments:
(1) Agree about the extra bound checks: let's treat those as a separate issue and just concentrate on performance here. (2) log2 isn't portable: it's not part of C89 (though it is in C99) and I don't think it exists on Windows. Which is a shame, since it probably *does* reliably work well for boundary cases on most platforms. I'm embarrassed to read that snippet that Alexander found, but it's true that an alternative like log(n)/log(2) has problems in boundary cases, thanks to the usual floating-point issues. There's a bit-counting method in the int.bit_length() implementation (in Objects/longobject.c) that could possibly be re-used here. Alternatively, if a simple for-loop to count bits doesn't have any noticeable impact on speed, then we could use that. (3) Is the 'count set bits' code a bottleneck? If not, then it looks fine to me as it is. Doesn't it just get called once per factorial computation? (4) I wonder whether the recursion in factorial_part_product slows things down; it might be interesting to compare with an iterative version (but still one that clumps together small pieces rather than doing lots of small*big multiplies). It seems possible that the cost of the recursive calls is insignificant compared to the cost of the various Py* calls, though. (5) Was there a reason for using long rather than unsigned long for the computations? Using unsigned long would give you an easily computable multiply_cutoff, and save the need for that extra static variable (it could be a macro instead). ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue8692> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com