Mark Dickinson <[EMAIL PROTECTED]> added the comment: Here's a version of the 15-bit to 30-bit patch that adds in a souped-up version of Mario Pernici's faster multiplication.
I did some testing of 100x100 digit and 1000x1000 digit multiplies. On 32-bit x86: 100 x 100 digits : around 2.5 times faster 1000 x 1000 digits : around 3 times faster. On x86_64, I'm getting more spectacular results: 100 x 100 digits : around 5 times faster 1000 x 1000 digits: around 7 times faster! The idea of the faster multiplication is quite simple: with 30-bit digits, one can fit a sum of 16 30-bit by 30-bit products in a uint64_t. This means that the inner loop for the basecase grade-school multiplication contains one fewer addition and no mask and shift. [Victor, please don't delete the old longdigit4.patch!] Added file: http://bugs.python.org/file11986/30bit_longdigit6.patch _______________________________________ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue4258> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com