Gregory Smith <gregsm...@users.sourceforge.net> added the comment: Antoine, "most uses of it are for small ints (< 2**32 or 2**64), ", that's my point exactly, the current long type is quite slow for those cases because of sign-magnitude implementation.
I don't see a problem with sign-magnitude for old long ints, or for GMP, since in those cases the assumption is that you have a fair percentage of large #s, this is not valid in Py3.0 where most are small. Most operations are add and subtract, and in each such operation you need to look at both signs, and decide if you want to really add or subtract, and if you are subtracting, you then have to do a magnitude test to see which way - all of that before you do any actual computation. That's a lot of overhead for e.g. 'i -= 1'. Especially given that the sign & length information are rolled together into a single value; you need a fair number of tests & conditional branches. With a 2's complement implementation you would test for the most common case where both values are 1 digit (which would include 0 value) and in that case do an add (or subtract) followed by a check for overflow, and that's the whole thing. I'm guessing this is 10% - 25% as much time as in the current implementation - this is one of those cases where the time for tests and setup dominate over the actual computation. Mark, what you've written looks fine to me. It would be a bit faster with an add-with-carry, but there's no conditional branches in the generated code, so it's not bad, it's still faster than it would be if you were extracting carries from the msb of each result and masking them afterwards. Bear in mind that some RISC processors don't have an add-with-carry (lacking condition codes altogether) so the method of comparing the sum to the inputs is the only method available. Given that in Python the operation is in a loop, I think it's unreasonable to to expect a compiler to identify an add-with-carry application, but at the same time it's not necessary. Now that the long int is the *only* int type, multi-digit ops will be relatively uncommon, and you want to focus on reducing the overhead before and after the operation. (And, for speed freaks who want to use long ints to implement large bits arrays and want fast bit ops - my original motivation for this issue - it would be possible to use SSE2 vector instructions on specific processors. That can also be done with the 15-bit implementation, but it works much better with 32). The 'algorithmic C' package (http://www.mentor.com/products/esl/high_level_synthesis/ac_datatypes) is a C++ template class for arbitrary-length signed/unsigned integers. It's not suitable for use in Python because of license issues and because its integers are controlled by templates and are fixed-size at compile time - but it uses Kx32-bit 2's complement format, and implements adds, shifts, multiplies, logic ops, and limited division ops. It works very well, with extremely low overhead on small values - often the code is exactly the same as if you used int type - that would not be possible with a sign/magnitude approach. As for multiplies and divides - it's certainly possible to proceed through a multiply entirely in 2's complement, so no overhead is needed for abs value, or changing sign at the end. For division, it's possible if the denominator is > 0, and it may be possible if <0 (but probably not worth the hassle). I would be happy to do a design doc for this and write some of the inner loops, but if (a) it's already being done or (b) there's no chance of it being deployed then it would be a waste of time... if there was definite interest in it (and reasonable schedule expectations) I could write a lot more of it. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue1087418> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com