[issue1087418] long int bitwise ops speedup (patch included)

Mark Dickinson Sat, 24 Oct 2009 02:10:23 -0700

Mark Dickinson <dicki...@gmail.com> added the comment:

[Greg]
> It would seem to me that a more suitable implementation would be to
> store numbers as 32*k-bit 2's complement integers; I've used this in 
> C++ libraries. Compared to the rad-15 sign/magnitude approach, it may
> seem trickier to do carries but in practice it's not that big a deal.


Do you know of any publicly available code that takes this approach, 
while remaining simple and portable?

For 32-bit limbs (on a 32-bit platform, say, with no C99 support and no 
uint64_t available), how *do* you deal with carries?  You can write 
portable C89 that does the correct thing, but you then have to cross 
your fingers and hope that the compiler can turn it into sensible 
assembler, and it often can't (I'll post an example of this that I ran 
into just yesterday in a second).  And even if your compiler can 
optimize this effectively, what about all the other compilers out there?  
The alternative is to use inline assembler for selected platforms;  at 
that point, you lose simplicity.

The sign-magnitude versus two's complement is an orthogonal issue, I 
think.  I'm not convinced of the value of two's complement.  Certainly 
two's complement would be more convenient for the bitwise operations 
under discussion, and probably also works well for addition and 
subtraction.  But it's less convenient for negation, multiplication, 
division, power, conversion to and from human-readable strings, etc.  
It's worth noting that GMP uses sign-magnitude, so I can't believe there 
could be much performance gain in moving to two's complement.  
(Actually, I'm not sure I've seen any arbitrary-precision arithmetic 
package that uses two's complement.  Has anyone here?)

Here's the example promised earlier:  yesterday I wanted to add two 128-
bit unsigned integers a and b, each represented as a pair of type 
uint64_t integers, on a 64-bit machine.  In portable C99, the code looks 
something like:

#include <stdint.h>

/* *sumhigh:*sumlow = ahigh:alow + bhigh:blow */

void
add_128(uint64_t *sumhigh, uint64_t *sumlow,
           uint64_t ahigh, uint64_t alow,
           uint64_t bhigh, uint64_t blow)
{
  alow += blow;
  ahigh += bhigh + (alow < blow);
  *sumlow = alow;
  *sumhigh = ahigh;
}

Ideally, the compiler would manage to optimize this to a simple 'addq, 
adcq' pair of instructions.  But gcc-4.4 -O3, on OS X 10.6/x86_64 gives 
me:

_add_128:
LFB0:
        leaq    (%r9,%rcx), %rcx
        xorl    %eax, %eax
        leaq    (%r8,%rdx), %rdx
        pushq   %rbp
LCFI0:
        cmpq    %r9, %rcx
        movq    %rcx, (%rsi)
        setb    %al
        movq    %rsp, %rbp
LCFI1:
        addq    %rax, %rdx
        movq    %rdx, (%rdi)
        leave
        ret

(Here it looks like alow and blow are in r9 and rcx, ahigh and bhigh are 
in r8 and rdx, and rsi and rdi are sumlow and sumhigh.)

How do you write the C code in such a way that gcc produces the right 
result?  I eventually gave up and just put the assembler in directly.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1087418>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue1087418] long int bitwise ops speedup (patch included)

Reply via email to