On Thu, Oct 6, 2011 at 11:31 PM, Florian Weimer <f...@deneb.enyo.de> wrote:
> * Ulf Magnusson:
>
>> I've been experimenting with different methods for emulating the
>> signed overflow of an 8-bit CPU. The method I've found that seems to
>> generate the most efficient code on both ARM and x86 is
>>
>> bool overflow(unsigned int a, unsigned int b) {
>>     const unsigned int sum = (int8_t)a + (int8_t)b;
>>     return (int8_t)sum != sum;
>> }
>
> There's a GCC extension which is relevant here:
>
> | For conversion to a type of width N, the value is reduced modulo 2^N
> | to be within range of the type; no signal is raised.
>
> <http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation>
>
> Using that, you can replace the final "& 0x80" with a signed
> comparison to zero, which should be give you the best possible code
> (for the generic RISC).  You only need to hunt down a copy of Hacker's
> Delight or find the right bit twiddling by other means. 8-)
>

Are you thinking of something like this?

bool overflow_bit2(unsigned int a, unsigned int b) {
    const unsigned int ashift = a << 24;
    const unsigned int bshift = b << 24;
    const unsigned int sum = a + b;
    return (int)(~(a ^ b) & (a ^ sum)) < 0;
}

That version generates

  80:   180b            adds    r3, r1, r0
  82:   4041            eors    r1, r0
  84:   ea83 0200       eor.w   r2, r3, r0
  88:   ea22 0001       bic.w   r0, r2, r1
  8c:   0fc0            lsrs    r0, r0, #31
  8e:   4770            bx      lr

Whereas the unshifted version generates

  40:   180b            adds    r3, r1, r0
  42:   4041            eors    r1, r0
  44:   ea83 0200       eor.w   r2, r3, r0
  48:   ea22 0001       bic.w   r0, r2, r1
  4c:   f3c0 10c0       ubfx    r0, r0, #7, #1
  50:   4770            bx      lr

So maybe a bit better. (I'm no ARM pro, but the compiler does seem to
take advantage of the fact that it's testing the real sign bit at
least.)

Btw, & 0x80000000 generates the same code.

/Ulf

Reply via email to