https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115734

            Bug ID: 115734
           Summary: Missed optimization: carry chains with __builtin_addc
                    missed except on x86
           Product: gcc
           Version: 14.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lloyd at randombit dot net
  Target Milestone: ---

Consider the following example (extracted from cryptographic integer code):

--------
#define USE_BUILTIN_ADDC 1

#include <stdint.h>
#include <stdlib.h>

inline uint32_t add32(uint32_t x, uint32_t y, uint32_t* carry) {
#if USE_BUILTIN_ADDC
    return __builtin_addc(x, y, *carry & 1, carry);
#else
    const uint32_t c = *carry & 1;
    uint32_t z = x + y;
    uint32_t c1 = (z < x);
    z += c;
    *carry = c1 | (z < c);
    return z;
#endif
}

uint32_t add32_4(uint32_t a[4], uint32_t b[4], uint32_t c) {
    a[0] = add32(a[0], b[0], &c);
    a[1] = add32(a[1], b[1], &c);
    a[2] = add32(a[2], b[2], &c);
    a[3] = add32(a[3], b[3], &c);
    return c;
}

template<size_t N>
uint32_t add32_N(uint32_t a[N], uint32_t b[N], uint32_t c) {
    for(size_t i = 0; i != N; ++i) {
        a[i] = add32(a[i], b[i], &c);
    }
    return c;
}

uint32_t add32_8(uint32_t* a, uint32_t* b, uint32_t c) {
    return add32_N<8>(a, b, c);
}
--------

GCC 14 on x86 *will* convert both `add32_4` and `add32_8` to a carry chain
sequence:

        mov     edx, DWORD PTR [rsi]
        adc     DWORD PTR [rdi], edx
        mov     edx, DWORD PTR [rsi+4]
        adc     DWORD PTR [rdi+4], edx
        ...

(The template case only produces good code with `-O3` or if `#pragma GCC unroll
N` is used; otherwise the code is terrible. But at least it can be made to
work.)
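
For anyone trying to reproduce the unroll workaround mentioned above, here is a
minimal sketch of what it might look like. The names `add32_sketch` and
`add32_8_unrolled` are hypothetical (not from the original test case), the loop
bound is written as a literal 8 rather than a template parameter, and the
`__has_builtin` guard is an assumption added so the snippet still compiles on
compilers without `__builtin_addc`. It has not been checked against the Godbolt
output in this report.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: detect __builtin_addc where the compiler supports __has_builtin.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_addc)
#    define SKETCH_HAVE_BUILTIN_ADDC 1
#  endif
#endif

// Hypothetical helper mirroring add32 from the test case.
inline uint32_t add32_sketch(uint32_t x, uint32_t y, uint32_t* carry) {
#if defined(SKETCH_HAVE_BUILTIN_ADDC)
    // Carry-in is the low bit of *carry; carry-out is written back to *carry.
    return __builtin_addc(x, y, *carry & 1, carry);
#else
    // Portable fallback, same logic as the #else branch of add32.
    const uint32_t c = *carry & 1;
    uint32_t z = x + y;
    uint32_t c1 = (z < x);
    z += c;
    *carry = c1 | (z < c);
    return z;
#endif
}

// Hypothetical non-template variant of add32_8 with an explicit unroll request.
uint32_t add32_8_unrolled(uint32_t* a, const uint32_t* b, uint32_t c) {
#pragma GCC unroll 8
    for (std::size_t i = 0; i != 8; ++i) {
        a[i] = add32_sketch(a[i], b[i], &c);
    }
    return c;
}
```

The pragma only asks for unrolling; whether the result becomes an `adc` chain
still depends on the target and GCC version, which is the point of this report.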

On every other architecture I've tried (I'm especially interested in
armv7/aarch64 here), a carry chain is not generated for either function.

A carry chain is never generated if __builtin_addc is not used.

Clang generates a carry chain on x86, armv7, and aarch64, with *or without* the
use of __builtin_addc.

Godbolt link: https://godbolt.org/z/adMT55a9r
