https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115734
Bug ID: 115734
Summary: Missed optimization: carry chains with __builtin_addc missed except on x86
Product: gcc
Version: 14.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: lloyd at randombit dot net
Target Milestone: ---

Consider the following example (extracted from cryptographic integer code)

--------
#define USE_BUILTIN_ADDC 1

#include <stdint.h>
#include <stdlib.h>

inline uint32_t add32(uint32_t x, uint32_t y, uint32_t* carry) {
#if USE_BUILTIN_ADDC
    return __builtin_addc(x, y, *carry & 1, carry);
#else
    const uint32_t c = *carry & 1;
    uint32_t z = x + y;
    uint32_t c1 = (z < x);
    z += c;
    *carry = c1 | (z < c);
    return z;
#endif
}

uint32_t add32_4(uint32_t a[4], uint32_t b[4], uint32_t c) {
    a[0] = add32(a[0], b[0], &c);
    a[1] = add32(a[1], b[1], &c);
    a[2] = add32(a[2], b[2], &c);
    a[3] = add32(a[3], b[3], &c);
    return c;
}

template<size_t N>
uint32_t add32_N(uint32_t a[N], uint32_t b[N], uint32_t c) {
    for(size_t i = 0; i != N; ++i) {
        a[i] = add32(a[i], b[i], &c);
    }
    return c;
}

uint32_t add32_8(uint32_t* a, uint32_t* b, uint32_t c) {
    return add32_N<8>(a, b, c);
}
--------

GCC 14 on x86 *will* convert both `add32_4` and `add32_8` to a carry chain sequence:

    mov edx, DWORD PTR [rsi]
    adc DWORD PTR [rdi], edx
    mov edx, DWORD PTR [rsi+4]
    adc DWORD PTR [rdi+4], edx
    ...

(The template case is only good with `-O3` or if `#pragma GCC unroll N` is used; otherwise the code is terrible. But at least it can be made to work.)

On every other architecture I've tried (I'm especially interested in armv7/aarch64 here), a carry chain is not generated for either function.

With GCC, a carry chain is never generated if __builtin_addc is not used.

Clang generates a carry chain on x86, armv7, and aarch64, with *or without* the use of __builtin_addc.

Godbolt link: https://godbolt.org/z/adMT55a9r
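
When comparing codegen across targets it can help to first confirm that the manual fallback computes the same result as a plain widening add, so that any difference is purely an optimization issue. The following is a minimal sketch of such a check, not part of the original report: `add32_manual` is the `#else` branch of `add32` above under a different name, while `add32_wide`, `add_words`, and `carry_add_selftest` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// The manual (#else) variant from the report, renamed for this sketch.
inline uint32_t add32_manual(uint32_t x, uint32_t y, uint32_t* carry) {
    const uint32_t c = *carry & 1;
    uint32_t z = x + y;
    uint32_t c1 = (z < x);
    z += c;
    *carry = c1 | (z < c);
    return z;
}

// Hypothetical reference: do the add in 64 bits and split off the carry.
inline uint32_t add32_wide(uint32_t x, uint32_t y, uint32_t* carry) {
    const uint64_t s = uint64_t(x) + y + (*carry & 1);
    *carry = uint32_t(s >> 32);
    return uint32_t(s);
}

// Hypothetical helper: add n words of b into a, propagating the carry
// through whichever single-word add function is passed in.
template<typename AddFn>
uint32_t add_words(uint32_t* a, const uint32_t* b, size_t n, uint32_t c, AddFn add) {
    for(size_t i = 0; i != n; ++i) {
        a[i] = add(a[i], b[i], &c);
    }
    return c;
}

// Returns true if both variants agree on inputs that carry out of every word.
inline bool carry_add_selftest() {
    uint32_t a1[4] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0x80000000u};
    uint32_t a2[4] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0x80000000u};
    const uint32_t b[4] = {1u, 0u, 0xFFFFFFFFu, 0x80000000u};

    const uint32_t c1 = add_words(a1, b, 4, 1, add32_manual);
    const uint32_t c2 = add_words(a2, b, 4, 1, add32_wide);

    bool same = (c1 == c2);
    for(size_t i = 0; i != 4; ++i) {
        same = same && (a1[i] == a2[i]);
    }
    return same;
}
```

Calling `assert(carry_add_selftest());` from a test driver is enough to exercise it. The check says nothing about whether a carry chain is emitted; that still has to be read off the generated assembly.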