https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
Bug ID: 113982
Summary: Poor codegen for 64-bit add with carry widening functions
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: janschultke at googlemail dot com
Target Milestone: ---
I was trying to get optimal codegen for a 64-bit addition with a carry, but
it's tough to do with GCC:
> struct add_result {
>     unsigned long long sum;
>     bool carry;
> };
>
> add_result add_wide_1(unsigned long long x, unsigned long long y) {
>     auto r = (unsigned __int128) x + y;
>     return add_result{static_cast<unsigned long long>(r), bool(r >> 64)};
> }
>
> add_result add_wide_2(unsigned long long x, unsigned long long y) {
>     unsigned long long r;
>     bool carry = __builtin_add_overflow(x, y, &r);
>     return add_result{r, carry};
> }
## Expected output (clang -march=x86-64-v4 -O3)
add_wide_1(unsigned long long, unsigned long long):
        mov     rax, rdi
        add     rax, rsi
        setb    dl
        ret
add_wide_2(unsigned long long, unsigned long long):
        mov     rax, rdi
        add     rax, rsi
        setb    dl
        ret
## Actual output (GCC -march=x86-64-v4 -O3) (https://godbolt.org/z/qGc9WeEvK)
add_wide_1(unsigned long long, unsigned long long):
        mov     rcx, rdi
        lea     rax, [rdi+rsi]
        xor     edx, edx
        xor     edi, edi
        add     rsi, rcx
        adc     rdi, 0
        mov     dl, dil
        and     dl, 1
        ret
add_wide_2(unsigned long long, unsigned long long):
        add     rdi, rsi
        mov     edx, 0
        mov     rax, rdi
        setc    dl
        ret
The output for the 128-bit version looks pretty bad. GCC doesn't seem to recognize that only the carry bit of the upper 64 bits is accessed, so it performs the full 128-bit addition instead of a single add with a flag check.

The add_wide_2 output also isn't optimal. Why does it emit "mov edx, 0" instead of "xor edx, edx" (scheduled before the add, since xor would clobber the carry flag)?