The assembly GCC generates is just wrong, very wrong here. People have been reporting similar bugs over and over since 2015, but GCC still has not gotten it right. Even MSVC does the right thing.
If you are not going to fix them, okay, I will start writing the assembly by hand. I have tested all cases, including _addcarry_u64 and _subborrow_u64.

#include <stdint.h>
#include <x86intrin.h>

void add256(uint64_t a[4], uint64_t b[4]){
    uint8_t carry = 0;
    for (int i = 0; i < 4; ++i)
        carry = _addcarry_u64(carry, a[i], b[i], (unsigned long long*)(a+i));
}

Assembly GCC generates:

add256:
        movq    (%rsi), %rax
        addq    (%rdi), %rax
        setc    %dl
        movq    %rax, (%rdi)
        movq    8(%rdi), %rax
        addb    $-1, %dl
        adcq    8(%rsi), %rax
        setc    %dl
        movq    %rax, 8(%rdi)
        movq    16(%rdi), %rax
        addb    $-1, %dl
        adcq    16(%rsi), %rax
        setc    %dl
        movq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        addb    $-1, %dl
        adcq    %rax, 24(%rdi)
        ret

setc?? LOLOLOL. This is a joke to me. GCC saves the carry flag into %dl with setc after every limb and then recreates it with addb $-1, %dl before the next adcq, instead of simply chaining the adcq instructions.

Clang generates:

add256:
        movq    (%rsi), %rax
        addq    %rax, (%rdi)
        movq    8(%rsi), %rax
        adcq    %rax, 8(%rdi)
        movq    16(%rsi), %rax
        adcq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        adcq    %rax, 24(%rdi)
        retq
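
For anyone who wants to cross-check the results of add256 independently of the intrinsic, here is a minimal sketch of the same carry chain in plain C. The name add256_ref and the returned carry-out are my own additions for illustration, not part of the code above, and this says nothing about what assembly any compiler will produce for it.

```c
#include <stdint.h>

/* Hypothetical reference helper (not from the original report):
 * adds b into a limb by limb, least-significant limb first, and
 * returns the carry out of the top limb. */
uint8_t add256_ref(uint64_t a[4], const uint64_t b[4])
{
    uint8_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t s  = a[i] + b[i];        /* wraps modulo 2^64 */
        uint8_t  c1 = (uint8_t)(s < a[i]);/* carry out of a[i] + b[i] */
        uint64_t t  = s + carry;          /* add the incoming carry */
        uint8_t  c2 = (uint8_t)(t < s);   /* carry out of that second add */
        a[i]  = t;
        carry = (uint8_t)(c1 | c2);       /* c1 and c2 are never both set */
    }
    return carry;
}
```

Running add256 and add256_ref on copies of the same inputs and comparing the four limbs is enough to confirm that both compilers' output computes the same sum; the difference is only in how the carry is passed between limbs.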