https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114040
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, there is one difference, for the
unsigned
foo (unsigned _BitInt(8671) x, unsigned y, unsigned _BitInt(512) z)
{
  unsigned _BitInt (8671) r
    = x * __builtin_sub_overflow_p (y * z, 0, (unsigned _BitInt(255)) 0);
  return r;
}

int
main ()
{
  if (foo (1, 1, 0xfffa46471e7c2dd60000000000000000wb))
    __builtin_abort ();
}
case bitint lowering chooses the same source and destination array, as source
it is ulong[8] - 8 limbs of unsigned _BitInt(512), as destination ulong[8] as
well, 4 limbs for the __real__ part and 4 limbs for the overflow flag (0/1).
But I don't see anything wrong with that in the bitintlower1 dump, all the
bitint.3 loads are done before the stores to the same location.
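
For illustration only, here is a minimal C sketch of why sharing one ulong[8]
between source and destination is harmless as long as every load happens
before any store. It is not the code bitint lowering emits. The function name
narrow_in_place is hypothetical, and the sketch assumes 64-bit limbs stored
least significant first, with the 0/1 flag placed in the lowest of the upper
four limbs.

```c
#include <stdint.h>

enum { SRC_LIMBS = 8, RES_LIMBS = 4 };

/* Hypothetical helper, not GCC's lowering: BUF holds a 512-bit value as
   eight 64-bit limbs, least significant limb first (an assumption).
   On return BUF[0..3] hold the value truncated to 255 bits and
   BUF[4..7] hold a 0/1 flag saying whether any bit was lost.  */
void
narrow_in_place (uint64_t buf[SRC_LIMBS])
{
  uint64_t src[SRC_LIMBS];
  uint64_t lost;
  int i;

  /* Load every source limb before storing anything to BUF.  */
  for (i = 0; i < SRC_LIMBS; i++)
    src[i] = buf[i];

  /* Bits 255 and above do not fit into unsigned _BitInt(255).  */
  lost = src[RES_LIMBS - 1] >> 63;
  for (i = RES_LIMBS; i < SRC_LIMBS; i++)
    lost |= src[i];

  /* Store the truncated value; limb 3 keeps only its low 63 bits.  */
  for (i = 0; i < RES_LIMBS; i++)
    buf[i] = src[i];
  buf[RES_LIMBS - 1] &= UINT64_MAX >> 1;

  /* Store the overflow flag into the upper four limbs.  */
  buf[RES_LIMBS] = lost != 0;
  for (i = RES_LIMBS + 1; i < SRC_LIMBS; i++)
    buf[i] = 0;
}
```

With the constant from the testcase (limb 1 set to 0xfffa46471e7c2dd6, all
other limbs zero) the value fits into 255 bits, so the flag in this sketch
comes out as 0. That matches main expecting foo to return 0.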