https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99548

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For aarch64 we get:
        ldp     x4, x3, [x1]
        mov     x7, 38
        ldp     x6, x5, [x2]
        ldr     x8, [x1, 16]
        add     x6, x4, x6
        cmp     x4, x6
        adc     x5, x3, x5
        cmp     x3, x5
        ldr     x3, [x1, 24]
        ldp     x4, x1, [x2, 16]
        adc     x4, x8, x4
        cmp     x8, x4
        adc     x1, x3, x1
        cmp     x3, x1
        csetm   x2, cs
        and     x2, x2, x7
        add     x2, x2, x6
        cmp     x2, x6
        cinc    x6, x5, ls
        cmp     x6, x5
        cinc    x5, x4, ls
        cmp     x5, x4
        cinc    x3, x1, ls
        cmp     x3, x1
        csetm   x1, ls
        and     x1, x1, x7
        add     x1, x1, x2
        str     x1, [x0]
        cmp     x1, x2
        cinc    x1, x6, ls
        str     x1, [x0, 8]
        cmp     x1, x6
        cinc    x1, x5, ls
        cmp     x1, x5
        cinc    x3, x3, ls
        stp     x1, x3, [x0, 16]
        ret

There is really only one missing optimization there:
        csetm   x1, ls
        and     x1, x1, x7
should be turned into:
        csel    x1, x7, xzr, ls

This pattern is repeated a few times.
It looks like x86_64 has other issues.
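
As a rough illustration of why the two sequences are interchangeable, here is a minimal C sketch of the two forms. The function names are hypothetical and not taken from the bug report or from GCC; the constant 38 in the self-check is the value loaded into x7 in the listing. The sketch only shows the value-level equivalence and makes no claim about how GCC would implement the transformation.

```c
#include <assert.h>
#include <stdint.h>

/* Mask form, mirroring "csetm xd, cond" followed by "and xd, xd, xk":
   build an all-ones or all-zeros mask from the condition, then AND it
   with the constant. */
static uint64_t mask_and(int cond, uint64_t k)
{
    uint64_t mask = cond ? ~(uint64_t)0 : (uint64_t)0; /* csetm */
    return mask & k;                                   /* and   */
}

/* Select form, mirroring "csel xd, xk, xzr, cond":
   pick the constant when the condition holds, zero otherwise. */
static uint64_t select_or_zero(int cond, uint64_t k)
{
    return cond ? k : (uint64_t)0;
}

/* Hypothetical self-check: both forms agree for either condition value. */
static void self_check(uint64_t k)
{
    assert(mask_and(0, k) == select_or_zero(0, k));
    assert(mask_and(1, k) == select_or_zero(1, k));
}
```

Calling self_check(38) exercises the case from the listing. Since an all-ones mask ANDed with k is k and an all-zeros mask ANDed with k is 0, the two functions agree for every k, which is why the two-instruction sequence can be replaced by a single conditional select.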