https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067
--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #3)
> (In reply to Richard Biener from comment #2)
> > So any hint on whether the code after r257077 is better or worse than
> > before?
>
> Looks worse unfortunately:
> For aarch64 at -O2 it generates:
> foo:
>         mov     w3, 44
>         mov     w2, 40
>         mov     w5, 1
>         mov     w4, 2
>         smull   x3, w1, w3
>         smull   x2, w1, w2
>         str     w5, [x0, x3]
>         add     x2, x2, 400
>         add     x1, x2, x1, sxtw 2
>         str     w4, [x0, x1]
>         ret
>
> whereas with r257077 it generates the shorter:

Sorry, I meant to write "with r257077 reverted..."

> foo:
>         mov     w3, 40
>         sxtw    x2, w1
>         mov     w4, 1
>         smaddl  x0, w1, w3, x0
>         mov     w3, 2
>         add     x1, x0, x2, lsl 2
>         str     w4, [x0, x2, lsl 2]
>         str     w3, [x1, 400]
>         ret
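
For anyone comparing the two sequences, the C sketch below spells out the address arithmetic each one performs. The source function `foo_source` is an assumption inferred from the byte offsets in the listings (44 = 40 + 4, and the extra 400 = 10 * 40); the comment does not quote the testcase, and every function name here is hypothetical. It also assumes a 4-byte `int` and rows of 10 ints.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ROWS = 20, COLS = 10 };   /* COLS ints of 4 bytes give the 40-byte row stride */

/* Hypothetical source, inferred from the offsets in the listings. */
static void foo_source(int (*arr)[COLS], int idx)
{
    arr[idx][idx] = 1;
    arr[idx + 10][idx] = 2;
}

/* 32-bit store to a byte address, standing in for `str wN, [...]`. */
static void store_w(char *addr, int32_t value)
{
    memcpy(addr, &value, sizeof value);
}

/* First listing: two widening multiplies. */
static void foo_two_multiplies(char *x0, int w1)
{
    int64_t x3 = (int64_t)w1 * 44;       /* smull x3, w1, w3 */
    int64_t x2 = (int64_t)w1 * 40;       /* smull x2, w1, w2 */
    store_w(x0 + x3, 1);                 /* str   w5, [x0, x3] */
    x2 += 400;                           /* add   x2, x2, 400 */
    int64_t x1 = x2 + (int64_t)w1 * 4;   /* add   x1, x2, x1, sxtw 2 */
    store_w(x0 + x1, 2);                 /* str   w4, [x0, x1] */
}

/* Second listing: one widening multiply-add, reused by both stores. */
static void foo_multiply_add(char *x0, int w1)
{
    int64_t x2 = w1;                     /* sxtw   x2, w1 */
    x0 += (int64_t)w1 * 40;              /* smaddl x0, w1, w3, x0 */
    char *x1 = x0 + x2 * 4;              /* add    x1, x0, x2, lsl 2 */
    store_w(x0 + x2 * 4, 1);             /* str    w4, [x0, x2, lsl 2] */
    store_w(x1 + 400, 2);                /* str    w3, [x1, 400] */
}

/* Returns 1 when all three versions leave identical memory behind. */
int listings_agree(int idx)
{
    int a[ROWS][COLS] = {0}, b[ROWS][COLS] = {0}, c[ROWS][COLS] = {0};

    assert(idx >= 0 && idx < 10);        /* keeps row idx + 10 inside ROWS */
    foo_source(a, idx);
    foo_two_multiplies((char *)b, idx);
    foo_multiply_add((char *)c, idx);
    return memcmp(a, b, sizeof a) == 0 && memcmp(a, c, sizeof a) == 0;
}
```

Under these assumptions `listings_agree(idx)` returns 1 for every `idx` from 0 to 9, so both sequences store the same values to the same addresses. The difference is only in how the addresses are formed: the first listing uses two `smull` instructions, while the second uses a single `smaddl` and reuses its result for both stores.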