https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067

--- Comment #3 from ktkachov at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> So any hint on whether the code after r257077 is better or worse than before?

Looks worse unfortunately:
For aarch64 at -O2 it generates:
foo:
        mov     w3, 44
        mov     w2, 40
        mov     w5, 1
        mov     w4, 2
        smull   x3, w1, w3
        smull   x2, w1, w2
        str     w5, [x0, x3]
        add     x2, x2, 400
        add     x1, x2, x1, sxtw 2
        str     w4, [x0, x1]
        ret

whereas before r257077 it generates the shorter:
foo:
        mov     w3, 40
        sxtw    x2, w1
        mov     w4, 1
        smaddl  x0, w1, w3, x0
        mov     w3, 2
        add     x1, x0, x2, lsl 2
        str     w4, [x0, x2, lsl 2]
        str     w3, [x1, 400]
        ret
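
For readers without the testcase at hand, both listings are consistent with a function of roughly the following shape. This is a reconstruction from the addressing arithmetic in the assembly (a 40-byte row stride and a constant offset of 400), not source quoted from this comment; the type name ArrT and the parameter names are assumptions.

```c
/* Hypothetical reconstruction: rows of 10 ints, so one row is 40 bytes. */
typedef int ArrT[10][10];

void foo(ArrT Arr, int Idx)
{
    /* Byte offset from Arr: 40*Idx + 4*Idx = 44*Idx. */
    Arr[Idx][Idx] = 1;
    /* Byte offset from Arr: 40*Idx + 400 + 4*Idx. */
    Arr[Idx + 10][Idx] = 2;
}
```

Under that assumption, the shorter sequence computes Arr + 40*Idx once with smaddl and reuses it for both stores, while the longer one does two separate smull multiplications (by 44 and by 40).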
