https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> For aarch64 we get:
>         adrp    x1, .LANCHOR0
>         add     x0, x1, :lo12:.LANCHOR0
>         add     x0, x0, 8
>         ldrb    w1, [x1, #:lo12:.LANCHOR0]
>         and     x2, x1, 15
>         ubfx    x1, x1, 4, 4
>         ldr     w2, [x0, x2, lsl 2]
>         str     w2, [x0, x1, lsl 2]
>         ret
>
> Note the shift and and is combined into one instruction (ubfx) but really
> only a shift instruction is needed.
> Here we have:
> Trying 21 -> 22:
>    21: r112:SI=r92:SI 0>>0x4
>       REG_DEAD r92:SI
>    22: r113:DI=sign_extend(r112:SI)
>       REG_DEAD r112:SI
> Successfully matched this instruction:
> (set (reg:DI 113)
>     (zero_extract:DI (subreg:DI (reg:SI 92 [ d.0_1 ]) 0)
>         (const_int 4 [0x4])
>         (const_int 4 [0x4])))
>
> The multiple modes issue is part of the problem. If I was redesigning the
> backends, I would only allow DI mode (and SI mode for i386) and always have
> the zero extends on loads.

Note the aarch64 issue has been solved (maybe by accident).
forwprop props the sign_extend into the load early on.
```
propagating insn 22 into insn 23, replacing:
(set (mem:SI (plus:DI (mult:DI (reg:DI 121 [ _3 ])
                (const_int 4 [0x4]))
            (reg/f:DI 111)) [1 c.b[_3]+0 S4 A32])
    (reg:SI 103 [ _4 ]))
successfully matched this instruction to *movsi_aarch64:
(set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg:SI 120 [ _3 ]))
                (const_int 4 [0x4]))
            (reg/f:DI 111)) [1 c.b[_3]+0 S4 A32])
    (reg:SI 103 [ _4 ]))
rescanning insn with uid = 23.
updating insn 23 in-place
```

And then combine does not see the sign_extend at all.
But the sign_extend here is still an issue since it is not needed either.
We now get:
```
        add     x0, x0, 16
        ldrb    w1, [x1, #:lo12:.LANCHOR0]
        and     w2, w1, 15
        lsr     w1, w1, 4
        ldr     w2, [x0, w2, sxtw 2]
        str     w2, [x0, w1, sxtw 2]
```

The sxtw is not needed; it should just be lsl for both cases
...
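
A rough way to see why the sxtw is redundant: both index registers come from a byte load (ldrb zero-extends), followed by either `and ..., 15` or `lsr ..., 4`, so each index is in the range 0..15 and its sign bit is clear. Sign extension and zero extension to 64 bits then give the same value. The sketch below only checks that arithmetic exhaustively over all byte values; it is not the testcase from this PR, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* What "sxtw" does to the 32-bit index: sign-extend to 64 bits.
   (Hypothetical helper name, for illustration only.) */
static uint64_t extend_sxtw(uint32_t w)
{
    return (uint64_t)(int64_t)(int32_t)w;
}

/* What a plain 64-bit "lsl" addressing mode would need: the upper
   32 bits already being zero, i.e. a zero extension. */
static uint64_t extend_zero(uint32_t w)
{
    return (uint64_t)w;
}

/* Count the byte values for which the two extensions disagree on
   either index (b & 15 or b >> 4), after scaling by 4 as the
   "lsl 2" / "sxtw 2" addressing modes do. */
int count_extension_mismatches(void)
{
    int mismatches = 0;
    for (unsigned b = 0; b < 256; b++) {
        uint32_t lo = b & 15u;   /* and w2, w1, 15 */
        uint32_t hi = b >> 4;    /* lsr w1, w1, 4  */
        if ((extend_sxtw(lo) << 2) != (extend_zero(lo) << 2))
            mismatches++;
        if ((extend_sxtw(hi) << 2) != (extend_zero(hi) << 2))
            mismatches++;
    }
    return mismatches;
}
```

`count_extension_mismatches()` returns 0, which is the range fact the RTL passes would need to know (the value is known non-negative) in order to drop the sign_extend.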