https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-06-24
             Target|                            |x86_64-linux-gnu

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For aarch64 we get:

        adrp    x1, .LANCHOR0
        add     x0, x1, :lo12:.LANCHOR0
        add     x0, x0, 8
        ldrb    w1, [x1, #:lo12:.LANCHOR0]
        and     x2, x1, 15
        ubfx    x1, x1, 4, 4
        ldr     w2, [x0, x2, lsl 2]
        str     w2, [x0, x1, lsl 2]
        ret

Note that the shift and the AND are combined into one instruction (ubfx), but really only a shift instruction is needed: ldrb already zero-extends the loaded byte, so bits 8 and above of x1 are zero and a plain "lsr x1, x1, 4" gives the same result.

Here we have:

Trying 21 -> 22:
   21: r112:SI=r92:SI 0>>0x4
      REG_DEAD r92:SI
   22: r113:DI=sign_extend(r112:SI)
      REG_DEAD r112:SI
Successfully matched this instruction:
(set (reg:DI 113)
    (zero_extract:DI (subreg:DI (reg:SI 92 [ d.0_1 ]) 0)
        (const_int 4 [0x4])
        (const_int 4 [0x4])))

The mix of modes (an SImode shift followed by a sign_extend to DImode) is part of the problem. If I were redesigning the backends, I would only allow DImode (and SImode for i386) and always zero-extend on loads.