https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So more details. The i2 insn is:
(insn 16 15 17 2 (set (zero_extract:DI (subreg:DI (reg/v:TI 103 [ f ]) 0)
(const_int 8 [0x8])
(const_int 16 [0x10]))
(subreg:DI (reg:SI 96 [ _7 ]) 0)) "pr99830.c":7:3 744 {*insv_regdi}
(expr_list:REG_DEAD (reg:SI 96 [ _7 ])
(nil)))
and can_combine_p, through its expand_field_assignment call, makes i2src
(ior:TI (and:TI (reg/v:TI 103 [ f ])
(const_int -16711681 [0xffffffffff00ffff]))
(ashift:TI (and:TI (clobber:TI (const_int 0 [0]))
(const_int 255 [0xff]))
(const_int 16 [0x10])))
out of this.
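
The and/ior/ashift form above is the usual mask-and-shift way of writing a bit-field store. Below is a minimal C sketch of that rewrite, assuming a 64-bit integer as a stand-in for the TImode pseudo. insert_field is a hypothetical helper made up for illustration, not a GCC function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): store the low WIDTH bits of SRC
   into DEST at bit position POS, roughly what
   (set (zero_extract dest width pos) src) means.  */
static uint64_t
insert_field (uint64_t dest, uint64_t src, unsigned width, unsigned pos)
{
  /* Keep every shift below within the 64-bit stand-in type.  */
  assert (width > 0 && pos < 64 && width + pos <= 64);

  uint64_t mask = (width == 64) ? ~UINT64_C (0)
                                : ((UINT64_C (1) << width) - 1);

  /* (ior (and dest ~(mask << pos)) (ashift (and src mask) pos))  */
  return (dest & ~(mask << pos)) | ((src & mask) << pos);
}
```

With width 8 and pos 16, as in insn 16, the mask applied to dest is 0xffffffffff00ffff, matching the constant in the dump. In the actual i2src the inserted value shows up as a clobber, so the sketch only models the shape of the expression.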
i3 is
(insn 20 19 21 2 (set (reg:SI 108 [ f ])
(zero_extend:SI (subreg:QI (reg/v:TI 103 [ f ]) 0))) "pr99830.c":8:9
114 {*zero_extendqisi2_aarch64}
(expr_list:REG_DEAD (reg/v:TI 103 [ f ])
(nil)))
So I think it is perfectly fine that, since i3 only cares about the low 8 bits
of pseudo 103, combine figures out that these are just the low 8 bits of the
original pseudo 103, not ored with anything else, because (unsigned char)
((whatever & 255) << 16) is 0. So I don't see anything wrong with the i2 -> i3
combination turning it into
(insn 20 19 21 2 (set (reg:SI 108 [ f ])
(zero_extend:SI (subreg:QI (reg/v:TI 103 [ f ]) 0))) "pr99830.c":8:9
114 {*zero_extendqisi2_aarch64}
(nil))
In particular, it is combine_simplify_rtx that is called on:
(zero_extend:SI (subreg:QI (ior:TI (and:TI (reg/v:TI 103 [ f ])
(const_int -16711681 [0xffffffffff00ffff]))
(ashift:TI (and:TI (clobber:TI (const_int 0 [0]))
(const_int 255 [0xff]))
(const_int 16 [0x10]))) 0))
which simplifies it into
(and:SI (subreg:SI (reg/v:TI 103 [ f ]) 0)
(const_int 255 [0xff]))
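
The arithmetic behind this simplification can be checked in plain C. This is a minimal sketch, assuming a 64-bit integer as a stand-in for the TImode pseudo and an arbitrary value v in place of the clobber. The names model_i2src, low_byte and check_low_byte_unaffected are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of i2src: F with bits 16..23 replaced by the
   low 8 bits of V (V stands in for the unknown, clobbered value).  */
static uint64_t
model_i2src (uint64_t f, uint64_t v)
{
  return (f & UINT64_C (0xffffffffff00ffff)) | ((v & 0xff) << 16);
}

/* What i3 reads: (zero_extend:SI (subreg:QI x 0)).  */
static uint32_t
low_byte (uint64_t x)
{
  return (uint32_t) (uint8_t) x;
}

/* The inserted field lives entirely in bits 16..23, so the low byte of
   the combined value is the low byte of the original F.  */
static void
check_low_byte_unaffected (uint64_t f, uint64_t v)
{
  assert ((uint8_t) ((v & 255) << 16) == 0);
  assert (low_byte (model_i2src (f, v)) == (uint32_t) (f & 0xff));
}
```

Calling check_low_byte_unaffected with any f and v should pass, which is the same fact the (and:SI (subreg:SI ...) (const_int 255)) result expresses.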
But, there is also
(debug_insn 18 17 19 2 (var_location:HI c (subreg:HI (ashiftrt:SI
(sign_extend:SI (subreg:HI (reg/v:SI 100 [ c ]) 0))
(zero_extend:SI (subreg:QI (reg/v:TI 103 [ f ]) 0))) 0))
"pr99830.c":8:5 -1
(nil))
into which try_combine, via propagate_for_debug, propagates the
(reg/v:TI 103 [ f ]) i2dest, replacing it with the i2src mentioned above.
In this case it is similarly used in a (subreg:QI ...), so in theory it could
also be optimized into just the low bits of the older r103. Except that
propagate_for_debug uses only simplify-rtx.c APIs and doesn't have
combine_simplify_rtx available. And in theory r103 could also be used in
other contexts in the debug insn.