On Sun, Apr 23, 2023 at 10:14 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch fixes PR rtl-optimization/109476, which is a code quality
> regression affecting AVR.  The cause is that the lower-subreg pass is
> sometimes overly aggressive, lowering the LSHIFTRT below:
>
> (insn 7 4 8 2 (set (reg:HI 51)
>         (lshiftrt:HI (reg/v:HI 49 [ b ])
>             (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
>      (nil))
>
> into a pair of QImode SUBREG assignments:
>
> (insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
>         (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
>      (nil))
> (insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
>         (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
>      (nil))
>
> but this idiom, SETs of SUBREGs, interferes with combine's ability
> to associate/fuse instructions.  The solution, on targets that
> have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
> wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
> is to split/lower LSHIFTRT to a ZERO_EXTEND.
>
> To answer Richard's question in comment #10 of the bugzilla PR,
> the function resolve_shift_zext is called with one of four RTX
> codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
> LSHIFTRT can the setting of low_part and high_part SUBREGs be
> replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
> extension, so don't set the high_part to zero; if we're splitting
> a ZERO_EXTEND then it doesn't make sense to replace it with a
> ZERO_EXTEND, and for ASHIFT we've played games to swap the
> high_part and low_part SUBREGs, so that we assign the low_part
> to zero (for double word shifts by greater than word size bits).
>
> This patch has been tested on x86_64-pc-linux-gnu with a make
> bootstrap and make -k check, both 64-bit and 32-bit, with no
> new regressions.  Many thanks to Jeff Law for testing this patch
> on his build farm, which spotted an issue on xstormy16, which
> should now be fixed by (either of) my recent xstormy16 patches.
>
> Ok for mainline?

OK.

Thanks,
Richard.

>
> 2023-04-23  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR rtl-optimization/109476
>         * lower-subreg.cc: Include explow.h for force_reg.
>         (find_decomposable_shift_zext): Pass an additional SPEED_P argument.
>         If decomposing a suitable LSHIFTRT and we're not splitting
>         ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
>         instead of setting a high part SUBREG to zero, which helps combine.
>         (decompose_multiword_subregs): Update call to resolve_shift_zext.
>
> gcc/testsuite/ChangeLog
>         PR rtl-optimization/109476
>         * gcc.target/avr/mmcu/pr109476.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
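
For readers following the argument about which of the four RTX codes can be rewritten, here is a small C model of the arithmetic involved. It is only a sketch and not code from lower-subreg.cc: the helper names are hypothetical, HImode is modelled as uint16_t and QImode as uint8_t, and the shift count is fixed at 8 (one word on AVR). It illustrates that the LSHIFTRT result, the pair of SUBREG stores, and a ZERO_EXTEND of the high byte all compute the same value, while ASHIFTRT needs sign bits in the high part and ASHIFT zeroes the low part instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling 16-bit (HImode) values built from
   two 8-bit (QImode) words.  None of these names come from GCC.  */

/* The original insn: (lshiftrt:HI b (const_int 8)).  */
static uint16_t lshiftrt8(uint16_t b)
{
    return (uint16_t)(b >> 8);
}

/* Lowering via two SUBREG stores: the low byte receives the high byte
   of b, and the high byte is set to zero.  */
static uint16_t lowered_subregs(uint16_t b)
{
    uint8_t low_part = (uint8_t)(b >> 8);
    uint8_t high_part = 0;
    return (uint16_t)(((uint16_t)high_part << 8) | low_part);
}

/* Lowering via a single zero extension of the high byte of b.  */
static uint16_t lowered_zext(uint16_t b)
{
    uint8_t high_byte = (uint8_t)(b >> 8);
    return (uint16_t)high_byte; /* unsigned widening is a zero extension */
}

/* ASHIFTRT by 8: the high part must hold copies of the sign bit, so a
   zero extension would give the wrong answer for negative inputs.  */
static uint16_t ashiftrt8_bits(uint16_t b)
{
    uint16_t result = (uint16_t)(b >> 8);
    if (result & 0x80)
        result |= 0xFF00; /* replicate the sign bit into the high byte */
    return result;
}

/* ASHIFT by 8: here it is the low part that becomes zero.  */
static uint16_t ashift8(uint16_t b)
{
    uint8_t low_byte = (uint8_t)b;
    return (uint16_t)((uint16_t)low_byte << 8);
}

/* Exhaustively check that the three LSHIFTRT forms agree.  */
static void verify_lshiftrt_forms(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint16_t b = (uint16_t)v;
        assert(lshiftrt8(b) == lowered_subregs(b));
        assert(lshiftrt8(b) == lowered_zext(b));
    }
}
```

Calling verify_lshiftrt_forms() walks all 65536 inputs. The ASHIFTRT and ASHIFT helpers are there only to show why those codes cannot take the same ZERO_EXTEND rewrite.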