Hi Roger, on 2022/6/27 04:56, Roger Sayle wrote: > > > This patch tweaks the code generated on POWER for integer multiplications > > by a constant, by making use of rldimi instructions. Much like x86's > > lea instruction, rldimi can be used to implement a shift and add pair > > in some circumstances. For rldimi this is when the shifted operand > > is known to have no bits in common with the added operand. > > > > Hence for the new testcase below: > > > > int foo(int x) > > { > > int t = x & 42; > > return t * 0x2001; > > } > > > > when compiled with -O2, GCC currently generates: > > > > andi. 3,3,0x2a > > slwi 9,3,13 > > add 3,9,3 > > extsw 3,3 > > blr > > > > with this patch, we now generate: > > > > andi. 3,3,0x2a > > rlwimi 3,3,13,0,31-13 > > extsw 3,3 > > blr > > > > It turns out this optimization already exists in the form of a combine > > splitter in rs6000.md, but the constraints on combine splitters, > > requiring three of four input instructions (and generating one or two > > output instructions) mean it doesn't get applied as often as it could. > > This patch converts the define_split into a define_insn_and_split to > > catch more cases (such as the one above). > > > > The one bit that's tricky/controversial is the use of RTL's > > nonzero_bits which is accurate during the combine pass when this > > pattern is first recognized, but not as advanced (not kept up to > > date) when this pattern is eventually split. To support this, > > I've used a "|| reload_completed" idiom. Does this approach seem > > reasonable? [I've another patch of x86 that uses the same idiom]. > >
I tested this patch on powerpc64-linux-gnu, it caused the below ICE against test case gcc/testsuite/gcc.c-torture/compile/pr93098.c. gcc/testsuite/gcc.c-torture/compile/pr93098.c: In function ‘foo’: gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: error: unrecognizable insn: (insn 104 32 34 2 (set (reg:SI 185 [+4 ]) (ior:SI (and:SI (reg:SI 200 [+4 ]) (const_int 4294967295 [0xffffffff])) (ashift:SI (reg:SI 140) (const_int 32 [0x20])))) "gcc/testsuite/gcc.c-torture/compile/pr93098.c":6:11 -1 (nil)) during RTL pass: subreg3 dump file: pr93098.c.291r.subreg3 gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: internal compiler error: in extract_insn, at recog.cc:2791 0x101f664b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) gcc/rtl-error.cc:108 0x101f6697 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) gcc/rtl-error.cc:116 0x10ae427f extract_insn(rtx_insn*) gcc/recog.cc:2791 0x11b239bb decompose_multiword_subregs gcc/lower-subreg.cc:1555 0x11b25013 execute gcc/lower-subreg.cc:1818 The above trace shows we fails to recog the pattern again due to the inaccurate nonzero_bits information as you pointed out above. There was another patch [1] which wasn't on trunk but touched this same define_split, not sure if that can help or we can follow the similar idea. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/585841.html BR, Kewen