Hi Roger,

on 2022/6/27 04:56, Roger Sayle wrote:
>  
> 
> This patch tweaks the code generated on POWER for integer multiplications
> 
> by a constant, by making use of rldimi instructions.  Much like x86's
> 
> lea instruction, rldimi can be used to implement a shift and add pair
> 
> in some circumstances.  For rldimi this is when the shifted operand
> 
> is known to have no bits in common with the added operand.
> 
>  
> 
> Hence for the new testcase below:
> 
>  
> 
> int foo(int x)
> 
> {
> 
>   int t = x & 42;
> 
>   return t * 0x2001;
> 
> }
> 
>  
> 
> when compiled with -O2, GCC currently generates:
> 
>  
> 
>         andi. 3,3,0x2a
> 
>         slwi 9,3,13
> 
>         add 3,9,3
> 
>         extsw 3,3
> 
>         blr
> 
>  
> 
> with this patch, we now generate:
> 
>  
> 
>         andi. 3,3,0x2a
> 
>         rlwimi 3,3,13,0,31-13
> 
>         extsw 3,3
> 
>         blr
> 
>  
> 
> It turns out this optimization already exists in the form of a combine
> 
> splitter in rs6000.md, but the constraints on combine splitters,
> 
> requiring three of four input instructions (and generating one or two
> 
> output instructions) mean it doesn't get applied as often as it could.
> 
> This patch converts the define_split into a define_insn_and_split to
> 
> catch more cases (such as the one above).
> 
>  
> 
> The one bit that's tricky/controversial is the use of RTL's
> 
> nonzero_bits which is accurate during the combine pass when this
> 
> pattern is first recognized, but not as advanced (not kept up to
> 
> date) when this pattern is eventually split.  To support this,
> 
> I've used a "|| reload_completed" idiom.  Does this approach seem
> 
> reasonable? [I've another patch of x86 that uses the same idiom].
> 
>  

I tested this patch on powerpc64-linux-gnu, it caused the below ICE
against test case gcc/testsuite/gcc.c-torture/compile/pr93098.c.

gcc/testsuite/gcc.c-torture/compile/pr93098.c: In function ‘foo’:
gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: error: unrecognizable insn:
(insn 104 32 34 2 (set (reg:SI 185 [+4 ])
        (ior:SI (and:SI (reg:SI 200 [+4 ])
                (const_int 4294967295 [0xffffffff]))
            (ashift:SI (reg:SI 140)
                (const_int 32 [0x20])))) 
"gcc/testsuite/gcc.c-torture/compile/pr93098.c":6:11 -1
     (nil))
during RTL pass: subreg3
dump file: pr93098.c.291r.subreg3
gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: internal compiler error: in 
extract_insn, at recog.cc:2791
0x101f664b _fatal_insn(char const*, rtx_def const*, char const*, int, char 
const*)
        gcc/rtl-error.cc:108
0x101f6697 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        gcc/rtl-error.cc:116
0x10ae427f extract_insn(rtx_insn*)
        gcc/recog.cc:2791
0x11b239bb decompose_multiword_subregs
        gcc/lower-subreg.cc:1555
0x11b25013 execute
        gcc/lower-subreg.cc:1818

The above trace shows we fails to recog the pattern again due to
the inaccurate nonzero_bits information as you pointed out above.

There was another patch [1] which wasn't on trunk but touched this
same define_split, not sure if that can help or we can follow the
similar idea.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/585841.html

BR,
Kewen

Reply via email to