Hi,
on 2024/3/1 10:41, HAO CHEN GUI wrote:
> Hi,
> This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In
> combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an out AND. It matches a DImode rotate and mask insert on
> rs6000.
>
> Trying 2 -> 7:
> 2: r122:DI=r129:DI
> REG_DEAD r129:DI
> 7: r125:SI=r122:DI#0 0>>0x1f
> REG_DEAD r122:DI
> Failed to match this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
> (zero_extract:DI (reg:DI 129)
> (const_int 32 [0x20])
> (const_int 1 [0x1])))
> Successfully matched this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
> (and:DI (lshiftrt:DI (reg:DI 129)
> (const_int 31 [0x1f]))
> (const_int 4294967295 [0xffffffff])))
>
> This conversion blocks the further combination which combines to a SImode
> rotate and mask insert insn.
>
> Trying 9, 7 -> 10:
> 9: r127:SI=r130:DI#0&0xfffffffffffffffe
> REG_DEAD r130:DI
> 7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
> REG_DEAD r129:DI
> 10: r124:SI=r127:SI|r125:SI
> REG_DEAD r125:SI
> REG_DEAD r127:SI
> Failed to match this instruction:
> (set (reg:SI 124)
> (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
> (const_int -2 [0xfffffffffffffffe]))
> (subreg:SI (zero_extract:DI (reg:DI 129)
> (const_int 32 [0x20])
> (const_int 1 [0x1])) 0)))
> Failed to match this instruction:
> (set (reg:SI 124)
> (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
> (const_int -2 [0xfffffffffffffffe]))
> (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
> (const_int 31 [0x1f]))
> (const_int 4294967295 [0xffffffff])) 0)))
>
> The root cause of the issue is if it's necessary to do the widen mode for
> lshiftrt when the target already has the narrow mode lshiftrt and its cost
> is not high. My former patch tried to fix the problem but not accepted yet.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html
Hope Segher can chime in this proposal updating combine pass, I can understand
the new proposal by introducing new patterns in target code is able to fix the
issue, but IMHO it's likely there are some other mis-optimization which don't
get noticed and they need some similar pattern extension (duplicate some pattern
& adjust with subreg) to optimize, from this perspective, it would be nice if
it's possible to have a more general fix.
Some minor comments for this patch itself are inlined.
>
> As it's stage 4 now, I drafted this patch to fix the regression by adding
> subreg patterns of SImode rotate and mask insert. It actually does reversed
> things and narrow the mode for lshiftrt so that it can matches the SImode
> rotate and mask insert.
>
> The case "rlwimi-2.c" is fixed and restore the corresponding number of
> insns to original ones. The case "rlwinm-0.c" is also changed and 9 "rlwinm"
> is replaced with 9 "rldicl" as the sequence of combine is changed. It's not
> a regression as the total number of insns isn't changed.
>
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
>
> Thanks
> Gui Haochen
>
>
> ChangeLog
> rs6000: Add subreg patterns for SImode rotate and mask insert
>
> In combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an AND. The new pattern matches rotate and mask insert on
> rs6000. Thus it blocks the pattern to be further combined to a SImode rotate
> and mask insert pattern. This patch fixes the problem by adding two subreg
> pattern for SImode rotate and mask insert patterns.
>
> gcc/
> PR target/93738
> * config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
> (*rotlsi3_insert_8): New.
>
> gcc/testsuite/
> PR target/93738
> * gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
> rotate instructions.
> * gcc.target/powerpc/rlwinm-0.c: Likewise.
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bc8bc6ab060..b0b40f91e3e 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
> ; difference between rlwimi and rldimi. We also might want dot forms,
> ; but not for rlwimi on POWER4 and similar processors.
>
> +; Subreg pattern of insn "*rotlsi3_insert"
> +(define_insn_and_split "*rotlsi3_insert_9"
Nit: "*rotlsi3_insert_subreg" seems a better name, ...
> + [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> + (ior:SI (and:SI
> + (match_operator:SI 8 "lowpart_subreg_operator"
> + [(and:DI (match_operator:DI 4 "rotate_mask_operator"
> + [(match_operand:DI 1 "gpc_reg_operand" "r")
> + (match_operand:SI 2 "const_int_operand" "n")])
> + (match_operand:DI 3 "const_int_operand" "n"))])
> + (match_operand:SI 5 "const_int_operand" "n"))
> + (and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
> + (match_operand:SI 7 "const_int_operand" "n"))))]
> + "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
> + && GET_CODE (operands[4]) == LSHIFTRT
> + && INTVAL (operands[3]) == 0xffffffff
> + && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
> + "#"
> + "&& 1"
> + [(set (match_dup 0)
> + (ior:SI (and:SI (lshiftrt:SI (match_dup 9)
> + (match_dup 2))
> + (match_dup 5))
> + (and:SI (match_dup 6)
> + (match_dup 7))))]
> +{
> + int offset = BYTES_BIG_ENDIAN ? 4 : 0;
> + operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
> +}
> + [(set_attr "type" "insert")])
> +
> (define_insn "*rotl<mode>3_insert_2"
> [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> (ior:GPR (and:GPR (match_operand:GPR 5 "gpc_reg_operand" "0")
> @@ -4331,6 +4361,31 @@ (define_insn "*rotlsi3_insert_4"
> "rlwimi %0,%1,32-%h2,%h2,31"
> [(set_attr "type" "insert")])
>
> +; Subreg pattern of insn "*rotlsi3_insert_4"
> +(define_insn_and_split "*rotlsi3_insert_8"
..., and "*rotlsi3_insert_4_subreg" for this.
> + [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> + (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0")
> + (match_operand:SI 4 "const_int_operand" "n"))
> + (match_operator:SI 6 "lowpart_subreg_operator"
> + [(and:DI
> + (lshiftrt:DI (match_operand:DI 1 "gpc_reg_operand" "r")
> + (match_operand:SI 2 "const_int_operand" "n"))
> + (match_operand:DI 5 "const_int_operand" "n"))])))]
> + "INTVAL (operands[2]) + exact_log2 (-UINTVAL (operands[4])) == 32
> + && INTVAL (operands[5]) == 0xffffffff"
> + "#"
> + "&& 1"
> + [(set (match_dup 0)
> + (ior:SI (and:SI (match_dup 3)
> + (match_dup 4))
> + (lshiftrt:SI (match_dup 7)
> + (match_dup 2))))]
> +{
> + int offset = BYTES_BIG_ENDIAN ? 4 : 0;
> + operands[7] = gen_rtx_SUBREG (SImode, operands[1], offset);
> +}
> + [(set_attr "type" "insert")])
> +
> (define_insn "*rotlsi3_insert_5"
> [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
> (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> index bafa371db73..62344a95aa0 100644
> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> @@ -6,10 +6,9 @@
> /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */
> /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */
> /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } }
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 6728 { target lp64 } }
> } */
>
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 }
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } }
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 } } */
>
> /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> index 4f4fca2d8ef..a10d9174306 100644
> --- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> @@ -4,10 +4,10 @@
> /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 6739 { target ilp32 } }
> } */
> /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9716 { target lp64 } }
> } */
> /* { dg-final { scan-assembler-times {(?n)^\s+blr} 3375 } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } }
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3090 { target lp64 } }
> } */
>
> /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 }
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } }
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3084 { target lp64 } }
> } */
> /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */
> /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } }
> */
> /* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */
Peter's commit r14-9085-g81e5f276c59897 has fixed this with same counts, not
rebased?
BR,
Kewen