[ The more verbose explanation than usual is primarily for Daniel and
  Shreya's benefit. ]

--

So here's the other case I was just looking at. This is a slightly modified version of some code from 500.perlbench which shows another nop logical operation:

void frob (void);
typedef struct av AV;
typedef unsigned int U32;
struct av
{
  void *dummy;
  U32 sv_refcnt;
  U32 sv_flags;
};
void
Perl_save_ary (AV *const oav)
{
  AV *av;
  unsigned int x1 = oav->sv_flags;
  unsigned int x2 = x1 & 3221225472;
  if (x2 == 2147483648)
    frob ();
}


https://godbolt.org/z/941vqfGE6

It's not as obvious, but this is probably a regression as well. I would expect the gcc-14 code to execute 1c faster than the current trunk code on a superscalar design:

gcc-14:                               trunk:
        lw      a5,12(a0)                   lw      a5,12(a0)
        li      a3,-1073741824              li      a3,-2
        li      a4,-2147483648

        and     a5,a5,a3                    srai    a4,a5,30

        beq     a5,a4,.L4                   andi    a4,a4,-1

                                            beq     a4,a3,.L4

Essentially the "li" instructions can execute in parallel with the lw. But the rest of the sequence has data dependencies forcing the instructions to execute serially. Thus that extra andi extends the critical path by 1c.

Removing the useless andi should make the two sequences perform the same and reduce the code size.

Much like the prior case, we walk backwards using -fdump-rtl-all -dp to find the andi:

        andi    a4,a4,-1        # 26    [c=4 l=4]  *anddi3/1

The UID is 26. And just like the prior case it first shows up in the .split2 dump:

grep insn\ 26 j.c.*
j.c.326r.split2:(insn 26 25 27 2 (set (reg:DI 14 a4 [144])
j.c.327r.ree:(insn 26 25 27 2 (set (reg:DI 14 a4 [144])
j.c.329r.pro_and_epilogue:(insn 26 25 27 2 (set (reg:DI 14 a4 [144])
j.c.330r.dse2:(insn 26 25 27 2 (set (reg:DI 14 a4 [144])


In the .split2 dump:

Splitting with gen_split_77 (riscv.md:3184)
scanning new insn with uid = 25.
scanning new insn with uid = 26.
scanning new insn with uid = 27.
scanning new insn with uid = 28.
deleting insn with uid = 12.
deleting insn with uid = 12.

So insn 12 is where we want to look.

(jump_insn 12 6 13 2 (parallel [
            (set (pc)
                (if_then_else (ne (and:DI (reg:DI 15 a5 [orig:138 oav_3(D)->sv_flags ] [138])
                            (const_int -1073741824 [0xffffffffc0000000]))
                        (const_int -2147483648 [0xffffffff80000000]))
                    (label_ref:DI 18)
                    (pc)))
            (clobber (reg:DI 14 a4 [144]))
            (clobber (reg:DI 13 a3 [145]))
        ]) "j.c":16:6 361 {*branchdi_shiftedarith_ne_shifted}
     (int_list:REG_BR_PROB 856416484 (nil))
 -> 18)

So that's a conditional branch with the condition

(a5 & 0xffffffffc0000000) != 0xffffffff80000000

Note how those constants have many low bits as zeros and likely require some kind of constant synthesis. We can conceptually do an arithmetic right shift of a5 and both constants and get the same result, likely making the constants easier to synthesize.

And that's precisely what this pattern is designed to do:

(define_insn_and_split "*branch<ANYI:mode>_shiftedarith_<optab>_shifted"
  [(set (pc)
        (if_then_else (any_eq
                    (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
                          (match_operand 2 "shifted_const_arith_operand" "i"))
                    (match_operand 3 "shifted_const_arith_operand" "i"))
         (label_ref (match_operand 0 "" ""))
         (pc)))
   (clobber (match_scratch:X 4 "=&r"))
   (clobber (match_scratch:X 5 "=&r"))]
  "!SMALL_OPERAND (INTVAL (operands[2]))
    && !SMALL_OPERAND (INTVAL (operands[3]))
    && SMALL_AFTER_COMMON_TRAILING_SHIFT (INTVAL (operands[2]),
                                             INTVAL (operands[3]))"
  "#"
  "&& reload_completed"
  [(set (match_dup 4) (ashiftrt:X (match_dup 1) (match_dup 7)))
   (set (match_dup 4) (and:X (match_dup 4) (match_dup 8)))
   (set (match_dup 5) (match_dup 9))
   (set (pc) (if_then_else (any_eq (match_dup 4) (match_dup 5))
                           (label_ref (match_dup 0)) (pc)))]
{
  HOST_WIDE_INT mask1 = INTVAL (operands[2]);
  HOST_WIDE_INT mask2 = INTVAL (operands[3]);
  int trailing_shift = COMMON_TRAILING_ZEROS (mask1, mask2);

  operands[7] = GEN_INT (trailing_shift);
  operands[8] = GEN_INT (mask1 >> trailing_shift);
  operands[9] = GEN_INT (mask2 >> trailing_shift);
}
It finds the number of low bits that must be zero in both constants. In this case it's 30 bits. So it shifts the register right arithmetically by 30 bits, then constructs the two new constants, one of which is -1 after shifting. And we emit (set (match_dup 4) (and (match_dup 4) (const_int -1))).

And since this splits after register allocation, nothing eliminates the useless and dest,src,-1 and boom we have a regression.

The fix this time is a bit different. I really don't want to open-code the new RTL. So instead I create a new operand for the source of the AND statement. If the constant is going to be -1, then that operand has the same value as the destination operand (i.e., a nop move). Otherwise it is the appropriate AND expression.

The nop move will get eliminated, thus resolving the regression.

I suspect some of the other patterns in riscv.md are subject to similar issues, though I haven't seen them trigger, so I'm leaving them alone for now.

This has been tested in my tester and it'll obviously go through the upstream CI flow before I push it to the trunk.

Jeff


diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 84bce409bc7..26a247c2b96 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3198,7 +3198,7 @@ (define_insn_and_split "*branch<ANYI:mode>_shiftedarith_<optab>_shifted"
   "#"
   "&& reload_completed"
   [(set (match_dup 4) (ashiftrt:X (match_dup 1) (match_dup 7)))
-   (set (match_dup 4) (and:X (match_dup 4) (match_dup 8)))
+   (set (match_dup 4) (match_dup 10))
    (set (match_dup 5) (match_dup 9))
    (set (pc) (if_then_else (any_eq (match_dup 4) (match_dup 5))
                           (label_ref (match_dup 0)) (pc)))]
@@ -3210,6 +3210,16 @@ (define_insn_and_split "*branch<ANYI:mode>_shiftedarith_<optab>_shifted"
   operands[7] = GEN_INT (trailing_shift);
   operands[8] = GEN_INT (mask1 >> trailing_shift);
   operands[9] = GEN_INT (mask2 >> trailing_shift);
+
+  /* This splits after reload, so there's little chance to clean things
+     up.  Rather than emit a ton of RTL here, we can just make a new
+     operand for that RHS and use it.  For the case where the AND would
+     have been redundant, we can make it a NOP move, which does get
+     cleaned up.  */
+  if (operands[8] == CONSTM1_RTX (word_mode))
+    operands[10] = operands[4];
+  else
+    operands[10] = gen_rtx_AND (word_mode, operands[4], operands[8]);
 }
 [(set_attr "type" "branch")])
 
diff --git a/gcc/testsuite/gcc.target/riscv/redundant-andi-2.c b/gcc/testsuite/gcc.target/riscv/redundant-andi-2.c
new file mode 100644
index 00000000000..ff0789d7d0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/redundant-andi-2.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+void frob (void);
+typedef struct av AV;
+typedef unsigned int U32;
+struct av
+{
+  void *dummy;
+  U32 sv_refcnt;
+  U32 sv_flags;
+};
+void
+Perl_save_ary (AV *const oav)
+{
+  AV *av;
+  unsigned int x1 = oav->sv_flags;
+  unsigned int x2 = x1 & 3221225472;
+  if (x2 == 2147483648)
+    frob ();
+}
+
+/* { dg-final { scan-assembler-not "andi\t" } } */
+
