I probably spent way more time on this than it's worth...

I was looking at the code we generate for vector SAD and noticed that we were being a bit silly. Specifically:

        li      a4,0            # 272   [c=4 l=4]  *movsi_internal/1

Followed shortly by:

        vmv.s.x v3,a4   # 261   [c=4 l=4]  *pred_broadcastrvvm1si/6

And there are no other uses of a4, so we could have trivially used x0 instead.

First we adjust the expander so that it doesn't force the constant into a register.  In the matching pattern we change the appropriate source constraints from "r" to "rJ" and change the output template to use %z for the operand.  The net effect is that we drop the li completely and emit vmv.s.x v3,x0.
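To make that concrete, with the SAD sequence from above the change turns:

        li      a4,0
        vmv.s.x v3,a4

into just:

        vmv.s.x v3,x0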

But wait, there's more.  If we're broadcasting a constant in the range [-16..15] into a vector, we currently load the constant into a GPR and use vmv.v.x.  We can instead use vmv.v.i, which avoids loading the constant into a GPR entirely.  For that case we again avoid forcing the constant into a register in the expander and adjust the output template to emit vmv.v.x or vmv.v.i depending on whether the operand is a general purpose register or a constant.  So again, we drop the scalar load immediate for this case.
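As a hypothetical illustration (register numbers and the constant chosen arbitrarily, not taken from an actual dump), broadcasting the value 7 would go from something like:

        li      a5,7
        vmv.v.x v2,a5

to a single instruction with no GPR involved:

        vmv.v.i v2,7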

Whether we should use vmv.v.i or vmv.s.x for loading a constant in [-16..15] into element 0 is probably uarch dependent.  The tradeoff is loading the constant into a GPR versus doing a full broadcast in the vector unit.  I didn't bother with this case.

Tested in my tester (which uses rv64gcv as its default codegen option).  Will wait for the pre-commit tester to render a verdict.

Jeff

        * config/riscv/constraints.md (P): New constraint for constant
        integers -16..15.
        * config/riscv/vector.md (pred_broadcast<mode> expander): Do not force
        constants into registers quite so aggressively.
        (pred_broadcast<mode> insn & splitter): Adjust constraints to allow
        constants in a few cases and adjust output appropriately.

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 45f8e9602d2..9638942b733 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -70,6 +70,11 @@ (define_constraint "c08"
   (and (match_code "const_int")
        (match_test "ival == 8")))
 
+(define_constraint "P"
+  "A 5-bit signed immediate for vmv.v.i."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, -16, 15)")))
+
 (define_constraint "K"
   "A 5-bit unsigned immediate for CSR access instructions."
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 7c8780dc7c7..b3038087aa5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2118,6 +2118,16 @@ (define_expand "@pred_broadcast<mode>"
       emit_move_insn (tmp, gen_int_mode (value, Pmode));
       operands[3] = gen_rtx_SIGN_EXTEND (<VEL>mode, tmp);
     }
+  /* Never load (const_int 0) into a register, that's silly.  */
+  else if (operands[3] == CONST0_RTX (<VEL>mode))
+    ;
+  /* If we're broadcasting [-16..15] across more than just
+     element 0, then we can use vmv.v.i directly, thus avoiding
+     the load of the constant into a GPR.  */
+  else if (CONST_INT_P (operands[3])
+          && IN_RANGE (INTVAL (operands[3]), -16, 15)
+          && !satisfies_constraint_Wb1 (operands[1]))
+    ;
   else
     operands[3] = force_reg (<VEL>mode, operands[3]);
 })
@@ -2134,18 +2144,18 @@ (define_insn_and_split "*pred_broadcast<mode>"
             (reg:SI VL_REGNUM)
             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
          (vec_duplicate:V_VLSI
-           (match_operand:<VEL> 3 "direct_broadcast_operand"       " r,  r,Wdm,Wdm,Wdm,Wdm,  r,  r"))
-         (match_operand:V_VLSI 2 "vector_merge_operand"            "vu,  0, vu,  0, vu,  0, vu,  0")))]
+           (match_operand:<VEL> 3 "direct_broadcast_operand"       "rP,rP,Wdm,Wdm,Wdm,Wdm, rJ, rJ"))
+         (match_operand:V_VLSI 2 "vector_merge_operand"            "vu, 0, vu,  0, vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "@
-   vmv.v.x\t%0,%3
-   vmv.v.x\t%0,%3
+   vmv.v.%o3\t%0,%3
+   vmv.v.%o3\t%0,%3
    vlse<sew>.v\t%0,%3,zero,%1.t
    vlse<sew>.v\t%0,%3,zero,%1.t
    vlse<sew>.v\t%0,%3,zero
    vlse<sew>.v\t%0,%3,zero
-   vmv.s.x\t%0,%3
-   vmv.s.x\t%0,%3"
+   vmv.s.x\t%0,%z3
+   vmv.s.x\t%0,%z3"
   "(register_operand (operands[3], <VEL>mode)
   || CONST_POLY_INT_P (operands[3]))
   && GET_MODE_BITSIZE (<VEL>mode) > GET_MODE_BITSIZE (Pmode)"
