On 4/17/25 1:30 AM, pan2...@intel.com wrote:
From: Pan Li <pan2...@intel.com>

This patch would like to combine the vec_duplicate + vadd.vv to the
vadd.vx.  From example as below code.  The related pattern will depend
on the cost of vec_duplicate from GR2VR, it will:

* The pattern matching will be inactive if GR2VR cost is zero.
* The cost of GR2VR will be added to the total cost of pattern, and
   the late-combine will decide to perform the replacement or not
   based on the cost value.
[ ... ]
So we've got 3 patches all touching on the same basic area, so we need to be careful about staging in.

Alex's patch is mostly focused on improving pred_broadcast and is probably independent except perhaps potentially wanting to use costing to guide code generation.

Paul-Antoine's patch is focused on multiply-add type instructions. So no real conflicts on the patterns, but there's likely overlap on costing models.

And finally this patch which focuses on simple integer arithmetic and some costing model work

So don't be surprised if most review time is focused on how the costing model works since that's a common theme across these patches.




gcc/ChangeLog:

        * config/riscv/autovec-opt.md (*<optab>_vx_<mode>): Add new
        combine to convert vec_duplicate + vadd.vv to vaddvx on GR2VR
        cost.
        * config/riscv/riscv.cc (riscv_rtx_costs): Extract vector
        cost into a separated func.
        (riscv_vector_rtx_costs): Add new func to take care of the
        cost of vector rtx, default to 1 and append GR2VR cost to
        vec_duplicate rtx.
        * config/riscv/vector-iterators.md: Add new iterator for vx.

Signed-off-by: Pan Li <pan2...@intel.com>
---
  gcc/config/riscv/autovec-opt.md      | 22 ++++++++++++++++++++++
  gcc/config/riscv/riscv.cc            | 26 ++++++++++++++++++++------
  gcc/config/riscv/vector-iterators.md |  4 ++++
  3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 0c3b0cc7e05..ab45fe2511b 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1673,3 +1673,25 @@ (define_insn_and_split "*vandn_<mode>"
      DONE;
    }
    [(set_attr "type" "vandn")])
+
+;; 
=============================================================================
+;; Combine vec_duplicate + op.vv to op.vx
+;; Include
+;; - vadd.vx
+;; 
=============================================================================
+(define_insn_and_split "*<optab>_vx_<mode>"
+ [(set (match_operand:V_VLSI    0 "register_operand")
+       (any_int_binop_no_shift_vx:V_VLSI
+        (vec_duplicate:V_VLSI
+          (match_operand:<VEL> 1 "register_operand"))
+        (match_operand:V_VLSI  2 "<binop_rhs2_predicate>")))]
+  "TARGET_VECTOR && can_create_pseudo_p () && get_vector_costs ()->regmove->GR2VR 
!= 0"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    rtx ops[] = {operands[0], operands[2], operands[1]};
+    riscv_vector::emit_vlmax_insn (code_for_pred_scalar (<CODE>, <MODE>mode),
+                                  riscv_vector::BINARY_OP, ops);
+  }
+  [(set_attr "type" "vialu")])
So there's a bit of a natural tension between generating better code early and waiting for combine in this space. As we've discussed broadcasting across the vector in the loop header may perform better than using a .vx/.vf form in a loop where we have to pay the cost to transfer data between units.

We could attack this problem in the expander and that would be my strong preference in the scalar space. But it's less clear for vector, in particular it's unclear if we'd perform LICM and broadcast the value across the vector in a loop pre-header if we've already embedded the value as a GPR/FPR in the vector instruction.

I don't particularly like how this interacts with costing. Rather than testing GR2VR like you've done, the better solution is to add the appropriate code to riscv_rtx_costs to actually cost the vector insn as a whole. Part of that cost computation would be to look at the operand and do something sensible with it (which would likely reference the cost to move data between units such as GR2VR). If you do that, then the standard costing tests in combine should get used.



diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38f3ae7cd84..9bd0dbcf5f6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3856,16 +3856,30 @@ riscv_extend_cost (rtx op, bool unsigned_p)
  #define SINGLE_SHIFT_COST 1
static bool
-riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno 
ATTRIBUTE_UNUSED,
-                int *total, bool speed)
+riscv_vector_rtx_costs (rtx x, machine_mode mode, int *total)
  {
+  gcc_assert (riscv_v_ext_mode_p (mode));
+
    /* TODO: We set RVV instruction cost as 1 by default.
       Cost Model need to be well analyzed and supported in the future. */
+  int cost_val = 1;
+  enum rtx_code rcode = GET_CODE (x);
+
+  /* Aka (vec_duplicate:RVVM1DI (reg/v:DI 143 [ x ]))  */
+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0))))
+    cost_val += get_vector_costs ()->regmove->GR2VR;
+
+  *total = COSTS_N_INSNS (cost_val);
+
+  return true;
+}
+
+static bool
+riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno 
ATTRIBUTE_UNUSED,
+                int *total, bool speed)
+{
    if (riscv_v_ext_mode_p (mode))
-    {
-      *total = COSTS_N_INSNS (1);
-      return true;
-    }
+    return riscv_vector_rtx_costs (x, mode, total);
This feels very much like a hack to me. Why not just handle VEC_DUPLICATE like the rest of the opcodes in riscv_rtx_costs? Ultimately all that code needs to work together rather than having separate paths for vector/scalar.


+(define_code_iterator any_int_binop_no_shift_vx
+ [plus
+])
+
Did you plan to support any other binary ops? ISTM that any binary op that provides a .vx form could be handled here.

Jeff

Reply via email to