On 4/17/25 1:30 AM, pan2...@intel.com wrote:
From: Pan Li <pan2...@intel.com>
This patch would like to combine the vec_duplicate + vadd.vv to the
vadd.vx. From example as below code. The related pattern will depend
on the cost of vec_duplicate from GR2VR, it will:
* The pattern matching will be inactive if GR2VR cost is zero.
* The cost of GR2VR will be added to the total cost of pattern, and
the late-combine will decide to perform the replacement or not
based on the cost value.
[ ... ]
So we've got 3 patches all touching on the same basic area, so we need
to be careful about staging in.
Alex's patch is mostly focused on improving pred_broadcast and is
probably independent except perhaps potentially wanting to use costing
to guide code generation.
Paul-Antoine's patch is focused on multiply-add type instructions. So
no real conflicts on the patterns, but there's likely overlap on costing
models.
And finally this patch which focuses on simple integer arithmetic and
some costing model work
So don't be surprised if most review time is focused on how the costing
model works since that's a common theme across these patches.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>_vx_<mode>): Add new
combine to convert vec_duplicate + vadd.vv to vaddvx on GR2VR
cost.
* config/riscv/riscv.cc (riscv_rtx_costs): Extract vector
cost into a separated func.
(riscv_vector_rtx_costs): Add new func to take care of the
cost of vector rtx, default to 1 and append GR2VR cost to
vec_duplicate rtx.
* config/riscv/vector-iterators.md: Add new iterator for vx.
Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/config/riscv/autovec-opt.md | 22 ++++++++++++++++++++++
gcc/config/riscv/riscv.cc | 26 ++++++++++++++++++++------
gcc/config/riscv/vector-iterators.md | 4 ++++
3 files changed, 46 insertions(+), 6 deletions(-)
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 0c3b0cc7e05..ab45fe2511b 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1673,3 +1673,25 @@ (define_insn_and_split "*vandn_<mode>"
DONE;
}
[(set_attr "type" "vandn")])
+
+;;
=============================================================================
+;; Combine vec_duplicate + op.vv to op.vx
+;; Include
+;; - vadd.vx
+;;
=============================================================================
+(define_insn_and_split "*<optab>_vx_<mode>"
+ [(set (match_operand:V_VLSI 0 "register_operand")
+ (any_int_binop_no_shift_vx:V_VLSI
+ (vec_duplicate:V_VLSI
+ (match_operand:<VEL> 1 "register_operand"))
+ (match_operand:V_VLSI 2 "<binop_rhs2_predicate>")))]
+ "TARGET_VECTOR && can_create_pseudo_p () && get_vector_costs ()->regmove->GR2VR
!= 0"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ rtx ops[] = {operands[0], operands[2], operands[1]};
+ riscv_vector::emit_vlmax_insn (code_for_pred_scalar (<CODE>, <MODE>mode),
+ riscv_vector::BINARY_OP, ops);
+ }
+ [(set_attr "type" "vialu")])
So there's a bit of a natural tension between generating better code
early and waiting for combine in this space. As we've discussed
broadcasting across the vector in the loop header may perform better
than using a .vx/.vf form in a loop where we have to pay the cost to
transfer data between units.
We could attack this problem in the expander and that would be my strong
preference in the scalar space. But it's less clear for vector, in
particular it's unclear if we'd perform LICM and broadcast the value
across the vector in a loop pre-header if we've already embedded the
value as a GPR/FPR in the vector instruction.
I don't particularly like how this interacts with costing. Rather than
testing GR2VR like you've done, the better solution is to add the
appropriate code to riscv_rtx_costs to actually cost the vector insn as
a whole. Part of that cost computation would be to look at the operand
and do something sensible with it (which would likely reference the cost
to move data between units such as GR2VR). If you do that, then the
standard costing tests in combine should get used.
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38f3ae7cd84..9bd0dbcf5f6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3856,16 +3856,30 @@ riscv_extend_cost (rtx op, bool unsigned_p)
#define SINGLE_SHIFT_COST 1
static bool
-riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno
ATTRIBUTE_UNUSED,
- int *total, bool speed)
+riscv_vector_rtx_costs (rtx x, machine_mode mode, int *total)
{
+ gcc_assert (riscv_v_ext_mode_p (mode));
+
/* TODO: We set RVV instruction cost as 1 by default.
Cost Model need to be well analyzed and supported in the future. */
+ int cost_val = 1;
+ enum rtx_code rcode = GET_CODE (x);
+
+ /* Aka (vec_duplicate:RVVM1DI (reg/v:DI 143 [ x ])) */
+ if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0))))
+ cost_val += get_vector_costs ()->regmove->GR2VR;
+
+ *total = COSTS_N_INSNS (cost_val);
+
+ return true;
+}
+
+static bool
+riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno
ATTRIBUTE_UNUSED,
+ int *total, bool speed)
+{
if (riscv_v_ext_mode_p (mode))
- {
- *total = COSTS_N_INSNS (1);
- return true;
- }
+ return riscv_vector_rtx_costs (x, mode, total);
This feels very much like a hack to me. Why not just handle
VEC_DUPLICATE like the rest of the opcodes in riscv_rtx_costs?
Ultimately all that code needs to work together rather than having
separate paths for vector/scalar.
+(define_code_iterator any_int_binop_no_shift_vx
+ [plus
+])
+
Did you plan to support any other binary ops? ISTM that any binary op
that provides a .vx form could be handled here.
Jeff