Thanks Robin for comments and help.

> Yes, at first we must deconstruct all relevant patterns as above for PLUS.
> The basic cost for the add is COST_N_INSNS (1) == 4.  If one operand is a 
> VEC_DUPLICATE then we increase the basic cost by GR2VR * COST_N_INSNS (1).  
> Is 
> that not sufficient for the combination to not happen?

The below changes will make the late-combine fail to happen, aka

vmv(4) + vadd.vv(4) = 8 < plus (v, vec_dup(x)) (4 + GR2VR * 4 = 4 + 8 = 12)

+      if (riscv_v_ext_mode_p (mode))
+       {
+         cost = 1;
+         rtx op_0 = XEXP (x, 0);
+         rtx op_1 = XEXP (x, 1);
+
+         if (GET_CODE (op_0) == VEC_DUPLICATE
+             || GET_CODE (op_1) == VEC_DUPLICATE)
+           cost += get_vector_costs ()->regmove->GR2VR;
+
+         *total = COSTS_N_INSNS (cost);
+         return true;
+       }

But this is not that good enough here if my understanding is correct.
As vmv.v.x is somehow equivalent to vec_dup but doesn't ref GR2VR,
thus shall we introduce 3 options and enrich the  get_vector_costs () instead? 
Aka

--param=vv_alu_cost=?
--param=vx_alu_cost=?
--param=xv_mov_cost=?

And then pick up the value directly from the options, or let the option pollute
the get_vector_costs respectively. They should be something similar as blow
in riscv_rtx_costs (take add as example).

case PLUS:
   if (op_0 is VEC_DUP or op_1 is VEC_DUP)
       cost = get_vector_costs ()->regalu->GR2VR;
   if (op_0 is reg && op_1 is reg)
       cost = get_vector_costs ()->regalu->VR2VR;

case VEC_DUP:
   cost = get_vector_costs ()->regmove->GR2VR;

> If we have
>   for (...)
>     {
>       vmv.vx
>       vadd.vv
>     }

> then this should be combined even if COST (vmv.vx) + COST (vadd.vv) == COST 
> (vadd.vx) because we save an instruction and need to perform the broadcast 
> anyway.

> For
>  vmv.vx
>   for (...)
>     {
>       vadd.vv
>     }

> the combination should not take place (when the costs are equal) because of 
> the 
> frequency consideration in late-combine's costing.
> When COST (vmv.vx) + COST (vadd.vv) > COST (vadd.vx) it should take place.

I see, will enrich the test cases for above two cases.

Pan 

-----Original Message-----
From: Robin Dapp <rdapp....@gmail.com> 
Sent: Monday, April 28, 2025 3:39 PM
To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp 
<rdapp....@gmail.com>
Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx 
on GR2VR cost

> Make sense to me, it looks like the combine will always take place if GR2VR 
> is 0, 1 or 2 for now.
> I am try to customize the cost here to make it fail to combine but get failed 
> with below change.
>
> +  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
> +    cost_val = 1;
> +  }
> +
> +  if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
> +      && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 1)))) {
> +    cost_val = 8;
> +  }
> +
> +  if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0)))
> +      && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 1)))) {
> +    cost_val = 2; // never picked up during combine.
> +  }

I think this slightly is too simple (as you also showed) due to COST_N_INSNS 
(0) == 4.  We need to make sure to match the full patterns and then set their 
costs.  There must be three distinct costing paths: vadd.vv, vadd.vx and 
vmv.vx.

>
> It takes 8 for original cost as well as replacement(see below combine log). 
> Thus, it will be always
> keep replacement during combine. 
>
>   51   │ trying to combine definition of r135 in:
>   52   │    11: r135:RVVM1DI=vec_duplicate(r150:DI)
>   53   │ into:
>   54   │    18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
>   55   │       REG_DEAD r146:RVVM1DI
>   56   │ successfully matched this instruction to *add_vx_rvvm1di:
>   57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
>   58   │     (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
>   59   │         (reg:RVVM1DI 146)))
>   60   │ original cost = 4 + 32 (weighted: 262.469092), replacement cost = 32 
> (weighted: 258.909092); keeping replacement
>   61   │ rescanning insn with uid = 18.
>   62   │ updating insn 18 in-place
>   63   │ verify found no changes in insn with uid = 18.
>   64   │ deleting insn 11
>   65   │ deleting insn with uid = 11.
>
> Based on above, I have another try to understand how late-combine leverage 
> the RTX_COST.
> Aka, set vadd v1, (vec_dup(x1)) to 8 and others to 1.
>
> +  if (rcode == PLUS) {
> +    rtx arg0 = XEXP (x, 0);
> +    rtx arg1 = XEXP (x, 1);
> +
> +    if (riscv_v_ext_mode_p (GET_MODE (arg1))
> +       && GET_CODE (arg0) == VEC_DUPLICATE) {
> +       cost_val = 8;
> +    }
> +  }
>
> Then the late-combine reject the replacement as expected. Thus, the condition 
> failed to combine may
> Looks like vmv.vx + vadd.vv < vadd.vx here if my understanding is correct.  
> If so, it will also impact the
> --param we would like to introduce, a single --param=gr2vr_cost=XXX is not 
> good enough to make sure that
> the condition is true, we may need --param=vv_cost/vx_cost=XXX.

Yes, at first we must deconstruct all relevant patterns as above for PLUS.
The basic cost for the add is COST_N_INSNS (1) == 4.  If one operand is a 
VEC_DUPLICATE then we increase the basic cost by GR2VR * COST_N_INSNS (1).  Is 
that not sufficient for the combination to not happen?

If we have
  for (...)
    {
      vmv.vx
      vadd.vv
    }

then this should be combined even if COST (vmv.vx) + COST (vadd.vv) == COST 
(vadd.vx) because we save an instruction and need to perform the broadcast 
anyway.

For
  vmv.vx
  for (...)
    {
      vadd.vv
    }

the combination should not take place (when the costs are equal) because of the 
frequency consideration in late-combine's costing.
When COST (vmv.vx) + COST (vadd.vv) > COST (vadd.vx) it should take place.

We first need to get these basic building blocks correct before considering 
something else.

> Btw, is there any approach to set the cost attached to the 
> define_insn_and_split? Which may be more
> friendly to catch it from RTX_COST up to a point.
>
>   51   │ trying to combine definition of r135 in:
>   52   │    11: r135:RVVM1DI=vec_duplicate(r150:DI)
>   53   │ into:
>   54   │    18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
>   55   │       REG_DEAD r146:RVVM1DI
>   56   │ successfully matched this instruction to *add_vx_rvvm1di:
>   57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
>   58   │     (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
>   59   │         (reg:RVVM1DI 146)))
>   60   │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 32 
> (weighted: 258.909092); rejecting replacement
>   61   │

In the end we'll have to capture all patterns (predicated and unpredicated) 
anyway so just tackling the unsplit ones only helps so much.

-- 
Regards
 Robin

Reply via email to