Thanks Robin for help.

> as I suggested initializes total with an estimate of the mode size (total = 8 
> for me) before we get to riscv_rtx_cost.  This makes the rest of the
> costs (which we assume to be relative to 4) inaccurate.

I see, that explains how cost value 8 comes from.

> Then we should perform the combination for GR2VR == 0 and not for GR2VR > 0.

Yes, that is correct, will resend the v3 within this change.

Pan

-----Original Message-----
From: Robin Dapp <rdapp....@gmail.com> 
Sent: Tuesday, April 29, 2025 9:47 PM
To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp 
<rdapp....@gmail.com>
Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx 
on GR2VR cost

> I see, let the vec_dup enter the rtx_cost again to append the total to vmv, I 
> have a try testing.  For example with below change:
>
> +       switch (rcode)
> +       {
> +         case VEC_DUPLICATE:
> +           *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS 
> (1);
> +           break;
> +         case PLUS:
> +           {
> +           rtx op_0 = XEXP (x, 0);   +           rtx op_1 = XEXP (x, 1);
> +           if (GET_CODE (op_0) == VEC_DUPLICATE
> +               || GET_CODE (op_1) == VEC_DUPLICATE)
> +             *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS 
> (1);
> +           else
> +             *total = COSTS_N_INSNS (1);
> +           break;
> +           }
> +         default:
> +           *total = COSTS_N_INSNS (1);
> +           break;
> +       }
> +
> +       return true;
>
> For case_0, GR2VR is 0, we will have late-combine as blow:
>   51   │ trying to combine definition of r135 in:
>   52   │    11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
>   53   │ into:
>   54   │    18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
>   55   │       REG_DEAD r146:RVVM1SI
>   56   │ successfully matched this instruction to *add_vx_rvvm1si:
>   57   │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ])
>   58   │     (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 
>   [ x ]) 0))
>   59   │         (reg:RVVM1SI 146)))
>   60   │ original cost = 8 + 4 (weighted: 39.483637), replacement cost = 8 
>   (weighted: 64.727273); rejecting replacement
>
>
> The vadd v, vec_dup(x) seems has the same cost as vec_dup here. I am also 
> confused about the how we calculate the
> vadd v, vec_dup(x), can we just set its' cost to vadd.vx? given we have 
> define_insn_and_split to match the pattern and
> emit the vadd.vx directly. And it matches the expr we mentioned vadd.vv + vec 
> == vadd.vx.
> Please help to correct me if misunderstanding.

Yes, that doesn't look quite correct yet.
I think the issue is that using

  *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);

as I suggested initializes total with an estimate of the mode size (total = 8 
for me) before we get to riscv_rtx_cost.  This makes the rest of the
costs (which we assume to be relative to 4) inaccurate.

So try
  *total = get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
for the vec_dup case and
  *total = COST_N_INSNS (1) + get_vector_costs ()->regmove->GR2VR * 
  COSTS_N_INSNS (1);
for the vx case.

Then we should perform the combination for GR2VR == 0 and not for GR2VR > 0.

Reply via email to