> Ah, I see, thanks.  So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 
> while vadd.vx costs 1 + 2, making it cheaper?

Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff 
cost here to see
the impact to late-combine.

+  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+    cost_val = get_vector_costs ()->regmove->GR2VR;
+  }

---- cut line ----

If GR2VR is 2, we will perform the combine as below.

 51 trying to combine definition of r135 in:
 52    11: r135:RVVM1DI=vec_duplicate(r150:DI)
 53 into:
 54    18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
 55       REG_DEAD r146:RVVM1DI
 56 successfully matched this instruction to *add_vx_rvvm1di:
 57 (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
 58     (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
 59         (reg:RVVM1DI 146)))
 60 original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4 
(weighted: 32.363637); keeping replacement
 61 rescanning insn with uid = 18.
 62 updating insn 18 in-place
 63 verify found no changes in insn with uid = 18.
 64 deleting insn 11
 65 deleting insn with uid = 11.

---- cut line ----

If GR2VR is 1, we will perform the combine as below.

  51   │ trying to combine definition of r135 in:
  52   │    11: r135:RVVM1DI=vec_duplicate(r150:DI)
  53   │ into:
  54   │    18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI
  55   │       REG_DEAD r146:RVVM1DI
  56   │ successfully matched this instruction to *add_vx_rvvm1di:
  57   │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ])
  58   │     (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ]))
  59   │         (reg:RVVM1DI 146)))
  60   │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4 
(weighted: 32.363637); keeping replacement
  61   │ rescanning insn with uid = 18.
  62   │ updating insn 18 in-place
  63   │ verify found no changes in insn with uid = 18.
  64   │ deleting insn 11
  65   │ deleting insn with uid = 11.

---- cut line ----

If GR2VR is 0, it will be normalized to 1 as below, thus the combine log looks 
like the same as above.

return cost > 0 ? cost : COSTS_N_INSNS (1); gcc/rtlanal.cc:5766

it looks like we need to reconcile the vadd.vv cost either here ? Or am I 
missing something here.

> With such a change the tests wouldn't pass by default (AFAICT) and we would 
> need a --param=xx.  I wouldn't worry about exposing those details to the user 
> for now as we're so early in the cycle and can easily iterate on it.  I would
> suggest just adding something in order to make the tests work as expected and 
> change things later (if needed).

I see, will append the --param patch into the series.

Pan


-----Original Message-----
From: Robin Dapp <rdapp....@gmail.com> 
Sent: Wednesday, April 23, 2025 3:01 PM
To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; 
gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, 
Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp 
<rdapp....@gmail.com>
Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx 
on GR2VR cost

>> The only thing I think we want for the patch (as Pan also raised last time) 
>> is 
>> the param to set those .vx costs to zero in order to ensure the tests test 
>> the 
>> right thing (--param=vx_preferred/gr2vr_cost or something).
>
> I see, shall we start a new series for this? AFAIK, we may need some more 
> alignment
> for something like --param=xx that exposing to end-user.
>
>> According to patchwork the tests you add pass but shouldn't they actually 
>> fail 
>> with a GR2VR cost of 2?  I must be missing something.
>
> For now the cost of GR2VR is 2, take test vx_vadd-1-i64.c for example, the 
> vec_dup + vadd.vv
> has higher cost than vadd.vx, thus perform the late-combine as below.

Ah, I see, thanks.  So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 
while vadd.vx costs 1 + 2, making it cheaper?

IMHO vec_dup should just cost 2 (=GR2VR) rather than 3.  All it does is 
broadcast (no additional operation), while vadd.vx performs the broadcast (cost 
2) as well as an operation (cost 1).  So vec_dup + vadd.vv should cost 3, the 
same as vadd.vx.  In late combine when comparing costs we scale the them by 
"frequency".  The vadd.vx inside the loop should have higher frequency making 
it more costly by default.

With such a change the tests wouldn't pass by default (AFAICT) and we would 
need a --param=xx.  I wouldn't worry about exposing those details to the user 
for now as we're so early in the cycle and can easily iterate on it.  I would
suggest just adding something in order to make the tests work as expected and 
change things later (if needed).

-- 
Regards
 Robin

Reply via email to