Thanks Robin for the comments.

> I think we concluded a while ago that we don't want this turned on 
> universally.
> For the example/tests you provide it will be a de-optimization on any uarch
> that has non-zero GPR -> VR latency

I see, I wasn't aware of that. I am not sure whether we also need to consider
vsetvl here, as there are 2 extra insns in this case.

  test_binary_vx_add:
      beq a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma // eliminated
      vmv.v.x v2,a2                // Ditto.
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
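
For reference, a loop of roughly the following shape would give code like the
above. The test source is not quoted in this mail, so the signature and the
argument order here are an assumption, inferred only from the register usage
(a1 as the load base, a2 as the scalar, a3 as a 32-bit unsigned count).

```c
#include <stdint.h>

/* Assumed signature: out in a0, in in a1, x in a2, n in a3.
   The scalar x is loop-invariant, so it can either be broadcast once
   before the loop (vmv.v.x + vadd.vv) or used directly (vadd.vx).  */
void
test_binary_vx_add (int32_t *out, int32_t *in, int32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = in[i] + x;
}
```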

> So at least we need to define RTL costs for the combined variant and make them
> depend on the VR <-> GPR costs (so we don't do this if the latency/cost is >
> 0).

I see, we need to consider the cost here. Is there any example I can reference?
Sorry, I haven't touched the cost model before.
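
As I understand the suggestion, the rule would be roughly as below. This is
only a sketch of the decision and not GCC code; the struct and function names
are hypothetical and do not refer to any existing tuning parameter or hook.

```c
#include <stdbool.h>

/* Hypothetical per-uarch tuning data, for illustration only.  */
struct tune_sketch
{
  int gpr2vr_cost; /* Extra cost of moving a GPR value into a VR.  */
};

/* Prefer the combined vadd.vx form only when the GPR -> VR move is
   free; otherwise keep vec_duplicate + vadd.vv so that the move is
   paid once outside the loop.  */
static bool
prefer_vx_form (const struct tune_sketch *tune)
{
  return tune->gpr2vr_cost == 0;
}
```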

> Does the optimization happen in combine or late-combine BTW?  I thought
> late-combine because we need to look through the unary op (vec_duplicate).

Yes, you are right. It is the 302r.late_combine1 pass.

Pan

-----Original Message-----
From: Robin Dapp <rdapp....@gmail.com> 
Sent: Wednesday, November 27, 2024 8:48 PM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin 
Dapp <rdapp....@gmail.com>
Subject: Re: [PATCH v1 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx

> This patch would like to combine the vec_duplicate + vadd.vv to the
> vadd.vx.  From example as below:

I think we concluded a while ago that we don't want this turned on universally.
For the example/tests you provide it will be a de-optimization on any uarch
that has non-zero GPR -> VR latency.

So at least we need to define RTL costs for the combined variant and make them
depend on the VR <-> GPR costs (so we don't do this if the latency/cost is >
0).

 

-- 
Regards
 Robin
