Thanks Robin for the comments.

> I think we concluded a while ago that we don't want this turned on
> universally.
> For the example/tests you provide it will be a de-optimization on any uarch
> that has non-zero GPR -> VR latency
I see, I wasn't aware of that. I am not sure whether we need to consider the
vsetvl here too, as there are 2 extra insns here that the combine eliminates:

    test_binary_vx_add:
        beq     a3,zero,.L8
        vsetvli a5,zero,e32,m1,ta,ma   // eliminated
        vmv.v.x v2,a2                  // Ditto.
        slli    a3,a3,32
        srli    a3,a3,32
    .L3:
        vsetvli a5,a3,e32,m1,ta,ma
        vle32.v v1,0(a1)

> So at least we need to define RTL costs for the combined variant and make
> them depend on the VR <-> GPR costs (so we don't do this if the latency/cost
> is > 0).

I see, we need to consider the cost here. Is there any example I can
reference? Sorry, I haven't touched the cost model before.

> Does the optimization happen in combine or late-combine BTW? I thought
> late-combine because we need to look through the unary op (vec_duplicate).

Yes, you are right. It happens in the 302r.late_combine1 pass.

Pan

-----Original Message-----
From: Robin Dapp <rdapp....@gmail.com>
Sent: Wednesday, November 27, 2024 8:48 PM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp <rdapp....@gmail.com>
Subject: Re: [PATCH v1 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx

> This patch would like to combine the vec_duplicate + vadd.vv to the
> vadd.vx. From example as below:

I think we concluded a while ago that we don't want this turned on
universally.  For the example/tests you provide it will be a
de-optimization on any uarch that has non-zero GPR -> VR latency.

So at least we need to define RTL costs for the combined variant and make
them depend on the VR <-> GPR costs (so we don't do this if the
latency/cost is > 0).

--
Regards
 Robin
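For reference, here is a minimal C sketch of the cost check suggested above. It assumes a simple per-iteration model. The .vx form pays the .vv cost plus a GPR -> VR transfer in every iteration. The .vv form pays only the .vv cost in the loop body, because vsetvli + vmv.v.x sit before .L3, and their one-off setup cost is deliberately ignored here. The struct and function names (vec_cost_params, vx_cost, vv_loop_cost, should_combine_to_vx) are hypothetical and are not GCC's actual RTL cost hooks or tuning fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-uarch cost parameters (illustrative only, not GCC's
   tuning structures).  */
struct vec_cost_params
{
  int vv_op;  /* Cost of a vadd.vv in the loop body.  */
  int gr2vr;  /* Extra cost of feeding a GPR scalar to the vector unit.  */
};

/* vadd.vx: the .vv cost plus a GPR -> VR transfer on every iteration.  */
static int
vx_cost (const struct vec_cost_params *p)
{
  assert (p->vv_op >= 0 && p->gr2vr >= 0);
  return p->vv_op + p->gr2vr;
}

/* vec_duplicate + vadd.vv: vmv.v.x is hoisted before the loop, so only the
   .vv op is paid per iteration (the one-off setup is ignored).  */
static int
vv_loop_cost (const struct vec_cost_params *p)
{
  return p->vv_op;
}

/* Combine to .vx only if it is no more expensive per iteration, which in
   this model means the GPR -> VR cost must be zero.  */
static bool
should_combine_to_vx (const struct vec_cost_params *p)
{
  return vx_cost (p) <= vv_loop_cost (p);
}
```

With { vv_op = 1, gr2vr = 0 } the combine is taken. With any non-zero gr2vr the .vv form is kept, which matches the "don't do this if the latency/cost is > 0" rule. A fuller model could weigh the hoisted vsetvli + vmv.v.x against the trip count, but late-combine would not normally know the trip count.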