> Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 > while vadd.vx costs 1 + 2, making it cheaper?
Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff cost here to see the impact to late-combine. + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) { + cost_val = get_vector_costs ()->regmove->GR2VR; + } ---- cut line ---- If GR2VR is 2, we will perform the combine as below. 51 trying to combine definition of r135 in: 52 11: r135:RVVM1DI=vec_duplicate(r150:DI) 53 into: 54 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI 55 REG_DEAD r146:RVVM1DI 56 successfully matched this instruction to *add_vx_rvvm1di: 57 (set (reg:RVVM1DI 147 [ vect__6.8_16 ]) 58 (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ])) 59 (reg:RVVM1DI 146))) 60 original cost = 8 + 4 (weighted: 39.483637), replacement cost = 4 (weighted: 32.363637); keeping replacement 61 rescanning insn with uid = 18. 62 updating insn 18 in-place 63 verify found no changes in insn with uid = 18. 64 deleting insn 11 65 deleting insn with uid = 11. ---- cut line ---- If GR2VR is 1, we will perform the combine as below. 51 │ trying to combine definition of r135 in: 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI) 53 │ into: 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI 55 │ REG_DEAD r146:RVVM1DI 56 │ successfully matched this instruction to *add_vx_rvvm1di: 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ]) 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ])) 59 │ (reg:RVVM1DI 146))) 60 │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 4 (weighted: 32.363637); keeping replacement 61 │ rescanning insn with uid = 18. 62 │ updating insn 18 in-place 63 │ verify found no changes in insn with uid = 18. 64 │ deleting insn 11 65 │ deleting insn with uid = 11. ---- cut line ---- If GR2VR is 0, it will be normalized to 1 as below, thus the combine log looks like the same as above. return cost > 0 ? cost : COSTS_N_INSNS (1); gcc/rtlanal.cc:5766 it looks like we need to reconcile the vadd.vv cost either here ? Or am I missing something here. > With such a change the tests wouldn't pass by default (AFAICT) and we would > need a --param=xx. I wouldn't worry about exposing those details to the user > for now as we're so early in the cycle and can easily iterate on it. I would > suggest just adding something in order to make the tests work as expected and > change things later (if needed). I see, will append the --param patch into the series. Pan -----Original Message----- From: Robin Dapp <rdapp....@gmail.com> Sent: Wednesday, April 23, 2025 3:01 PM To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp <rdapp....@gmail.com> Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost >> The only thing I think we want for the patch (as Pan also raised last time) >> is >> the param to set those .vx costs to zero in order to ensure the tests test >> the >> right thing (--param=vx_preferred/gr2vr_cost or something). > > I see, shall we start a new series for this? AFAIK, we may need some more > alignment > for something like --param=xx that exposing to end-user. > >> According to patchwork the tests you add pass but shouldn't they actually >> fail >> with a GR2VR cost of 2? I must be missing something. > > For now the cost of GR2VR is 2, take test vx_vadd-1-i64.c for example, the > vec_dup + vadd.vv > has higher cost than vadd.vx, thus perform the late-combine as below. Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 while vadd.vx costs 1 + 2, making it cheaper? IMHO vec_dup should just cost 2 (=GR2VR) rather than 3. All it does is broadcast (no additional operation), while vadd.vx performs the broadcast (cost 2) as well as an operation (cost 1). So vec_dup + vadd.vv should cost 3, the same as vadd.vx. In late combine when comparing costs we scale the them by "frequency". The vadd.vx inside the loop should have higher frequency making it more costly by default. With such a change the tests wouldn't pass by default (AFAICT) and we would need a --param=xx. I wouldn't worry about exposing those details to the user for now as we're so early in the cycle and can easily iterate on it. I would suggest just adding something in order to make the tests work as expected and change things later (if needed). -- Regards Robin