Thanks Robin for help. > as I suggested initializes total with an estimate of the mode size (total = 8 > for me) before we get to riscv_rtx_cost. This makes the rest of the > costs (which we assume to be relative to 4) inaccurate.
I see, that explains how cost value 8 comes from. > Then we should perform the combination for GR2VR == 0 and not for GR2VR > 0. Yes, that is correct, will resend the v3 within this change. Pan -----Original Message----- From: Robin Dapp <rdapp....@gmail.com> Sent: Tuesday, April 29, 2025 9:47 PM To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp <rdapp....@gmail.com> Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost > I see, let the vec_dup enter the rtx_cost again to append the total to vmv, I > have a try testing. For example with below change: > > + switch (rcode) > + { > + case VEC_DUPLICATE: > + *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS > (1); > + break; > + case PLUS: > + { > + rtx op_0 = XEXP (x, 0); + rtx op_1 = XEXP (x, 1); > + if (GET_CODE (op_0) == VEC_DUPLICATE > + || GET_CODE (op_1) == VEC_DUPLICATE) > + *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS > (1); > + else > + *total = COSTS_N_INSNS (1); > + break; > + } > + default: > + *total = COSTS_N_INSNS (1); > + break; > + } > + > + return true; > > For case_0, GR2VR is 0, we will have late-combine as blow: > 51 │ trying to combine definition of r135 in: > 52 │ 11: r135:RVVM1SI=vec_duplicate(r150:DI#0) > 53 │ into: > 54 │ 18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI > 55 │ REG_DEAD r146:RVVM1SI > 56 │ successfully matched this instruction to *add_vx_rvvm1si: > 57 │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ]) > 58 │ (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 > [ x ]) 0)) > 59 │ (reg:RVVM1SI 146))) > 60 │ original cost = 8 + 4 (weighted: 39.483637), replacement cost = 8 > (weighted: 64.727273); rejecting replacement > > > The vadd v, vec_dup(x) seems has the same cost as vec_dup here. I am also > confused about the how we calculate the > vadd v, vec_dup(x), can we just set its' cost to vadd.vx? given we have > define_insn_and_split to match the pattern and > emit the vadd.vx directly. And it matches the expr we mentioned vadd.vv + vec > == vadd.vx. > Please help to correct me if misunderstanding. Yes, that doesn't look quite correct yet. I think the issue is that using *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1); as I suggested initializes total with an estimate of the mode size (total = 8 for me) before we get to riscv_rtx_cost. This makes the rest of the costs (which we assume to be relative to 4) inaccurate. So try *total = get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1); for the vec_dup case and *total = COST_N_INSNS (1) + get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1); for the vx case. Then we should perform the combination for GR2VR == 0 and not for GR2VR > 0.