Thanks Robin for comments and help. > Yes, at first we must deconstruct all relevant patterns as above for PLUS. > The basic cost for the add is COST_N_INSNS (1) == 4. If one operand is a > VEC_DUPLICATE then we increase the basic cost by GR2VR * COST_N_INSNS (1). > Is > that not sufficient for the combination to not happen?
The below changes will make the late-combine fail to happen, aka vmv(4) + vadd.vv(4) = 8 < plus (v, vec_dup(x)) (4 + GR2VR * 4 = 4 + 8 = 12) + if (riscv_v_ext_mode_p (mode)) + { + cost = 1; + rtx op_0 = XEXP (x, 0); + rtx op_1 = XEXP (x, 1); + + if (GET_CODE (op_0) == VEC_DUPLICATE + || GET_CODE (op_1) == VEC_DUPLICATE) + cost += get_vector_costs ()->regmove->GR2VR; + + *total = COSTS_N_INSNS (cost); + return true; + } But this is not that good enough here if my understanding is correct. As vmv.v.x is somehow equivalent to vec_dup but doesn't ref GR2VR, thus shall we introduce 3 options and enrich the get_vector_costs () instead? Aka --param=vv_alu_cost=? --param=vx_alu_cost=? --param=xv_mov_cost=? And then pick up the value directly from the options, or let the option pollute the get_vector_costs respectively. They should be something similar as blow in riscv_rtx_costs (take add as example). case PLUS: if (op_0 is VEC_DUP or op_1 is VEC_DUP) cost = get_vector_costs ()->regalu->GR2VR; if (op_0 is reg && op_1 is reg) cost = get_vector_costs ()->regalu->VR2VR; case VEC_DUP: cost = get_vector_costs ()->regmove->GR2VR; > If we have > for (...) > { > vmv.vx > vadd.vv > } > then this should be combined even if COST (vmv.vx) + COST (vadd.vv) == COST > (vadd.vx) because we save an instruction and need to perform the broadcast > anyway. > For > vmv.vx > for (...) > { > vadd.vv > } > the combination should not take place (when the costs are equal) because of > the > frequency consideration in late-combine's costing. > When COST (vmv.vx) + COST (vadd.vv) > COST (vadd.vx) it should take place. I see, will enrich the test cases for above two cases. Pan -----Original Message----- From: Robin Dapp <rdapp....@gmail.com> Sent: Monday, April 28, 2025 3:39 PM To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp <rdapp....@gmail.com> Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost > Make sense to me, it looks like the combine will always take place if GR2VR > is 0, 1 or 2 for now. > I am try to customize the cost here to make it fail to combine but get failed > with below change. > > + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) { > + cost_val = 1; > + } > + > + if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0))) > + && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 1)))) { > + cost_val = 8; > + } > + > + if (rcode == PLUS && riscv_v_ext_mode_p (GET_MODE (XEXP (x, 0))) > + && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 1)))) { > + cost_val = 2; // never picked up during combine. > + } I think this slightly is too simple (as you also showed) due to COST_N_INSNS (0) == 4. We need to make sure to match the full patterns and then set their costs. There must be three distinct costing paths: vadd.vv, vadd.vx and vmv.vx. > > It takes 8 for original cost as well as replacement(see below combine log). > Thus, it will be always > keep replacement during combine. > > 51 │ trying to combine definition of r135 in: > 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI) > 53 │ into: > 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI > 55 │ REG_DEAD r146:RVVM1DI > 56 │ successfully matched this instruction to *add_vx_rvvm1di: > 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ]) > 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ])) > 59 │ (reg:RVVM1DI 146))) > 60 │ original cost = 4 + 32 (weighted: 262.469092), replacement cost = 32 > (weighted: 258.909092); keeping replacement > 61 │ rescanning insn with uid = 18. > 62 │ updating insn 18 in-place > 63 │ verify found no changes in insn with uid = 18. > 64 │ deleting insn 11 > 65 │ deleting insn with uid = 11. > > Based on above, I have another try to understand how late-combine leverage > the RTX_COST. > Aka, set vadd v1, (vec_dup(x1)) to 8 and others to 1. > > + if (rcode == PLUS) { > + rtx arg0 = XEXP (x, 0); > + rtx arg1 = XEXP (x, 1); > + > + if (riscv_v_ext_mode_p (GET_MODE (arg1)) > + && GET_CODE (arg0) == VEC_DUPLICATE) { > + cost_val = 8; > + } > + } > > Then the late-combine reject the replacement as expected. Thus, the condition > failed to combine may > Looks like vmv.vx + vadd.vv < vadd.vx here if my understanding is correct. > If so, it will also impact the > --param we would like to introduce, a single --param=gr2vr_cost=XXX is not > good enough to make sure that > the condition is true, we may need --param=vv_cost/vx_cost=XXX. Yes, at first we must deconstruct all relevant patterns as above for PLUS. The basic cost for the add is COST_N_INSNS (1) == 4. If one operand is a VEC_DUPLICATE then we increase the basic cost by GR2VR * COST_N_INSNS (1). Is that not sufficient for the combination to not happen? If we have for (...) { vmv.vx vadd.vv } then this should be combined even if COST (vmv.vx) + COST (vadd.vv) == COST (vadd.vx) because we save an instruction and need to perform the broadcast anyway. For vmv.vx for (...) { vadd.vv } the combination should not take place (when the costs are equal) because of the frequency consideration in late-combine's costing. When COST (vmv.vx) + COST (vadd.vv) > COST (vadd.vx) it should take place. We first need to get these basic building blocks correct before considering something else. > Btw, is there any approach to set the cost attached to the > define_insn_and_split? Which may be more > friendly to catch it from RTX_COST up to a point. > > 51 │ trying to combine definition of r135 in: > 52 │ 11: r135:RVVM1DI=vec_duplicate(r150:DI) > 53 │ into: > 54 │ 18: r147:RVVM1DI=r146:RVVM1DI+r135:RVVM1DI > 55 │ REG_DEAD r146:RVVM1DI > 56 │ successfully matched this instruction to *add_vx_rvvm1di: > 57 │ (set (reg:RVVM1DI 147 [ vect__6.8_16 ]) > 58 │ (plus:RVVM1DI (vec_duplicate:RVVM1DI (reg:DI 150 [ x ])) > 59 │ (reg:RVVM1DI 146))) > 60 │ original cost = 4 + 4 (weighted: 35.923637), replacement cost = 32 > (weighted: 258.909092); rejecting replacement > 61 │ In the end we'll have to capture all patterns (predicated and unpredicated) anyway so just tackling the unsplit ones only helps so much. -- Regards Robin