I see, so the idea is to let the vec_dup go through rtx_cost again and add the vmv cost to the total. I gave that a try, for example with the change below:

+       switch (rcode)
+       {
+         case VEC_DUPLICATE:
+           *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
+           break;
+         case PLUS:
+           {
+           rtx op_0 = XEXP (x, 0);
+           rtx op_1 = XEXP (x, 1);
+           if (GET_CODE (op_0) == VEC_DUPLICATE
+               || GET_CODE (op_1) == VEC_DUPLICATE)
+             *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
+           else
+             *total = COSTS_N_INSNS (1);
+           break;
+           }
+         default:
+           *total = COSTS_N_INSNS (1);
+           break;
+       }
+
+       return true;

For case_0, where GR2VR is 0, late-combine reports the following:

  trying to combine definition of r135 in:
     11: r135:RVVM1SI=vec_duplicate(r150:DI#0)
  into:
     18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI
        REG_DEAD r146:RVVM1SI
  successfully matched this instruction to *add_vx_rvvm1si:
  (set (reg:RVVM1SI 147 [ vect__6.8_16 ])
      (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [ x ]) 0))
          (reg:RVVM1SI 146)))
  original cost = 8 + 4 (weighted: 39.483637), replacement cost = 8 (weighted: 64.727273); rejecting replacement


The vadd v, vec_dup(x) seems to have the same cost as the vec_dup itself here. I am also confused about how we calculate the cost of vadd v, vec_dup(x): can we just set its cost to that of vadd.vx, given that we have a define_insn_and_split that matches the pattern and emits vadd.vx directly? That would also match the expression we mentioned, vadd.vv + vec == vadd.vx.
Please correct me if I have misunderstood.

Yes, that doesn't look quite correct yet.
I think the issue is that when we use

 *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);

as I suggested, total has already been initialized with an estimate based on the mode size (total = 8 for me) before we get to riscv_rtx_cost. This makes the rest of the
costs (which we assume to be relative to 4) inaccurate.

So try
 *total = get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
for the vec_dup case and
 *total = COSTS_N_INSNS (1) + get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1);
for the vx case.

Then we should perform the combination for GR2VR == 0 and not for GR2VR > 0.
