> This patch would like to combine the vec_duplicate + vadd.vv to the
> vadd.vx. From example as below:

I think we concluded a while ago that we don't want this turned on
universally.  For the example/tests you provide it will be a
de-optimization on any uarch that has non-zero GPR -> VR latency.

So at least we need to define RTL costs for the combined variant and
make them depend on the VR <-> GPR costs (so we don't do this if the
latency/cost is > 0).

Does the optimization happen in combine or late-combine BTW?  I thought
late-combine because we need to look through the unary op
(vec_duplicate).

--
Regards
 Robin
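
To illustrate the cost gating suggested above, here is a minimal sketch in plain C. It is not GCC code: struct uarch_costs, its gr2vr_cost field and should_combine_to_vx are hypothetical names chosen for illustration, and a real implementation would presumably sit in the backend's RTX cost hook rather than in a standalone predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-uarch tuning record.  gr2vr_cost models the extra
   cost of moving a scalar from a GPR into a vector register; the name
   is an assumption, not an existing GCC field.  */
struct uarch_costs
{
  int gr2vr_cost;
};

/* Return true if folding vec_duplicate + vadd.vv into vadd.vx is
   expected to be profitable, i.e. only when the GPR -> VR transfer
   is free on this uarch.  */
static bool
should_combine_to_vx (const struct uarch_costs *costs)
{
  assert (costs != NULL);
  return costs->gr2vr_cost == 0;
}
```

With gr2vr_cost == 0 the predicate allows the vadd.vx form; with any positive cost it keeps the separate vec_duplicate, which can then be hoisted out of a loop.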