On 11/27/24 5:48 AM, Robin Dapp wrote:
>> This patch would like to combine the vec_duplicate + vadd.vv to the
>> vadd.vx.  From example as below:
>
> I think we concluded a while ago that we don't want this turned on universally.
> For the example/tests you provide it will be a de-optimization on any uarch
> that has non-zero GPR -> VR latency.
>
> So at least we need to define RTL costs for the combined variant and make them
> depend on the VR <-> GPR costs (so we don't do this if the latency/cost is >
> 0).
>
> Does the optimization happen in combine or late-combine BTW?  I thought
> late-combine because we need to look through the unary op (vec_duplicate).

Yea, I think that's the general agreement.  Essentially we have to
recognize that the cost of accessing GPR or FPR data in the vector unit
may vary depending on the uarch.
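
To make the cost gating concrete, here is a rough sketch in plain C, not
actual GCC code.  The struct uarch_costs, its gr2vr_cost field, the
BASE_VECTOR_OP_COST constant and both functions are hypothetical names
made up for illustration.  It also assumes a toy model where the
combined vadd.vx is compared only against the vadd.vv it replaces, so
any non-zero GPR -> VR cost rejects the combination.

```c
#include <stdbool.h>

/* Hypothetical per-uarch tuning data; not a real GCC structure.  */
struct uarch_costs
{
  /* Assumed extra cost of feeding a GPR value into the vector unit.  */
  int gr2vr_cost;
};

/* Assumed baseline cost of one vector arithmetic instruction.  */
#define BASE_VECTOR_OP_COST 1

/* Cost of the combined vadd.vx form in this toy model: the vector add
   itself plus the uarch-specific GPR -> VR penalty.  */
int
vadd_vx_cost (const struct uarch_costs *costs)
{
  return BASE_VECTOR_OP_COST + costs->gr2vr_cost;
}

/* Accept vec_duplicate + vadd.vv -> vadd.vx only when the combined form
   is no more expensive than the vadd.vv it replaces.  With this model
   that holds exactly when gr2vr_cost is zero.  */
bool
prefer_vadd_vx (const struct uarch_costs *costs)
{
  return vadd_vx_cost (costs) <= BASE_VECTOR_OP_COST;
}
```

With gr2vr_cost set to 0 the sketch accepts the vadd.vx form; with any
positive value it keeps the separate vec_duplicate + vadd.vv sequence.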

Also note this isn't a bugfix and so it ought to be a gcc-16 thing.

Jeff
