On 3/1/24 12:48 AM, 钟居哲 wrote:
Hi, han. I understand you are trying to optimize vector-splat_vector into vector-scalar at the "expand" stage, that is,

vv -> vx or vv -> vf.

It's a known issue that we have been aware of for a long time.

This patch transforms vv->vf when the splat vector is duplicated from a constant (by recognizing that it is a CONST_VECTOR at the expand stage),
but it can't transform vv->vf when the splat vector is duplicated from a register.

For example, in a[i] = b[i] > x ? c[i] : d[i], x is a register, so this case cannot be optimized by your patch.
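
To make the register case concrete, here is a minimal sketch of such a loop in C. The function name select_gt, the float element type, and the restrict qualifiers are assumptions for illustration and are not taken from the patch. Because x arrives as a function argument, the vectorizer sees a broadcast of a register rather than a CONST_VECTOR.

```c
#include <stddef.h>

/* Hypothetical example: x is a run-time value held in a scalar register,
   so the compare operand is a splat of a register, not a CONST_VECTOR.  */
void select_gt(float *restrict a, const float *restrict b,
               const float *restrict c, const float *restrict d,
               float x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] > x ? c[i] : d[i];   /* compare each b[i] against scalar x */
}
```

Compiling something like this for an RVV target and inspecting the assembly is one way to check whether the compare against x ends up in a vector-vector or a vector-scalar form.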

Actually, we have a solution that does all possible transformations (including the case I mentioned above) from vv to vx or vf, via the late-combine pass contributed by Richard Sandiford of Arm: https://patchwork.ozlabs.org/project/gcc/patch/mptr0ljn9eh....@arm.com/
You can try applying this patch and experimenting with it locally yourself.

And I believe it will land in GCC 15. So I don't think we need this patch to do the optimization.
And FWIW, the late-combine patch landed a month or so ago. So in theory this should be working now.

One thing Robin and I were discussing last week was that eliminating the vector broadcast may not actually be a good thing to do all the time.

It's fairly common to have a penalty for accessing operands from a different register file. So if we originally had the broadcast outside the loop and .vv forms in the loop, and late-combine changes things so that we drop the broadcast but use a .vx or .vf form in the loop, we may actually get worse performance.
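
As a rough illustration of the two code shapes, consider a loop with a loop-invariant scalar operand. The function name add_scalar and the instruction choices named in the comments are assumptions for the sake of the sketch; the code a given compiler actually emits may differ.

```c
#include <stddef.h>

/* Hypothetical example: x is loop-invariant.
   Shape A (broadcast hoisted): one vfmv.v.f of x before the loop, then
     vfadd.vv inside the loop, so every operand comes from the vector
     register file.
   Shape B (after combining the broadcast into the use): no broadcast,
     but vfadd.vf inside the loop, which reads x from the scalar FP
     register file on every iteration.  */
void add_scalar(float *restrict a, const float *restrict b,
                float x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + x;
}
```

Shape B saves an instruction and a vector register, but on a design with a cross-register-file penalty that cost is paid on every iteration rather than once.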

That's obviously going to be uarch specific, but something to keep in mind.


Jeff
