On 3/1/24 12:48 AM, 钟居哲 wrote:
Hi, han. I understand you are trying to optimize vector-splat_vector
into vector-scalar in the "expand" stage, that is,
vv -> vx or vv -> vf.
This is an issue we have known about for a long time.
This patch transforms vv->vf when the splat vector is duplicated from a
constant (by recognizing that it is a CONST_VECTOR in the expand stage),
but it can't transform vv->vf when the splat vector is duplicated from a
register.
For example, in a[i] = b[i] > x ? c[i] : d[i], x is a register, so
this case cannot be optimized with your patch.
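
To make the two cases concrete, here is a minimal C sketch. The function
names and the constant 2.0f are made up for illustration; they are not
taken from the patch or from the testsuite. In the first loop the
compared value is a constant, so the splat can show up as a CONST_VECTOR;
in the second it is the run-time value x, which is the case described
above as not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Splat operand comes from a constant (hypothetical value 2.0f). */
void select_const(float *restrict a, const float *restrict b,
                  const float *restrict c, const float *restrict d,
                  size_t n)
{
    assert(a && b && c && d);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] > 2.0f ? c[i] : d[i];
}

/* Splat operand comes from a register: x is only known at run time. */
void select_reg(float *restrict a, const float *restrict b,
                const float *restrict c, const float *restrict d,
                float x, size_t n)
{
    assert(a && b && c && d);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] > x ? c[i] : d[i];
}
```

Both functions compute the same result when x is 2.0f; the only
difference is where the compared value comes from.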
Actually, we have a solution that does all possible transformations
(including the case I mentioned above) from vv to vx or vf: the
late-combine pass, contributed by Richard Sandiford of Arm:
https://patchwork.ozlabs.org/project/gcc/patch/mptr0ljn9eh....@arm.com/
You can try applying this patch and experimenting with it locally.
I believe it will land in GCC-15, so I don't think we need this
patch to do the optimization.
And FWIW, the late-combine patch landed a month or so ago. So in theory
this should be working now.
One thing Robin and I were discussing last week was that eliminating the
vector broadcast may not actually be a good thing to do all the time.
It's fairly common to have a penalty for accessing operands from a
different register file. So if we originally had the broadcast outside
the loop and .vv forms in the loop, and late-combine drops the broadcast
and uses a .vx or .vf form in the loop instead, we may actually get
worse performance.
That's obviously going to be uarch specific, but something to keep in mind.
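
As a rough illustration of that trade-off, consider the loop below. The
function name is hypothetical, and the RVV mnemonics in the comment are
only a sketch of the two code shapes, not actual compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x is loop-invariant.  Two possible code shapes (illustrative only):

   Shape A: broadcast once, vector-vector form in the loop
       vmv.v.x  v2, x         # outside the loop
     loop:
       vadd.vv  v1, v1, v2    # both operands from the vector register file

   Shape B: no broadcast, vector-scalar form in the loop
     loop:
       vadd.vx  v1, v1, x     # x is read from the scalar register file
                              # on every iteration
*/
void add_scalar(int32_t *restrict a, const int32_t *restrict b,
                int32_t x, size_t n)
{
    assert(a && b);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + x;
}
```

Shape B saves an instruction and a vector register, but on a design
that charges for crossing register files it pays that cost on every
iteration instead of once.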
Jeff