On 11/22/22 04:08, Richard Biener via Gcc-patches wrote:
On Tue, 22 Nov 2022, Richard Sandiford wrote:
Tamar Christina <tamar.christ...@arm.com> writes:
-----Original Message-----
From: Richard Biener <rguent...@suse.de>
Sent: Tuesday, November 22, 2022 10:59 AM
To: Richard Sandiford <richard.sandif...@arm.com>
Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; Tamar
Christina <tamar.christ...@arm.com>; Richard Biener
<richard.guent...@gmail.com>; nd <n...@arm.com>
Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
bitfields and array_refs
On Tue, 22 Nov 2022, Richard Sandiford wrote:
Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
So it's not easily possible within the current infrastructure. But
it does look like ARM might eventually benefit from something like STV
on x86?
I'm not sure. The problem with trying to do this in RTL is that
you'd have to be able to decide from two pseudos whether they come
from extracts that are sequential. When coming in from a hard
register that's easy, yes. When coming in from a load, or any other
operation that produces pseudos, that becomes harder.
Yeah.
Just in case anyone reading the above is tempted to implement STV for
AArch64: I think it would set a bad precedent if we had a
paste-&-adjust version of the x86 pass. AFAIK, the target
capabilities and constraints are mostly modelled correctly using
existing mechanisms, so I don't think there's anything particularly
target-specific about the process of forcing things to be on the general or
SIMD/FP side.
So if we did have an STV-ish thing for AArch64, I think it should be a
target-independent pass that uses hooks and recog, even if the pass is
initially enabled for AArch64 only.
Agreed - maybe some of the x86 code can be leveraged, but of course the
cost modeling is the most difficult to get right - IIRC the x86 backend resorts
to backend specific tuning flags rather than trying to get rtx_cost or insn_cost
"correct" here.
(FWIW, on the patch itself, I tend to agree that this is really an SLP
optimisation. If the vectoriser fails to see the benefit, or if it
fails to handle more complex cases, then it would be good to try to
fix that.)
Also agreed - but costing is hard ;)
I guess. I still disagree here, but I've clearly been out-Richard'd. The
problem is still that this is just basic codegen. I still don't think it
requires -O2 to be usable.
So I guess the only correct implementation is to use an STV-like patch. But
given that this is already the second attempt (the first, RTL, one was
rejected by Richard; the second, GIMPLE, one was rejected by Richi), I'd like
to get an agreement on this STV thing before I waste months more.
I don't think this in itself is a good motivation for STV. My comment
above was more about the idea of STV for AArch64 in general (since it
had been raised).
Personally I still think the reduction should be generated in gimple.
I agree, and the proper place to generate the reduction is in SLP.
Sorry to have sent things astray with my earlier ACK. It looked
reasonable to me.
jeff
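
For readers following the thread, here is a minimal sketch of the kind of
source the subject line refers to: a scalar sum built from individual vector
lanes or array elements. It is an illustration only and is not taken from the
patch; the names v4si, sum_lanes and sum_array are hypothetical. It assumes
the GCC/Clang vector_size extension, where subscripting a vector reads one
lane (as far as I understand, this typically shows up in GIMPLE as a
BIT_FIELD_REF, and the array version as ARRAY_REFs).

```c
/* Hypothetical example, not from the patch under discussion.
   Four 32-bit ints in one 16-byte vector (GCC/Clang vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Adds the four lanes one at a time. Each v[i] is a separate scalar
   extract, so the compiler has to notice that the extracts are
   sequential lanes of the same vector before it can treat the chain
   of additions as a single reduction. */
int sum_lanes(v4si v)
{
    return v[0] + v[1] + v[2] + v[3];
}

/* The same pattern through memory: four adjacent array elements.
   Here the values reach the additions via loads rather than via
   extracts from a vector register. */
int sum_array(const int *a)
{
    return a[0] + a[1] + a[2] + a[3];
}
```

Both functions compute the same scalar result; the thread is about where in
the compiler (RTL, a separate GIMPLE pattern, or SLP) such a chain of
additions should be recognised as a reduction, not about what it computes.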