On Tue, 22 Nov 2022, Richard Sandiford wrote:

> Tamar Christina <tamar.christ...@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Biener <rguent...@suse.de>
> >> Sent: Tuesday, November 22, 2022 10:59 AM
> >> To: Richard Sandiford <richard.sandif...@arm.com>
> >> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; Tamar
> >> Christina <tamar.christ...@arm.com>; Richard Biener
> >> <richard.guent...@gmail.com>; nd <n...@arm.com>
> >> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
> >> bitfields and array_refs
> >>
> >> On Tue, 22 Nov 2022, Richard Sandiford wrote:
> >>
> >> > Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > >> So it's not easily possible within the current infrastructure.  But
> >> > >> it does look like ARM might eventually benefit from something like STV
> >> > >> on x86?
> >> > >>
> >> > >
> >> > > I'm not sure.  The problem with trying to do this in RTL is that
> >> > > you'd have to be able to decide from two pseudos whether they come
> >> > > from extracts that are sequential.  When coming in from a hard
> >> > > register that's easy, yes.  When coming in from a load, or any other
> >> > > operation that produces pseudos, that becomes harder.
> >> >
> >> > Yeah.
> >> >
> >> > Just in case anyone reading the above is tempted to implement STV for
> >> > AArch64: I think it would set a bad precedent if we had a
> >> > paste-&-adjust version of the x86 pass.  AFAIK, the target
> >> > capabilities and constraints are mostly modelled correctly using
> >> > existing mechanisms, so I don't think there's anything particularly
> >> > target-specific about the process of forcing things to be on the
> >> > general or SIMD/FP side.
> >> >
> >> > So if we did have an STV-ish thing for AArch64, I think it should be a
> >> > target-independent pass that uses hooks and recog, even if the pass is
> >> > initially enabled for AArch64 only.
> >>
> >> Agreed - maybe some of the x86 code can be leveraged, but of course the
> >> cost modeling is the most difficult to get right - IIRC the x86 backend
> >> resorts to backend specific tuning flags rather than trying to get
> >> rtx_cost or insn_cost "correct" here.
> >>
> >> > (FWIW, on the patch itself, I tend to agree that this is really an SLP
> >> > optimisation.  If the vectoriser fails to see the benefit, or if it
> >> > fails to handle more complex cases, then it would be good to try to
> >> > fix that.)
> >>
> >> Also agreed - but costing is hard ;)
> >
> > I guess, I still disagree here but I've clearly been out-Richard.  The
> > problem is still that this is just basic codegen.  I still don't think
> > it requires -O2 to be usable.
> >
> > So I guess the only correct implementation is to use an STV-like patch.
> > But given that this is already the second attempt (the first, RTL one was
> > rejected by Richard, the second, GIMPLE one was rejected by Richi), I'd
> > like to get an agreement on this STV thing before I waste months more..
> 
> I don't think this in itself is a good motivation for STV.  My comment
> above was more about the idea of STV for AArch64 in general (since it
> had been raised).
> 
> Personally I still think the reduction should be generated in gimple.

I agree, and the proper place to generate the reduction is in SLP.

Richard.
