> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Tuesday, November 22, 2022 10:59 AM
> To: Richard Sandiford <richard.sandif...@arm.com>
> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; Tamar
> Christina <tamar.christ...@arm.com>; Richard Biener
> <richard.guent...@gmail.com>; nd <n...@arm.com>
> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
> bitfields and array_refs
>
> On Tue, 22 Nov 2022, Richard Sandiford wrote:
>
> > Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > >> So it's not easily possible within the current infrastructure. But
> > >> it does look like ARM might eventually benefit from something like
> > >> STV on x86?
> > >>
> > >
> > > I'm not sure. The problem with trying to do this in RTL is that
> > > you'd have to be able to decide from two pseudos whether they come
> > > from extracts that are sequential. When coming in from a hard
> > > register that's easy yes. When coming in from a load, or any other
> > > operation that produces pseudos that becomes harder.
> >
> > Yeah.
> >
> > Just in case anyone reading the above is tempted to implement STV for
> > AArch64: I think it would set a bad precedent if we had a
> > paste-&-adjust version of the x86 pass. AFAIK, the target
> > capabilities and constraints are mostly modelled correctly using
> > existing mechanisms, so I don't think there's anything particularly
> > target-specific about the process of forcing things to be on the
> > general or SIMD/FP side.
> >
> > So if we did have an STV-ish thing for AArch64, I think it should be a
> > target-independent pass that uses hooks and recog, even if the pass is
> > initially enabled for AArch64 only.
>
> Agreed - maybe some of the x86 code can be leveraged, but of course the
> cost modeling is the most difficult to get right - IIRC the x86 backend
> resorts to backend specific tuning flags rather than trying to get
> rtx_cost or insn_cost "correct" here.
>
> > (FWIW, on the patch itself, I tend to agree that this is really an SLP
> > optimisation. If the vectoriser fails to see the benefit, or if it
> > fails to handle more complex cases, then it would be good to try to
> > fix that.)
>
> Also agreed - but costing is hard ;)
I guess I still disagree here, but I've clearly been out-Richarded.
The problem is still that this is just basic codegen, and I still don't
think it should require -O2 to be usable. So I guess the only correct
implementation is an STV-like pass.

But given that this is already the second attempt (the first, in RTL,
was rejected by Richard; the second, in GIMPLE, was rejected by Richi),
I'd like to get agreement on this STV thing before I waste more months.

Thanks,
Tamar

>
> Richard.