On Tue, 22 Nov 2022, Richard Sandiford wrote:

> Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> So it's not easily possible within the current infrastructure. But it
> >> does look like ARM might eventually benefit from something like STV
> >> on x86?
> >>
> >
> > I'm not sure. The problem with trying to do this in RTL is that you'd
> > have to be able to decide from two pseudos whether they come from
> > extracts that are sequential. When coming in from a hard register
> > that's easy yes. When coming in from a load, or any other operation
> > that produces pseudos that becomes harder.
>
> Yeah.
>
> Just in case anyone reading the above is tempted to implement STV for
> AArch64: I think it would set a bad precedent if we had a paste-&-adjust
> version of the x86 pass. AFAIK, the target capabilities and constraints
> are mostly modelled correctly using existing mechanisms, so I don't
> think there's anything particularly target-specific about the process
> of forcing things to be on the general or SIMD/FP side.
>
> So if we did have an STV-ish thing for AArch64, I think it should be
> a target-independent pass that uses hooks and recog, even if the pass
> is initially enabled for AArch64 only.

Agreed - maybe some of the x86 code can be leveraged, but of course
the cost modeling is the most difficult to get right - IIRC the x86
backend resorts to backend specific tuning flags rather than trying
to get rtx_cost or insn_cost "correct" here.

> (FWIW, on the patch itself, I tend to agree that this is really an
> SLP optimisation. If the vectoriser fails to see the benefit, or if
> it fails to handle more complex cases, then it would be good to try
> to fix that.)

Also agreed - but costing is hard ;)

Richard.