[Bug target/120157] No use of SVE early break vectorisation in FP loop

ktkachov at gcc dot gnu.org via Gcc-bugs Wed, 07 May 2025 07:07:50 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120157


--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #2)
> (In reply to Tamar Christina from comment #1)
> > (In reply to ktkachov from comment #0)
> > > Not sure if this is a target-specific issue or not. For input:
> > > int f11(float *x, float val, int n)
> > > {
> > >     int i;
> > >     for (i = 0; i < n; i++) {
> > >         if (x[i] == val) break;
> > >     }
> > >     return i;
> > > }
> > > 
> > > GCC can do early-break vectorisation with e.g. -Ofast -mcpu=grace but it
> > > always uses a Neon sequence, even if we use a more aggressive SVE core like
> > > -mcpu=a64fx. It refuses to do it even with --param
> > > aarch64-autovec-preference=sve-only.
> > > 
> > > Is there some enablement we're missing?
> > 
> > The loop requires first faulting loads to vectorize with SVE which we don't
> > support yet.  In theory peeling for alignment for SVE could work as well but
> > there are limitations in which cases we can use it and since the max VL is
> > 2048 a single loop iteration can easily load more than a page worth of data.
> > 
> > So for GCC 15 only *fixed length* SVE can vectorize and for GCC 16 we're
> > working on VLA.
> > 
> > e.g. https://godbolt.org/z/dYc6szWqa
> 
> Ah indeed, -msve-vector-bits= does do what I expected. Feel free to close
> this if it's not tracking anything new then.

Ok. FWIW the original testcase for me had doubles:
int f11(double *x, double val, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (x[i] == val) break;
    }
    return i;
}

With -msve-vector-bits=128 -mcpu=neoverse-v2 --param
aarch64-autovec-preference=sve-only GCC refuses to vectorise at all; without
the aarch64-autovec-preference param it picks Neon. I do see it vectorising
with VLS SVE for wider vector widths, so it may be a Neoverse V2 cost model
thing.
If choosing Neon is the right thing to do for V2 that's fine, but with --param
aarch64-autovec-preference=sve-only it should probably use SVE rather than
refusing to vectorise.
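
For anyone following along, here is a rough conceptual sketch of what
early-break vectorisation does with the double testcase. It is plain C, not
GCC's generated code: the block size VF, the name f11_blocked and the
self-check are assumptions made up for illustration (VF = 2 stands in for two
doubles in a 128-bit vector). The point it shows is that the blocked loop reads
elements past the first match, which the scalar loop never does, and that is
why the vectoriser needs either a known-safe access pattern or something like
first-faulting loads.

```c
#include <assert.h>

/* Scalar reference: the double testcase from this comment. */
int f11(double *x, double val, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (x[i] == val) break;
    }
    return i;
}

/* Hypothetical block size standing in for the vector length
   (two doubles per 128-bit vector). Not a GCC parameter. */
enum { VF = 2 };

/* Conceptual model of an early-break vector loop (illustrative only). */
int f11_blocked(double *x, double val, int n)
{
    int i = 0;

    /* "Vector" loop: compare VF elements per iteration. Note that every
       element of the block is read, including those after a match. */
    for (; i + VF <= n; i += VF) {
        int any = 0;
        for (int j = 0; j < VF; j++)
            any |= (x[i + j] == val);
        if (any)
            break;              /* a match lies somewhere in this block */
    }

    /* Scalar loop: finds the exact index inside the matching block,
       or handles the leftover elements when no block matched. */
    for (; i < n; i++) {
        if (x[i] == val) break;
    }
    return i;
}

/* Illustrative self-check: both versions agree on a small input. */
void check_f11_blocked(void)
{
    double a[] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    assert(f11_blocked(a, 3.0, 5) == f11(a, 3.0, 5));
    assert(f11_blocked(a, 9.0, 5) == f11(a, 9.0, 5));
}
```

In this sketch all reads stay inside [0, n), so it is safe only if the whole
array up to n is accessible; the scalar loop makes no such assumption once it
has found a match.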
