[Bug target/120157] No use of SVE early break vectorisation in FP loop

ktkachov at gcc dot gnu.org via Gcc-bugs Wed, 07 May 2025 07:07:43 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120157


--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Tamar Christina from comment #1)
> (In reply to ktkachov from comment #0)
> > Not sure if this is a target-specific issue or not. For input:
> > int f11(float *x, float val, int n)
> > {
> >     int i;
> >     for (i = 0; i < n; i++) {
> >         if (x[i] == val) break;
> >     }
> >     return i;
> > }
> > 
> > GCC can do early-break vectorisation with e.g. -Ofast -mcpu=grace but it
> > always uses a Neon sequence, even if we use a more aggressive SVE core like
> > -mcpu=a64fx. It refuses to do it even with --param
> > aarch64-autovec-preference=sve-only.
> > 
> > Is there some enablement we're missing?
> 
> The loop requires first-faulting loads to vectorize with SVE, which we don't
> support yet.  In theory, peeling for alignment could work for SVE as well, but
> there are limitations on the cases in which we can use it, and since the max
> VL is 2048 bits a single loop iteration can easily load more than a page worth
> of data.
> 
> So for GCC 15 only *fixed length* SVE can vectorize and for GCC 16 we're
> working on VLA.
> 
> e.g. https://godbolt.org/z/dYc6szWqa

Ah, indeed: -msve-vector-bits= does do what I expected. Feel free to close
this if it isn't tracking anything new.
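
For reference, a minimal sketch of what the early-break transformation does
conceptually, written as portable C rather than actual compiler output. f11 is
the loop from comment #0; f11_blocked and LANES are hypothetical names used
only for illustration. The blocked version assumes all n elements are
readable, which is exactly what the compiler cannot assume in general and why
the VLA SVE case needs first-faulting loads or alignment peeling.

```c
/* Scalar reference: the loop from comment #0. */
int f11(float *x, float val, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (x[i] == val) break;
    }
    return i;
}

/* Hypothetical fixed lane count, standing in for a fixed vector length. */
enum { LANES = 4 };

/* Emulates a fixed-length early-break loop: test a whole block of lanes,
   leave the block loop as soon as any lane matches, then let a scalar loop
   locate the first match.
   ASSUMPTION: x[0..n-1] is fully readable, so lanes after the first match
   may be loaded safely. */
int f11_blocked(const float *x, float val, int n)
{
    int i = 0;

    /* "Vector" body: only full blocks that lie entirely within n. */
    for (; i + LANES <= n; i += LANES) {
        int any = 0;
        for (int l = 0; l < LANES; l++)
            any |= (x[i + l] == val);
        if (any)
            break;              /* early break: some lane matched */
    }

    /* Scalar epilogue: finds the match inside the block that broke out,
       or handles the remaining tail elements. */
    for (; i < n; i++) {
        if (x[i] == val)
            break;
    }
    return i;
}
```

Both functions should return the same index for any input that satisfies the
assumption above. Building f11 with something like
-Ofast -mcpu=a64fx -msve-vector-bits=512 corresponds to the fixed-length case
discussed above; the exact code generated will depend on the GCC version.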
