[Bug target/120157] No use of SVE early break vectorisation in FP loop

tnfchris at gcc dot gnu.org via Gcc-bugs Wed, 07 May 2025 06:53:06 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120157


--- Comment #1 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to ktkachov from comment #0)
> Not sure if this is a target-specific issue or not. For input:
> int f11(float *x, float val, int n)
> {
>     int i;
>     for (i = 0; i < n; i++) {
>         if (x[i] == val) break;
>     }
>     return i;
> }
> 
> GCC can do early-break vectorisation with e.g. -Ofast -mcpu=grace but it
> always uses a Neon sequence, even if we use a more aggressive SVE core like
> -mcpu=a64fx. It refuses to do it even with --param
> aarch64-autovec-preference=sve-only.
> 
> Is there some enablement we're missing?

The loop requires first-faulting loads to vectorize with SVE, which we don't
support yet.  In theory peeling for alignment could work for SVE as well, but
there are limitations on the cases in which we can use it, and since the max VL
is 2048 bits a single loop iteration can easily load more than a page worth of
data.
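
To illustrate the peeling-for-alignment idea, here is a rough scalar sketch in
plain C.  It is not what GCC emits.  The name f11_peeled and the fixed
vectorisation factor VF are assumptions made up for this example.  Each inner
block stands in for one aligned vector load; the point is that an aligned
block whose size divides the page size cannot straddle a page boundary, so
lanes read past the matching element would not fault.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed vectorisation factor: 4 floats = 16 bytes.  */
#define VF 4

/* Returns the same value as f11, but accesses memory in the order an
   alignment-peeled early-break loop would.  */
int f11_peeled(const float *x, float val, int n)
{
    int i = 0;

    /* Scalar prologue: peel iterations until x + i is aligned to a
       whole vector (VF * sizeof(float) bytes).  */
    while (i < n && ((uintptr_t)(x + i) % (VF * sizeof(float))) != 0) {
        if (x[i] == val)
            return i;
        i++;
    }

    /* Main loop: one aligned block of VF elements per iteration.  In
       real vector code all VF lanes are loaded at once, including the
       lanes after the match.  */
    for (; i + VF <= n; i += VF) {
        for (int lane = 0; lane < VF; lane++)
            if (x[i + lane] == val)
                return i + lane;
    }

    /* Scalar epilogue for the leftover elements.  */
    for (; i < n; i++)
        if (x[i] == val)
            break;
    return i;
}
```

With a fixed VF the prologue length is known from the pointer alone.  With a
vector length only known at run time, and potentially very large, the peeling
amount and the block size are not compile-time constants, which is where the
limitations mentioned above come in.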

So for GCC 15 only *fixed length* SVE can vectorize this loop, and for GCC 16
we're working on vector-length-agnostic (VLA) support.

e.g. https://godbolt.org/z/dYc6szWqa
