https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120157
--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #2)
> (In reply to Tamar Christina from comment #1)
> > (In reply to ktkachov from comment #0)
> > > Not sure if this is a target-specific issue or not. For input:
> > > int f11(float *x, float val, int n)
> > > {
> > >   int i;
> > >   for (i = 0; i < n; i++) {
> > >     if (x[i] == val) break;
> > >   }
> > >   return i;
> > > }
> > >
> > > GCC can do early-break vectorisation with e.g. -Ofast -mcpu=grace but it
> > > always uses a Neon sequence, even if we use a more aggressive SVE core
> > > like -mcpu=a64fx. It refuses to do it even with --param
> > > aarch64-autovec-preference=sve-only.
> > >
> > > Is there some enablement we're missing?
> >
> > The loop requires first faulting loads to vectorize with SVE which we don't
> > support yet. In theory peeling for alignment for SVE could work as well but
> > there are limitations in which cases we can use it and since the max VL is
> > 2048 a single loop iteration can easily load more than a page worth of data.
> >
> > So for GCC 15 only *fixed length* SVE can vectorize and for GCC 16 we're
> > working on VLA.
> >
> > e.g. https://godbolt.org/z/dYc6szWqa
>
> Ah indeed, -msve-vector-bits= does do what I expected. Feel free to close
> this if it's not tracking anything new then.

Ok. FWIW the original testcase for me had doubles:
int f11(double *x, double val, int n)
{
  int i;
  for (i = 0; i < n; i++) {
    if (x[i] == val) break;
  }
  return i;
}

And with -msve-vector-bits=128 -mcpu=neoverse-v2 --param
aarch64-autovec-preference=sve-only GCC refuses to vectorise and picks Neon
without the aarch64-autovec-preference.
I do see it vectorising with VLS SVE for wider widths, so it may be a V2 cost
model thing.
If choosing Neon is the right thing to do for V2 that's fine, but with --param
aarch64-autovec-preference=sve-only it should probably use SVE rather than
refusing to vectorise.
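
For anyone following along, here is a rough plain-C model of what early-break vectorisation does to this loop. It is only a sketch, not what GCC emits: BLOCK is a made-up stand-in for the number of elements per vector, and f11_blocked is a hypothetical name. It also assumes that all n elements of x are readable, which the original scalar loop does not require (it stops touching memory at the first match). That gap is what comment #1 says alignment peeling or first-faulting loads have to cover.

```c
/* Hypothetical block width standing in for "elements per vector". */
#define BLOCK 4

/* Scalar reference: the double testcase from the comment. */
int f11(double *x, double val, int n)
{
  int i;
  for (i = 0; i < n; i++) {
    if (x[i] == val) break;
  }
  return i;
}

/* Conceptual model of the early-break form (assumption: x[0..n-1] is
   entirely readable, unlike the scalar loop which only reads up to the
   first match). */
int f11_blocked(double *x, double val, int n)
{
  int i = 0;

  /* Main loop: compare a whole block at once and leave as soon as any
     lane matches. Every lane is read, including lanes after the match. */
  for (; n - i >= BLOCK; i += BLOCK) {
    int any = 0;
    for (int lane = 0; lane < BLOCK; lane++)
      any |= (x[i + lane] == val);
    if (any)
      break;
  }

  /* Scalar epilogue: locates the exact index inside the matching block,
     and also handles the tail of fewer than BLOCK elements. */
  for (; i < n; i++) {
    if (x[i] == val) break;
  }
  return i;
}
```

Both functions should return the same index for any input where the whole array is readable; a small driver calling them side by side is enough to compare.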