https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111115
Bug ID: 111115
Summary: Failure to vectorize conditional grouped store
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

void foo (float * __restrict x, int *flag)
{
  for (int i = 0; i < 512; ++i)
    {
      if (flag[i])
        {
          float a = x[2*i+0] + 3.f;
          float b = x[2*i+1] + 177.f;
          x[2*i+0] = a;
          x[2*i+1] = b;
        }
    }
}

fails to vectorize on x86_64 with -march=znver4 (it needs masked stores enabled
by tuning).  This is because we do not support VMAT_CONTIGUOUS_PERMUTE for
either .MASK_LOAD or .MASK_STORE.  Simply enabling that shows we fail to
properly handle the mask part.  The proper solution is to handle them in SLP,
which does not support them either.
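
For illustration only, here is a hand-written plain C emulation of what a masked, grouped version of the loop could look like. It is not GCC output and not a description of how the vectorizer does or should implement this. The names foo_masked_sketch, sketch_matches_scalar and VF, and the choice of 4 iterations per step, are assumptions made for the sketch. It shows one plausible reading of "the mask part": the condition is per iteration, but each iteration touches two adjacent floats, so each flag has to cover both lanes of its group in the masked load and the masked store.

```c
#include <assert.h>
#include <string.h>

#define N  512   /* trip count from the report */
#define VF 4     /* hypothetical vectorization factor: 4 iterations per step */

static_assert(N % VF == 0, "this sketch has no epilogue loop");

/* The loop from the report, unchanged apart from using N. */
void foo(float *__restrict x, int *flag)
{
  for (int i = 0; i < N; ++i)
    if (flag[i])
      {
        float a = x[2*i+0] + 3.f;
        float b = x[2*i+1] + 177.f;
        x[2*i+0] = a;
        x[2*i+1] = b;
      }
}

/* Scalar emulation of a masked, grouped form (hypothetical, see lead-in). */
void foo_masked_sketch(float *__restrict x, const int *flag)
{
  /* Per-lane addends: even lanes are member 0, odd lanes are member 1. */
  static const float addend[2] = { 3.f, 177.f };

  for (int i = 0; i < N; i += VF)
    {
      int mask[2 * VF];
      float v[2 * VF];

      /* Widen the per-iteration condition so that each flag covers
         both members of its group of two adjacent floats. */
      for (int l = 0; l < 2 * VF; ++l)
        mask[l] = flag[i + l / 2] != 0;

      /* Emulated masked load: inactive lanes are never read. */
      for (int l = 0; l < 2 * VF; ++l)
        v[l] = mask[l] ? x[2 * i + l] : 0.f;

      /* Unconditional lane-wise add. */
      for (int l = 0; l < 2 * VF; ++l)
        v[l] += addend[l % 2];

      /* Emulated masked store: inactive lanes are never written. */
      for (int l = 0; l < 2 * VF; ++l)
        if (mask[l])
          x[2 * i + l] = v[l];
    }
}

/* Runs both versions on the same input; returns 1 if the results agree. */
int sketch_matches_scalar(void)
{
  static float a[2 * N], b[2 * N];
  static int flag[N];

  for (int i = 0; i < N; ++i)
    flag[i] = (i % 3) != 0;
  for (int i = 0; i < 2 * N; ++i)
    a[i] = b[i] = (float) i;

  foo(a, flag);
  foo_masked_sketch(b, flag);
  return memcmp(a, b, sizeof a) == 0;
}
```

sketch_matches_scalar() compares the emulation against the original loop on a simple flag pattern. The inner lane loops stand in for single masked vector operations; the sketch makes no claim about which instructions a given target would use.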