https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111115

            Bug ID: 111115
           Summary: Failure to vectorize conditional grouped store
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

void foo (float * __restrict x, int *flag)
{
  for (int i = 0; i < 512; ++i)
    {
      if (flag[i])
        {
          float a = x[2*i+0] + 3.f;
          float b = x[2*i+1] + 177.f;
          x[2*i+0] = a;
          x[2*i+1] = b;
        }
    }
}

fails to vectorize on x86_64 with -march=znver4 (it needs masked stores
enabled by tuning).  This is because we do not support VMAT_CONTIGUOUS_PERMUTE
for either .MASK_LOAD or .MASK_STORE.  Simply enabling that shows we fail
to properly handle the mask part.

The proper solution is to handle them in SLP, where they are not supported
either.
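
To illustrate what "the mask part" involves, here is a scalar, lane-by-lane
model of how the loop could be vectorized with contiguous masked accesses.
It is only a sketch of one reading of the problem and does not reflect GCC
internals: the vectorization factor VF and the names foo_ref, foo_model and
model_matches_reference are made up for illustration.  The point is that the
per-iteration predicate flag[i] has to be interleaved (each element
duplicated) so that both members of the store group see the same mask.

```c
#include <assert.h>
#include <string.h>

#define N  512  /* trip count from the testcase */
#define VF 8    /* assumed vectorization factor, illustrative only */

/* Scalar reference: the loop from the report.  */
static void foo_ref (float *x, const int *flag)
{
  for (int i = 0; i < N; ++i)
    if (flag[i])
      {
        float a = x[2*i+0] + 3.f;
        float b = x[2*i+1] + 177.f;
        x[2*i+0] = a;
        x[2*i+1] = b;
      }
}

/* Lane-wise model of a masked, contiguous vectorization.  Each outer
   step covers VF scalar iterations, i.e. 2*VF contiguous floats.  */
static void foo_model (float *x, const int *flag)
{
  assert (N % VF == 0);  /* no epilogue loop in this sketch */
  for (int i = 0; i < N; i += VF)
    {
      int mask[2 * VF];
      float v[2 * VF];

      /* Interleave the mask: both lanes of group l share flag[i + l].  */
      for (int l = 0; l < VF; ++l)
        mask[2*l+0] = mask[2*l+1] = (flag[i + l] != 0);

      /* Masked load: inactive lanes are not read from memory.  */
      for (int l = 0; l < 2 * VF; ++l)
        v[l] = mask[l] ? x[2*i + l] : 0.f;

      /* Even lanes get +3, odd lanes get +177.  */
      for (int l = 0; l < 2 * VF; ++l)
        v[l] += (l & 1) ? 177.f : 3.f;

      /* Masked store: inactive lanes are left untouched.  */
      for (int l = 0; l < 2 * VF; ++l)
        if (mask[l])
          x[2*i + l] = v[l];
    }
}

/* Returns 1 if the model and the scalar loop produce identical data.  */
int model_matches_reference (void)
{
  static float a[2 * N], b[2 * N];
  static int flag[N];

  for (int i = 0; i < N; ++i)
    flag[i] = (i % 3) == 0;
  for (int i = 0; i < 2 * N; ++i)
    a[i] = b[i] = (float) i;

  foo_ref (a, flag);
  foo_model (b, flag);
  return memcmp (a, b, sizeof a) == 0;
}
```

Calling model_matches_reference () from a small driver checks that the
interleaved-mask model agrees with the original loop.  Using the
uninterleaved mask for the 2*VF-lane accesses would apply the wrong
predicate to most lanes, which is one way the mask handling can go wrong.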
