https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110223

            Bug ID: 110223
           Summary: Missed optimization vectorizing booleans comparisons
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

== truncate before bool

float a[1024], b[1024], c[1024], d[1024];
int k[1024];
_Bool res[1024];

int main ()
{
  int i;
  for (i = 0; i < 1024; i++)
    res[i] = k[i] != ((i - 3) == 0);
}

This vectorizes, but the bit clear is done before the truncate.  Because of the
high unroll factor, doing the truncate first would save the extra bit clears.
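
A rough scalar model of the two orderings, to make the saving concrete.  It
assumes an unroll factor of 4 (four V4SI masks feeding one V16QI result); the
function names and the returned count of vector ANDs are hypothetical and only
illustrate the idea, they are not what the vectorizer emits.

```c
#include <stdint.h>

enum { VECS = 4, WIDE_LANES = 4, NARROW_LANES = VECS * WIDE_LANES };

/* Models the truncate: four 4 x 32-bit vectors (stored back to back in
   WIDE) become one 16 x 8-bit vector.  */
static void
truncate_lanes (const uint32_t *wide, uint8_t *narrow)
{
  for (int i = 0; i < NARROW_LANES; i++)
    narrow[i] = (uint8_t) wide[i];
}

/* Bit clear first: one AND per wide vector, then the truncate.
   Returns the number of modelled vector ANDs.  */
int
clear_then_truncate (const uint32_t *mask, uint8_t *out)
{
  uint32_t tmp[NARROW_LANES];
  int ands = 0;

  for (int v = 0; v < VECS; v++)
    {
      for (int lane = 0; lane < WIDE_LANES; lane++)
        tmp[v * WIDE_LANES + lane] = mask[v * WIDE_LANES + lane] & 1u;
      ands++;
    }
  truncate_lanes (tmp, out);
  return ands;
}

/* Truncate first: a single AND on the narrow vector is enough.
   Returns the number of modelled vector ANDs.  */
int
truncate_then_clear (const uint32_t *mask, uint8_t *out)
{
  truncate_lanes (mask, out);
  for (int i = 0; i < NARROW_LANES; i++)
    out[i] &= 1u;
  return 1;
}
```

Both functions produce the same 16 bytes for all-zeros/all-ones input lanes; in
this model the first uses four ANDs and the second uses one.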

== reduce using unpack

float a[1024], b[1024], c[1024], d[1024];
_Bool k[1024];
_Bool res[1024];

int main ()
{
  int i;
  for (i = 0; i < 1024; i++)
    res[i] = k[i] != (i == 0);
}

Doesn't vectorize because the compiler doesn't know how to compare boolean
vectors with different element sizes.  Because i is an integer, the result of
i == 0 is a V4SI-backed boolean type, whereas k[i] gives a V16QI one.  So it
has to compare 4 V4SI vectors against 1 V16QI vector; it can do this by
truncating the 4 V4SI bools to 1 V16QI bool.
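
A minimal scalar sketch of that truncation, assuming 128-bit vectors so that
16 consecutive iterations map to four V4SI masks and one V16QI vector.  The
helpers pack_masks and compare_block are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 16 };

/* Models narrowing four 4 x 32-bit masks (stored back to back in WIDE)
   into one 16 x 8-bit mask.  Each lane must be all-zeros or all-ones.  */
static void
pack_masks (const uint32_t *wide, uint8_t *narrow)
{
  for (int i = 0; i < LANES; i++)
    {
      assert (wide[i] == 0 || wide[i] == UINT32_MAX);
      narrow[i] = (uint8_t) wide[i];
    }
}

/* Computes res[i] = k[i] != (i == 0) for the 16 elements starting at
   BASE, going through the wide-mask-then-narrow route.  */
void
compare_block (const _Bool *k, _Bool *res, int base)
{
  uint32_t wide[LANES];
  uint8_t narrow[LANES];

  /* The comparison on the int induction variable yields 32-bit masks.  */
  for (int i = 0; i < LANES; i++)
    wide[i] = (base + i == 0) ? UINT32_MAX : 0;

  pack_masks (wide, narrow);

  /* Both operands now have byte-sized lanes and can be compared.  */
  for (int i = 0; i < LANES; i++)
    res[base + i] = k[base + i] != (narrow[i] != 0);
}
```

Calling compare_block for base = 0, 16, 32, ... covers the whole array and
should match the scalar loop.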

== mask vs non-mask type

_Bool k[1024];
_Bool res[1024];

int main ()
{
  char i;
  for (i = 0; i < 64; i++)
    res[i] = k[i] != (i == 0);
}

Doesn't vectorize because the compiler doesn't know how to compare a boolean
mask against a non-mask boolean.  There's a comment in the source code saying
that this can be done using a pattern (presumably by casting the types earlier).
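
One possible reading of "casting the types earlier", as a scalar sketch: turn
the stored boolean (0 or 1) into a mask (0x00 or 0xFF) first, so that both
operands of the comparison are masks.  This is only an assumption about what
such a pattern could do; bool_to_mask and compare_mask_with_bool are
hypothetical names, not GCC functions.

```c
#include <stdint.h>

/* Turns a stored boolean (0 or 1) into a byte lane mask (0x00 or 0xFF).  */
static uint8_t
bool_to_mask (_Bool b)
{
  return b ? UINT8_MAX : 0;
}

/* Computes res[i] = k[i] != (i == 0) with both operands in mask form.  */
void
compare_mask_with_bool (const _Bool *k, _Bool *res, int n)
{
  for (int i = 0; i < n; i++)
    {
      uint8_t cmp_mask = (i == 0) ? UINT8_MAX : 0; /* already a mask */
      uint8_t k_mask = bool_to_mask (k[i]);        /* converted up front */

      /* Mask != mask is an XOR; keep bit 0 to get a storable 0/1 value.  */
      res[i] = ((cmp_mask ^ k_mask) & 1) != 0;
    }
}
```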

In my case I need these to work on gcond as well, not just assigns, and since
we don't codegen conds, it might be better to handle them in vectorizable_*.
