https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110223
Bug ID: 110223
Summary: Missed optimization vectorizing booleans comparisons
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
== truncate before bool
float a[1024], b[1024], c[1024], d[1024];
int k[1024];
_Bool res[1024];
int main ()
{
int i;
for (i = 0; i < 1024; i++)
res[i] = k[i] != ((i - 3) == 0);
}
This vectorizes, but the bit clear is done before the truncation. Because of the high unroll
factor, doing it the other way around (truncating first) would save the extra bit clears.
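A rough lane-level model of the two orderings, in plain C. It assumes 128-bit vectors, 32-bit int and 8-bit _Bool, so four V4SI compare masks (lanes 0 or -1) feed one V16QI result. The function names are hypothetical, and the loops stand in for vector instructions. This is not GCC's actual output. It only counts the ANDs used for the bit clear.

```c
#include <assert.h>

/* Assumed lane counts: 128-bit vectors, 32-bit int, 8-bit _Bool.  */
enum { SI_LANES = 4, QI_LANES = 16, NVEC = QI_LANES / SI_LANES };

/* Current order: clear each V4SI mask to 0/1, then narrow the four
   vectors into one V16QI.  Returns the number of vector ANDs.  */
static int bitclear_then_truncate (const int mask[QI_LANES],
                                   unsigned char out[QI_LANES])
{
  int ands = 0;
  for (int v = 0; v < NVEC; v++)
    {
      for (int l = 0; l < SI_LANES; l++)
        {
          int lane = mask[v * SI_LANES + l];
          assert (lane == 0 || lane == -1);     /* vector compare result */
          out[v * SI_LANES + l] = (unsigned char) (lane & 1); /* AND, then narrow */
        }
      ands++;                                   /* one AND per V4SI */
    }
  return ands;
}

/* Suggested order: narrow first (-1 becomes 0xff), then do a single
   AND on the V16QI.  */
static int truncate_then_bitclear (const int mask[QI_LANES],
                                   unsigned char out[QI_LANES])
{
  for (int l = 0; l < QI_LANES; l++)
    {
      assert (mask[l] == 0 || mask[l] == -1);
      out[l] = (unsigned char) mask[l];         /* narrow */
    }
  for (int l = 0; l < QI_LANES; l++)
    out[l] &= 1;                                /* the one AND on V16QI */
  return 1;
}
```

Both orders store the same 0/1 bytes. The difference is four ANDs on V4SI versus one on V16QI for every 16 results.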
== reduce using unpack
float a[1024], b[1024], c[1024], d[1024];
_Bool k[1024];
_Bool res[1024];
int main ()
{
int i;
for (i = 0; i < 1024; i++)
res[i] = k[i] != (i == 0);
}
This doesn't vectorize because the compiler doesn't know how to compare boolean
vectors with different element sizes. Because i is an int, the result of (i == 0)
is a V4SI-backed boolean type, whereas k[i] gives a V16QI one. So each V16QI of k
has to be compared against 4 V4SI vectors. This can be done by truncating the
4 V4SI booleans into 1 V16QI boolean.
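A lane-level sketch of the truncation approach for one vector iteration, under the same assumptions as above (128-bit vectors, 32-bit int, 8-bit _Bool). The function name is hypothetical, and the loops only model what the vector instructions would compute. The four V4SI masks from (i == 0) are packed into one V16QI mask. k[i] is turned into a V16QI mask. The two masks are then compared lane-wise.

```c
#include <assert.h>

/* Assumed lane counts: 128-bit vectors, 32-bit int, 8-bit _Bool.  */
enum { SI_LANES = 4, QI_LANES = 16, NVEC = QI_LANES / SI_LANES };

/* Hypothetical model of one vector iteration of
     res[i] = k[i] != (i == 0);
   starting at scalar index BASE.  */
static void vec_iteration (const _Bool *k, _Bool *res, int base)
{
  int si_mask[NVEC][SI_LANES];     /* four V4SI masks from (i == 0) */
  signed char qi_mask[QI_LANES];   /* the same mask truncated to V16QI */

  for (int v = 0; v < NVEC; v++)
    for (int l = 0; l < SI_LANES; l++)
      {
        int i = base + v * SI_LANES + l;
        si_mask[v][l] = (i == 0) ? -1 : 0;   /* all-ones / all-zeros lane */
      }

  /* Truncate (pack) the four V4SI masks into one V16QI mask.  */
  for (int v = 0; v < NVEC; v++)
    for (int l = 0; l < SI_LANES; l++)
      qi_mask[v * SI_LANES + l] = (signed char) si_mask[v][l];

  /* Both operands now have 8-bit lanes: turn k into a V16QI mask,
     compare the masks lane-wise, and store the result as 0/1.  */
  for (int l = 0; l < QI_LANES; l++)
    {
      signed char k_mask = k[base + l] ? -1 : 0;
      res[base + l] = (k_mask ^ qi_mask[l]) != 0;
    }
}
```

Running this for each block of 16 elements gives the same result as the scalar loop.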
== mask vs non-mask type
_Bool k[1024];
_Bool res[1024];
int main ()
{
char i;
for (i = 0; i < 64; i++)
res[i] = k[i] != (i == 0);
}
This doesn't vectorize because the compiler doesn't know how to compare a boolean
mask against a non-mask boolean. A comment in the source code says this can
be done using a pattern (presumably by casting the types earlier).
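One way to read "casting the types earlier" at source level is to turn the mask (i == 0) into ordinary 0/1 byte data before it meets k[i]. Both operands of != are then plain data of the same width. This is a sketch under that assumption. The function names are hypothetical, and this is not necessarily what the pattern mentioned in the GCC source comment produces. It uses an unsigned char induction variable to keep the 8-bit comparison from the test case.

```c
/* Original statement: a mask (i == 0) compared against the non-mask
   boolean k[i].  */
static void original_form (const _Bool *k, _Bool *res, unsigned char n)
{
  for (unsigned char i = 0; i < n; i++)
    res[i] = k[i] != (i == 0);
}

/* Hypothetical pattern output: the mask is first turned into 0/1 data
   with a select, so both operands of != are unsigned char data.  */
static void pattern_form (const _Bool *k, _Bool *res, unsigned char n)
{
  for (unsigned char i = 0; i < n; i++)
    {
      unsigned char m = (i == 0) ? 1 : 0;     /* mask -> 0/1 data */
      res[i] = (unsigned char) k[i] != m;     /* data != data */
    }
}
```

Both forms compute the same values. The rewrite only changes which types reach the comparison.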
In my case I need these to work on gconds as well, not just assignments, and since
we don't codegen conds, it might be better to handle them in the vectorizable_* functions.
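A hypothetical test case for the gcond form. It uses the same comparison as above, but as a loop exit condition rather than an assignment. The function name is illustrative. Because i is an int, the V4SI vs V16QI size mismatch from the second case also appears here. This says nothing about whether a given GCC version vectorizes it.

```c
_Bool k[1024];

/* The comparison feeds a branch (a gcond in GIMPLE) instead of a store.
   Returns the first index where k[i] differs from (i == 0), or -1.  */
int first_mismatch (void)
{
  for (int i = 0; i < 1024; i++)
    if (k[i] != (i == 0))
      return i;
  return -1;
}
```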