https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110223
Bug ID: 110223
Summary: Missed optimization vectorizing booleans comparisons
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---

== truncate before bool

float a[1024], b[1024], c[1024], d[1024];
int k[1024];
_Bool res[1024];

int main ()
{
  int i;
  for (i = 0; i < 1024; i++)
    res[i] = k[i] != ((i - 3) == 0);
}

This vectorizes, but it does the bit clear before the truncate. Because of the high unroll factor, doing it the other way around (truncate first, then bit clear) would save the extra bit clears.

== reduce using unpack

float a[1024], b[1024], c[1024], d[1024];
_Bool k[1024];
_Bool res[1024];

int main ()
{
  int i;
  for (i = 0; i < 1024; i++)
    res[i] = k[i] != (i == 0);
}

This doesn't vectorize, because the compiler doesn't know how to compare boolean vectors with different element sizes. Because i is an int, the result of (i == 0) is a V4SI-backed boolean type, whereas k[i] is a V16QI-backed one. So it has to compare 4 V4SI vectors against 1 V16QI vector; it can do this by truncating the 4 V4SI bools to 1 V16QI bool.

== mask vs non-mask type

_Bool k[1024];
_Bool res[1024];

int main ()
{
  char i;
  for (i = 0; i < 64; i++)
    res[i] = k[i] != (i == 0);
}

This doesn't vectorize, because the compiler doesn't know how to compare a boolean mask against a non-mask boolean. A comment in the source code says this can be done using a pattern (presumably by casting the types earlier).

In my case I need these to work on gcond as well, not just on assigns, and since we don't codegen conds, it might be better to handle them in vectorizable_*.