https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68714
--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> ---
I find it strange that we do all operations on masks and not on "booleans" for
vectors.
typedef int T;
T f(T a,T b,T c,T d){
  return (a<b)&(c<d);
}
we generate:
_Bool _3;
_Bool _6;
_Bool _7;
T _8;
<bb 2>:
_3 = a_1(D) < b_2(D);
_6 = c_4(D) < d_5(D);
_7 = _3 & _6;
_8 = (T) _7;
return _8;
that is, we are happy to do the bit_and on booleans. However, with
typedef int T __attribute__((vector_size(64)));
we now generate (-mavx512f):
_3 = VEC_COND_EXPR <a_1(D) < b_2(D), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
_6 = VEC_COND_EXPR <c_4(D) < d_5(D), { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }>;
_7 = _3 & _6;
return _7;
yielding this code:
vpcmpgtd %zmm0, %zmm1, %k1
vpternlogd $0xFF, %zmm4, %zmm4, %zmm4
vmovdqa32 %zmm4, %zmm0{%k1}{z}
vpcmpgtd %zmm2, %zmm3, %k1
vmovdqa32 %zmm4, %zmm2{%k1}{z}
vpandd %zmm2, %zmm0, %zmm0
We perform the bit_and on the mask type, whereas it would be better to do it on
the boolean type and use 'kandw'. For most platforms, (vec_cond x -1 0) should
be a NOP so it doesn't really matter, and for the few remaining ones (AVX512 and
SPARC IIRC) we want to use "booleans" as much as possible and only convert to a
mask late. I think that implies that we should pull operations on masks into
operations on booleans (as in the original patch in comment #1 maybe, plus
canonicalizing (vec_cond x 0 -1)), and probably that forwarding conditions into
the first argument of vec_cond should only be done late (around expand).
But it is quite possible that my intuition is completely bogus here.
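
To make the intended rewrite concrete, here is a small portable sketch (not GCC
internals) of the identity it relies on: and-ing two -1/0 lane masks gives the
same result as and-ing the per-lane "booleans" first and expanding to a mask
only once at the end. The 4-lane type v4si and the helpers cmp_lt_bits,
expand_bits, same_lanes and check_equivalence are hypothetical names made up
for this illustration; the uint8_t merely stands in for a k-register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane vector type, small enough to build on any target. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Mask form: each comparison yields -1/0 lanes, and the AND is done on
   full-width vectors (what the VEC_COND_EXPR dump above corresponds to). */
static v4si and_on_masks(v4si a, v4si b, v4si c, v4si d)
{
    return (a < b) & (c < d);
}

/* Boolean form, step 1: one bit per lane, modelling a mask register. */
static uint8_t cmp_lt_bits(v4si x, v4si y)
{
    uint8_t k = 0;
    for (int i = 0; i < 4; i++)
        k |= (uint8_t)((x[i] < y[i]) << i);
    return k;
}

/* Boolean form, step 3: expand the bits to -1/0 lanes, i.e. the single
   late (vec_cond k -1 0). */
static v4si expand_bits(uint8_t k)
{
    v4si r;
    for (int i = 0; i < 4; i++)
        r[i] = ((k >> i) & 1) ? -1 : 0;
    return r;
}

/* Boolean form: AND the bits (the 'kandw' analogue), convert once. */
static v4si and_on_booleans(v4si a, v4si b, v4si c, v4si d)
{
    uint8_t k = cmp_lt_bits(a, b) & cmp_lt_bits(c, d);
    return expand_bits(k);
}

/* Lane-by-lane equality of two vectors. */
static int same_lanes(v4si x, v4si y)
{
    for (int i = 0; i < 4; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}

/* Asserts that both forms agree on the given inputs. */
static void check_equivalence(v4si a, v4si b, v4si c, v4si d)
{
    assert(same_lanes(and_on_masks(a, b, c, d),
                      and_on_booleans(a, b, c, d)));
}
```

Calling check_equivalence on arbitrary inputs should never trip the assert.
This only illustrates why moving the bit_and from the mask type to the boolean
type preserves the result; it says nothing about where in the pipeline such a
transformation should live.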