https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, interesting. We even vectorize this with just -mavx512f but end up
using vector(16) int alongside vector(8) long, with equality compares of
vector(16) int:
vpcmpd $0, %zmm7, %zmm0, %k2
According to the docs that's fine with AVX512F. But then for both long and
double you need byte masks, so I wonder why kmovb isn't in AVX512F but only
in AVX512DQ ...
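For reference, a minimal sketch of the kind of loop involved (this is not the
actual testcase; the function name, loop shape and indexing are assumptions
for illustration): an int condition controls conditional accesses to 64-bit
data, so the compare side naturally wants vector(16) int and a 16-bit k-mask
while the data side is vector(8) long and needs a byte-sized mask.

typedef unsigned long long BITBOARD;
extern BITBOARD KingSafetyMask1[64];

/* Hypothetical reduced loop: with -mavx512f the compare on the int
   condition can be carried out on vector(16) int (vpcmpd writing a
   k-mask), while the masked loads/stores of the BITBOARD data operate
   on vector(8) long and need an 8-bit mask.  */
void
foo (BITBOARD *dst, const int *cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i] == 0)
      dst[i] = KingSafetyMask1[i];
}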
I will adjust the testcase to use only AVX512F and push the fix now. I can't
reproduce the runfail in a different worktree.
Note that I don't see all-zero masks, but
vect_patt_22.11_6 = .MASK_LOAD (&MEM <BITBOARD[64]> [(void *)&KingSafetyMask1
+ 8B], 64B, { -1, 0, 0, 0, 0, 0, 0, 0 });
could be optimized to a movq $mem, %zmmN (only a single element, or more
generally a power-of-two number of initial elements, is read). I'm not sure
whether the corresponding
vect_patt_20.17_34 = .MASK_LOAD (&MEM <BITBOARD[64]> [(void
*)&KingSafetyMask1 + -8B], 64B, { 0, 0, 0, 0, 0, 0, 0, -1 });
is worth optimizing to xor %zmmN, %zmmN plus pinsr $MEM, %zmmN. Eliding
constant masks might help to avoid STLF issues due to false dependences on
masked-out elements (IIRC all current uarchs suffer from that).
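In intrinsics form the two constant-mask loads roughly correspond to the
following (a sketch only; the function names are made up, and the
zero-masking intrinsics pin the inactive lanes to zero, which the GIMPLE
.MASK_LOAD does not guarantee):

#include <immintrin.h>

/* Mask { -1, 0, 0, 0, 0, 0, 0, 0 }: only element 0 is active.  A plain
   64-bit scalar load zero-extended into the vector (a single vmovq)
   would read the same memory without a masked-load uop.  */
__m512i
load_first_lane (const void *p)
{
  return _mm512_maskz_loadu_epi64 ((__mmask8) 0x01, p);
}

/* Mask { 0, 0, 0, 0, 0, 0, 0, -1 }: only element 7 is active.  The
   question above is whether zeroing the register and inserting the one
   scalar is better than keeping the masked load.  */
__m512i
load_last_lane (const void *p)
{
  return _mm512_maskz_loadu_epi64 ((__mmask8) 0x80, p);
}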
Note that even all-zero masks cannot be optimized on GIMPLE currently, since
the value of the masked-out lanes isn't well-defined there (we're working on
that).
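To illustrate the point (again only a sketch in terms of the target
intrinsics, not GCC internals): the intrinsics fix the inactive lanes one way
or the other, while GIMPLE's .MASK_LOAD currently leaves them unspecified, so
neither folding is valid there yet.

#include <immintrin.h>

/* Zero-masking: inactive lanes are defined to be zero, so with an
   all-zero mask the result is a known all-zero vector.  */
__m512i
all_zero_mask_z (const void *p)
{
  return _mm512_maskz_loadu_epi64 ((__mmask8) 0x00, p);
}

/* Merge-masking: inactive lanes are defined to come from SRC, so with an
   all-zero mask the result is simply SRC.  */
__m512i
all_zero_mask_m (__m512i src, const void *p)
{
  return _mm512_mask_loadu_epi64 (src, (__mmask8) 0x00, p);
}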