https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122074
--- Comment #6 from rockeet <rockeet at gmail dot com> ---
Interestingly, GCC does fuse the load into the compare if the code is changed slightly:
size_t avx512_search_byte_max32_2(const byte_t* data, size_t len, byte_t key) {
    __mmask32 k = _bzhi_u32(-1, len);
    return _tzcnt_u32(_mm256_mask_cmpeq_epi8_mask(k,
        *(__m256i_u*)data, _mm256_set1_epi8(key)));
}
See https://godbolt.org/z/W8MKTbKPv . However, it still generates an extra
`mov eax, eax`.
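
For anyone comparing the generated code against the intended result, here is a rough scalar sketch of what the function above appears to compute. It is not from the original test case: the name `scalar_search_byte_max32`, the `byte_t` typedef, and the `len <= 32` precondition (inferred from "max32" in the function name) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: byte_t is an unsigned 8-bit type; the report does not show its definition. */
typedef unsigned char byte_t;

/* Hypothetical scalar reference: index of the first byte equal to key
 * within the first len bytes, or 32 if there is no match. */
size_t scalar_search_byte_max32(const byte_t* data, size_t len, byte_t key) {
    assert(len <= 32); /* assumed precondition, inferred from "max32" */
    for (size_t i = 0; i < len; i++) {
        if (data[i] == key)
            return i; /* corresponds to the lowest set bit of the compare mask */
    }
    return 32; /* tzcnt of an all-zero 32-bit mask yields 32 */
}
```

Unlike the intrinsic version, this sketch only touches the first `len` bytes, so it is useful for checking return values rather than for comparing memory-access behavior.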
