https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66862
--- Comment #5 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> Now, it seems AVX512BW (and AVX512VL in some cases) has the needed
> instructions, in particular VMOVDQU{8,16}, but it is not reflected in
> maskload<mode> and maskstore<mode> expanders. CCing Kyrill and Uros on this.
With -mavx512bw and -mavx512vl, the loop has been vectorized since GCC 8.1.
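
For illustration only, here is a minimal sketch of the kind of loop in
question: a conditional store on 16-bit elements, which the vectorizer can
only handle if a masked store for that element size is available. This is
not the testcase attached to the PR; the function name cond_store and its
signature are made up for this example.

```c
#include <stddef.h>

/* Hypothetical example, not the testcase from the PR.
   The store to dst[i] happens only when the condition holds, so a
   vectorized version needs a masked store for 16-bit elements
   (e.g. VMOVDQU16 with a mask register under AVX512BW).  */
void
cond_store (short *restrict dst, const short *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (src[i] > 0)
      dst[i] = src[i];
}
```

One way to check whether a given compiler vectorizes it is to build with
something like "gcc -O3 -mavx512bw -mavx512vl -fopt-info-vec -c" and look
for a note on the loop; the exact output depends on the GCC version.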
