https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
     Ever confirmed|0           |1
                 CC|            |crazylht at gmail dot com,
                   |            |hjl.tools at gmail dot com,
                   |            |jakub at gcc dot gnu.org
   Last reconfirmed|            |2020-10-30
             Status|UNCONFIRMED |NEW

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The problem is that in the RTL representation there is nothing that would tell
cse, forward propagation, the combiner etc. not to optimize

(insn 7 6 8 2 (set (reg:QI 89)
        (const_int 31 [0x1f])) "include/avx512vlintrin.h":865:20 77
     {*movqi_internal}
     (nil))
(insn 8 7 9 2 (set (reg:V8SI 87)
        (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0 S32 A8])
            (reg:V8SI 88)
            (reg:QI 89))) "include/avx512vlintrin.h":865:20 1423
     {avx512vl_loadv8si_mask}
     (nil))

into:

(insn 8 7 9 2 (set (reg:V8SI 87)
        (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0 S32 A8])
            (reg:V8SI 88 [ tmp ])
            (const_int 31 [0x1f]))) "include/avx512vlintrin.h":865:20 4402
     {avx2_pblenddv8si}
     (expr_list:REG_DEAD (reg:QI 89)
        (expr_list:REG_DEAD (reg:V8SI 88 [ tmp ])
            (expr_list:REG_DEAD (reg/v/f:DI 86 [ arr ])
                (nil)))))

I guess we'd need to use some UNSPEC for the masked loads and have patterns
for combine that turn those with all-ones (-1) masks back into normal loads,
or disable the blend patterns with MEM operands for avx512f+ (i.e. force the
memory operand into a register).  The RTL representation really matches the
blend behavior more than the avx512 masking: the blend does a full-width load
of the MEM operand, so masked-off elements can still fault, whereas with
avx512 masking exceptions from the masked-off elements just don't show up.
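
To make the UNSPEC idea concrete, here is a minimal sketch of what the masked
load pattern could look like.  This is illustrative only: UNSPEC_MASKLOAD_AVX512
is an assumed new unspec code, and the predicates, constraints and attributes
are guesses, not the i386 backend's actual ones.

;; Sketch (assumptions marked): hide the masked load behind an unspec so
;; that cse/fwprop/combine cannot rewrite it into a vec_merge blend with
;; a MEM operand.  UNSPEC_MASKLOAD_AVX512 is a hypothetical unspec code.
(define_insn "avx512vl_loadv8si_mask"
  [(set (match_operand:V8SI 0 "register_operand" "=v")
        (unspec:V8SI
          [(match_operand:V8SI 1 "memory_operand" "m")
           (match_operand:V8SI 2 "register_operand" "0")   ; merge source
           (match_operand:QI 3 "register_operand" "Yk")]   ; mask register
          UNSPEC_MASKLOAD_AVX512))]
  "TARGET_AVX512VL"
  "vmovdqu32\t{%1, %0%{%3%}|%0%{%3%}, %1}"
  [(set_attr "type" "ssemov")])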
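
For the all-ones mask case, something would then have to strip the unspec
again.  A hypothetical split along these lines (assuming an insn variant that
also accepts a constant mask) could hand that case back to the regular move
patterns:

;; Sketch: once the mask is a known all-ones constant (0xff for the 8
;; V8SI elements), the unspec degenerates into a plain vector load.
(define_split
  [(set (match_operand:V8SI 0 "register_operand")
        (unspec:V8SI
          [(match_operand:V8SI 1 "memory_operand")
           (match_operand:V8SI 2 "register_operand")
           (match_operand:QI 3 "const_int_operand")]
          UNSPEC_MASKLOAD_AVX512))]
  "INTVAL (operands[3]) == 0xff"
  [(set (match_dup 0) (match_dup 1))])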
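
The alternative, disabling the blend patterns' MEM alternative for avx512f+,
could amount to little more than a condition tweak.  Again only a sketch,
with the operand layout inferred from the dump above rather than copied
from sse.md:

;; Sketch: refuse to match vpblendd with a memory operand when AVX512VL
;; masking is available, so the load stays a separate (maskable) insn
;; and the masked-off elements can never fault through the blend.
(define_insn "avx2_pblenddv8si"
  [(set (match_operand:V8SI 0 "register_operand" "=x")
        (vec_merge:V8SI
          (match_operand:V8SI 2 "nonimmediate_operand" "xm")
          (match_operand:V8SI 1 "register_operand" "x")
          (match_operand:SI 3 "const_0_to_255_operand" "n")))]
  "TARGET_AVX2 && (!TARGET_AVX512VL || !MEM_P (operands[2]))"
  "vpblendd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
  [(set_attr "type" "ssemov")])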