https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
> The problem is that in the RTL representation there is nothing that would
> tell cse, forward propagation or combiner etc. not to optimize the
> (insn 7 6 8 2 (set (reg:QI 89)
>         (const_int 31 [0x1f])) "include/avx512vlintrin.h":865:20 77
>      {*movqi_internal}
>      (nil))
> (insn 8 7 9 2 (set (reg:V8SI 87)
>         (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0 S32 A8])
>             (reg:V8SI 88)
>             (reg:QI 89))) "include/avx512vlintrin.h":865:20 1423
>      {avx512vl_loadv8si_mask}
>      (nil))
> into:
> (insn 8 7 9 2 (set (reg:V8SI 87)
>         (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0 S32 A8])
>             (reg:V8SI 88 [ tmp ])
>             (const_int 31 [0x1f]))) "include/avx512vlintrin.h":865:20 4402
>      {avx2_pblenddv8si}
>      (expr_list:REG_DEAD (reg:QI 89)
>         (expr_list:REG_DEAD (reg:V8SI 88 [ tmp ])
>             (expr_list:REG_DEAD (reg/v/f:DI 86 [ arr ])
>                 (nil)))))
> Guess we'd need to use some UNSPEC for the masked loads and have patterns
> for combine to optimize those that have -1 masks into normal loads, or
> disable the blend patterns with MEM operands for avx512f+ (i.e. force those
> into registers).

I prefer the UNSPEC solution: UNSPEC masked load patterns are only needed
for the intrinsics, and <avx512>_load<mode>_mask could be kept and renamed
to <avx512>_blendm<mode>.

> Because the RTL representation really matches more the blend behavior than
> the avx512 masking, where exceptions from the masked off elts just don't
> show up.