https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
> The problem is that in the RTL representation there is nothing that would
> tell cse, forward propagation or combiner etc. not to optimize the
> (insn 7 6 8 2 (set (reg:QI 89)
>         (const_int 31 [0x1f])) "include/avx512vlintrin.h":865:20 77
> {*movqi_internal}
>      (nil))
> (insn 8 7 9 2 (set (reg:V8SI 87)
>         (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0  S32 A8])
>             (reg:V8SI 88)
>             (reg:QI 89))) "include/avx512vlintrin.h":865:20 1423
> {avx512vl_loadv8si_mask}
>      (nil))
> into:
> (insn 8 7 9 2 (set (reg:V8SI 87)
>         (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0  S32 A8])
>             (reg:V8SI 88 [ tmp ])
>             (const_int 31 [0x1f]))) "include/avx512vlintrin.h":865:20 4402
> {avx2_pblenddv8si}
>      (expr_list:REG_DEAD (reg:QI 89)
>         (expr_list:REG_DEAD (reg:V8SI 88 [ tmp ])
>             (expr_list:REG_DEAD (reg/v/f:DI 86 [ arr ])
>                 (nil)))))
> Guess we'd need to use some UNSPEC for the masked loads and have patterns
> for combine to optimize those that have -1 masks into normal loads, or
> disable the blend patterns with MEM operands for avx512f+ (i.e. force those
> into registers).

I prefer the UNSPEC solution: UNSPEC masked load patterns are only needed for
the intrinsics, and <avx512>_load<mode>_mask could be kept and renamed to
<avx512>_blendm<mode>.



> Because the RTL representation really matches more the blend behavior than
> the avx512 masking, where exceptions from the masked off elts just don't
> show up.
