Hi:

+/* Optimize vector MUL generation for V8QI, V16QI and V32QI
+   under TARGET_AVX512BW.  For example, for v16qi a * b, it generates
+
+   vpmovzxbw ymm2, xmm0
+   vpmovzxbw ymm3, xmm1
+   vpmullw   ymm4, ymm2, ymm3
+   vpmovwb   xmm0, ymm4
+
+   which takes fewer instructions than ix86_expand_vecop_qihi.
+   Return true on success.  */
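
To illustrate why the widen/multiply/truncate sequence above yields the same
result as a byte-wise multiply, here is a minimal scalar sketch in C. It is
not code from the patch: the function names (mul_v16qi_widen, mul_v16qi_ref,
count_mismatches, self_check) are hypothetical, and each loop iteration only
models one lane of vpmovzxbw, vpmullw and vpmovwb. The point is that the low
8 bits of a product depend only on the low 8 bits of the operands, so
truncating the 16-bit product gives the QImode result.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* v16qi: sixteen 8-bit lanes (illustrative constant) */

/* Scalar model of the sequence, one lane per iteration:
   zero-extend each byte to 16 bits (as vpmovzxbw does),
   multiply and keep the low 16 bits (as vpmullw does),
   then truncate each word back to a byte (as vpmovwb does).  */
static void
mul_v16qi_widen (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  for (int i = 0; i < LANES; i++)
    {
      uint16_t wa = a[i];                               /* widen */
      uint16_t wb = b[i];                               /* widen */
      uint16_t prod = (uint16_t) ((unsigned) wa * wb);  /* low 16 bits */
      out[i] = (uint8_t) prod;                          /* truncate */
    }
}

/* Reference: plain byte-wise multiplication modulo 256.  */
static void
mul_v16qi_ref (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (uint8_t) ((unsigned) a[i] * b[i]);
}

/* Compare both versions over all 256 x 256 operand pairs and
   return the number of lanes where they disagree.  */
static int
count_mismatches (void)
{
  int bad = 0;
  uint8_t a[LANES], b[LANES], r1[LANES], r2[LANES];

  for (int x = 0; x < 256; x++)
    for (int y0 = 0; y0 < 256; y0 += LANES)
      {
        for (int i = 0; i < LANES; i++)
          {
            a[i] = (uint8_t) x;
            b[i] = (uint8_t) (y0 + i);
          }
        mul_v16qi_widen (a, b, r1);
        mul_v16qi_ref (a, b, r2);
        for (int i = 0; i < LANES; i++)
          bad += (r1[i] != r2[i]);
      }
  return bad;
}

/* Convenience check: aborts if any lane differs.  */
static void
self_check (void)
{
  assert (count_mismatches () == 0);
}
```

Calling self_check () from a test driver exercises every operand pair. The
same argument applies to signed char operands, since only the low 8 bits of
each product are kept.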

  Bootstrap is OK, and regression tests on the i386/x86-64 backend pass.

gcc/ChangeLog:
        PR target/95488
        * config/i386/i386-expand.c (ix86_expand_vecmul_qihi): New
        function.
        * config/i386/i386-protos.h (ix86_expand_vecmul_qihi): Declare.
        * config/i386/sse.md (mul<mode>3): Drop mask_name since
        there's no real vector char multiplication instruction with
        mask. Also optimize it under TARGET_AVX512BW.
        (mulv8qi3): New expander.

gcc/testsuite/ChangeLog:
        * gcc.target/i386/avx512bw-pr95488-1.c: New test.
        * gcc.target/i386/avx512bw-pr95488-2.c: Ditto.
        * gcc.target/i386/avx512vl-pr95488-1.c: Ditto.
        * gcc.target/i386/avx512vl-pr95488-2.c: Ditto.

-- 
BR,
Hongtao

Attachment: 0001-Optimize-multiplication-for-V8QI-V16QI-V32QI-under-T.patch
Description: Binary data
