Hi:

+/* Optimize vector MUL generation for V8QI, V16QI and V32QI
+   under TARGET_AVX512BW. i.e. for v16qi a * b, it has
+
+	vpmovzxbw ymm2, xmm0
+	vpmovzxbw ymm3, xmm1
+	vpmullw   ymm4, ymm2, ymm3
+	vpmovwb   xmm0, ymm4
+
+   it would take less instructions than ix86_expand_vecop_qihi.
+   Return true if success.  */
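
For readers who want to see why the four-instruction sequence gives the right
answer, here is a rough scalar model in C. It is only a sketch and is not part
of the patch: the function name mul_v16qi_model is made up for illustration,
and each loop iteration stands in for one vector lane. The point it shows is
that the low 8 bits of a product depend only on the low 8 bits of the
operands, so zero-extending to 16 bits, multiplying, and truncating should be
valid for both signed and unsigned char elements.

```c
#include <stdint.h>

/* Hypothetical scalar model of the v16qi sequence quoted above.
   Not GCC code; each iteration models one of the 16 byte lanes.  */
static void
mul_v16qi_model (const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    {
      /* vpmovzxbw: zero-extend each byte to a 16-bit word.  */
      uint16_t wa = a[i];
      uint16_t wb = b[i];

      /* vpmullw: keep the low 16 bits of the word product.  */
      uint16_t prod = (uint16_t) (wa * wb);

      /* vpmovwb: truncate each word back to a byte.  */
      out[i] = (uint8_t) prod;
    }
}
```

Calling mul_v16qi_model on two 16-byte arrays should give the same bytes as a
plain element-wise char multiplication, which is the behaviour the new tests
are expected to preserve.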

Bootstrap is ok, regression test on i386/x86-64 backend is ok.

gcc/ChangeLog:

	PR target/95488
	* config/i386/i386-expand.c (ix86_expand_vecmul_qihi): New
	function.
	* config/i386/i386-protos.h (ix86_expand_vecmul_qihi): Declare.
	* config/i386/sse.md (mul<mode>3): Drop mask_name since
	there's no real vector char multiplication instruction with
	mask.  Also optimize it under TARGET_AVX512BW.
	(mulv8qi3): New expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-pr95488-1.c: New test.
	* gcc.target/i386/avx512bw-pr95488-2.c: Ditto.
	* gcc.target/i386/avx512vl-pr95488-1.c: Ditto.
	* gcc.target/i386/avx512vl-pr95488-2.c: Ditto.

--
BR,
Hongtao

Attachment: 0001-Optimize-multiplication-for-V8QI-V16QI-V32QI-under-T.patch