https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110235
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuho...@gcc.gnu.org>:

https://gcc.gnu.org/g:f8e02702726d4514b8ff9f5481c9c1f5d34e1787

commit r14-1917-gf8e02702726d4514b8ff9f5481c9c1f5d34e1787
Author: liuhongt <hongtao....@intel.com>
Date:   Thu Jun 15 16:46:14 2023 +0800

    Refined 256/512-bit vpacksswb/vpackssdw patterns.

    The packing in vpacksswb/vpackssdw is not a simple concat; it is an
    interweave of src1 and src2 for every 128 bits of the sources (or
    every 64 bits of the ss_truncate result), i.e.

    dst[192-255] = ss_truncate (src2[128-255])
    dst[128-191] = ss_truncate (src1[128-255])
    dst[64-127] = ss_truncate (src2[0-127])
    dst[0-63] = ss_truncate (src1[0-127])

    The patch refined those patterns with an extra vec_select for the
    interweave.

    gcc/ChangeLog:

            PR target/110235
            * config/i386/sse.md (<sse2_avx2>_packsswb<mask_name>):
            Substitute with ..
            (sse2_packsswb<mask_name>): .. this, ..
            (avx2_packsswb<mask_name>): .. this and ..
            (avx512bw_packsswb<mask_name>): .. this.
            (<sse2_avx2>_packssdw<mask_name>): Substitute with ..
            (sse2_packssdw<mask_name>): .. this, ..
            (avx2_packssdw<mask_name>): .. this and ..
            (avx512bw_packssdw<mask_name>): .. this.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/avx512bw-vpackssdw-3.c: New test.
            * gcc.target/i386/avx512bw-vpacksswb-3.c: New test.
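
For illustration only, the per-128-bit interweave described in the commit message can be modeled in scalar C for the 256-bit vpackssdw case. This is a sketch, not code from the patch or its new tests: the names sat_s16 and ref_packssdw256 are hypothetical, and element 0 is assumed to be the least significant element of each vector.

```c
#include <stdint.h>

/* ss_truncate on one element: signed saturation from 32 to 16 bits. */
static int16_t sat_s16(int32_t x)
{
    if (x > INT16_MAX)
        return INT16_MAX;
    if (x < INT16_MIN)
        return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical scalar model of 256-bit vpackssdw.
   Each 128-bit lane of dst holds four saturated elements taken from the
   same lane of src1, followed by four from the same lane of src2.  */
static void ref_packssdw256(int16_t dst[16],
                            const int32_t src1[8],
                            const int32_t src2[8])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 4; i++) {
            /* Low 64 bits of the lane come from src1.  */
            dst[lane * 8 + i] = sat_s16(src1[lane * 4 + i]);
            /* High 64 bits of the lane come from src2.  */
            dst[lane * 8 + 4 + i] = sat_s16(src2[lane * 4 + i]);
        }
    }
}
```

With this model, dst[0..3] come from src1[0..3] and dst[4..7] from src2[0..3], whereas a plain concatenation would place all eight src1 elements before any src2 element. The 512-bit form would presumably extend the outer loop to four lanes.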