https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91546
Bug ID: 91546
Summary: Better solution for VEC_INIT under TARGET_SSE4_1 since PINSRB/PINSRD/PINSRQ
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Target: i386, x86-64

For the testcase:

#include <immintrin.h>

__m128
test2 (int a, int b, int c, int d)
{
  return __extension__ (__m128) (__v4si) {a, b, c, d};
}

compiled with -Ofast -march=skylake-avx512, GCC generates:

test2(int, int, int, int):
        vmovd   xmm2, edx
        vmovd   xmm3, edi
        vpinsrd xmm1, xmm2, ecx, 1
        vpinsrd xmm0, xmm3, esi, 1
        vpunpcklqdq     xmm0, xmm0, xmm1
        ret

while clang generates:

test2(int, int, int, int):
        vmovd   xmm0, edi
        vpinsrd xmm0, xmm0, esi, 1
        vpinsrd xmm0, xmm0, edx, 2
        vpinsrd xmm0, xmm0, ecx, 3
        ret

That is one instruction fewer for V4SI, with a larger saving for
V8SI/V16SI; V8QI/V2DI are similar.

It would also make the cost of vec_construct in vectorization more
realistic:

    case vec_construct:
      {
        /* N element inserts into SSE vectors. */
        int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
        /* One vinserti128 for combining two SSE vectors for AVX256. */
        if (GET_MODE_BITSIZE (mode) == 256)
          cost += ix86_vec_cost (mode, ix86_cost->addss);
        /* One vinserti64x4 and two vinserti128 for combining SSE
           and AVX256 vectors to AVX512. */
        else if (GET_MODE_BITSIZE (mode) == 512)
          cost += 3 * ix86_vec_cost (mode, ix86_cost->addss);
        return cost;
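
To make the cost argument concrete, here is a minimal stand-alone sketch of the arithmetic in the vec_construct excerpt above. It is not GCC code: struct toy_costs, its combine field and toy_vec_construct_cost are hypothetical names, and combine merely stands in for whatever ix86_vec_cost (mode, ix86_cost->addss) would return for the target.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two cost inputs used by the excerpt.
   Real values come from GCC's per-CPU cost tables; these are not them.  */
struct toy_costs {
    int sse_op;   /* cost charged per element insert */
    int combine;  /* stand-in for ix86_vec_cost (mode, ix86_cost->addss) */
};

/* Mirrors the arithmetic of the vec_construct case: one insert per
   element, plus one combine step for 256-bit vectors or three combine
   steps for 512-bit vectors.  */
static int
toy_vec_construct_cost (int nelts, int mode_bits, const struct toy_costs *c)
{
    assert (nelts > 0);

    int cost = nelts * c->sse_op;
    if (mode_bits == 256)
        cost += c->combine;
    else if (mode_bits == 512)
        cost += 3 * c->combine;
    return cost;
}
```

With unit costs (sse_op = 1, combine = 1), this sketch gives 4 for a four-element 128-bit vector, 9 for eight elements at 256 bits and 19 for sixteen elements at 512 bits. The V4SI figure of 4 equals the length of the clang sequence (one vmovd plus three vpinsrd), whereas the GCC sequence shown above uses 5 instructions.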