https://llvm.org/bugs/show_bug.cgi?id=27854
Bug ID: 27854
Summary: [x86, SSE] insertps chosen when unpcklps would be better
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedb...@nondot.org
Reporter: spatel+l...@rotateright.com
CC: llvm-bugs@lists.llvm.org
Classification: Unclassified

Forking this off of bug 27826:

If we have a sequence of insertelements to fill in a vector:

define <4 x float> @goo(float %f0, float %f1, float %f2, float %f3) {
  %ins0 = insertelement <4 x float> undef, float %f0, i32 0
  %ins1 = insertelement <4 x float> %ins0, float %f1, i32 1
  %ins2 = insertelement <4 x float> %ins1, float %f2, i32 2
  %ins3 = insertelement <4 x float> %ins2, float %f3, i32 3
  ret <4 x float> %ins3
}

We do the optimal thing with SSE2:

$ ./llc -o - inselts.ll
...
    unpcklps %xmm3, %xmm1    ## xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]
    unpcklps %xmm2, %xmm0    ## xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
    unpcklps %xmm1, %xmm0    ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
    retq

The first two instructions are independent, so they can execute in parallel given enough hardware.

But given the opportunity to use ever more shuffle instructions with each ISA extension:

$ ./llc -o - inselts.ll -mattr=sse4.1
...
    insertps $16, %xmm1, %xmm0    ## xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
    insertps $32, %xmm2, %xmm0    ## xmm0 = xmm0[0,1],xmm2[0],xmm0[3]
    insertps $48, %xmm3, %xmm0    ## xmm0 = xmm0[0,1,2],xmm3[0]
    retq

We now have a sequence of 3 dependent instructions, and each of those instructions is larger in size too.
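To make the comparison above concrete, here is a small Python model of the two instruction sequences. It is only a sketch, not LLVM code: the helper names (scalar, unpcklps, insertps, build_sse2, build_sse41) and the "depth" counter are made up for illustration, each register is modelled as a list of four lanes, and the model assumes each float argument arrives in lane 0 of xmm0..xmm3 with the upper lanes undefined. The depth value counts the longest chain of dependent shuffles and ignores real latencies and port pressure.

```python
UNDEF = None  # stands in for an undefined upper lane


def scalar(x):
    """Model a float argument in lane 0 of an xmm register (depth 0)."""
    return ([x, UNDEF, UNDEF, UNDEF], 0)


def unpcklps(src, dst):
    """Model AT&T 'unpcklps src, dst': interleave the low two lanes."""
    (s, s_depth), (d, d_depth) = src, dst
    return ([d[0], s[0], d[1], s[1]], max(s_depth, d_depth) + 1)


def insertps(imm8, src, dst):
    """Model the register form of AT&T 'insertps $imm8, src, dst'."""
    (s, s_depth), (d, d_depth) = src, dst
    src_lane = (imm8 >> 6) & 3   # bits 7:6 select the source lane
    dst_lane = (imm8 >> 4) & 3   # bits 5:4 select the destination lane
    zero_mask = imm8 & 0xF       # bits 3:0 zero out result lanes
    lanes = list(d)
    lanes[dst_lane] = s[src_lane]
    lanes = [0.0 if (zero_mask >> i) & 1 else v for i, v in enumerate(lanes)]
    return (lanes, max(s_depth, d_depth) + 1)


def build_sse2(f0, f1, f2, f3):
    """The three-unpcklps sequence from the SSE2 output."""
    xmm0, xmm1, xmm2, xmm3 = map(scalar, (f0, f1, f2, f3))
    xmm1 = unpcklps(xmm3, xmm1)  # independent of the next instruction
    xmm0 = unpcklps(xmm2, xmm0)
    xmm0 = unpcklps(xmm1, xmm0)
    return xmm0


def build_sse41(f0, f1, f2, f3):
    """The three-insertps sequence from the SSE4.1 output."""
    xmm0, xmm1, xmm2, xmm3 = map(scalar, (f0, f1, f2, f3))
    xmm0 = insertps(16, xmm1, xmm0)  # each step needs the previous xmm0
    xmm0 = insertps(32, xmm2, xmm0)
    xmm0 = insertps(48, xmm3, xmm0)
    return xmm0


if __name__ == "__main__":
    print(build_sse2(1.0, 2.0, 3.0, 4.0))
    print(build_sse41(1.0, 2.0, 3.0, 4.0))
```

Under this model both sequences fill the four lanes in the same order, but the unpcklps version has a dependency depth of 2 while the insertps version has a depth of 3, which matches the observation about the dependent chain.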