https://llvm.org/bugs/show_bug.cgi?id=29222
Bug ID: 29222
Summary: Combining MMX with AVX suboptimal
Product: clang
Version: 3.8
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: -New Bugs
Assignee: unassignedclangb...@nondot.org
Reporter: kobalicek.p...@gmail.com
CC: llvm-bugs@lists.llvm.org
Classification: Unclassified

The following code:

    #include <mmintrin.h>
    #include <immintrin.h>

    int fn(int x) {
      __m64 mm = _mm_set1_pi32(x);
      mm = _mm_packs_pi16(mm, mm);
      __m128i xmm = _mm_movpi64_epi64(mm);
      xmm = _mm_packs_epi16(xmm, xmm);
      return _mm_cvtsi128_si32(xmm);
    }

Compiled with '-O2 -Wall -mavx2 -m32 -fomit-frame-pointer' produces:

    fn(int):
      sub esp, 20
      vbroadcastss xmm0, dword ptr [esp + 24]  # Cool idea, but not
      vmovlps qword ptr [esp + 8], xmm0        # in our context.
      movq mm0, qword ptr [esp + 8]            # !!!
      packsswb mm0, mm0
      movq qword ptr [esp], mm0                # These moves are
      vmovq xmm0, qword ptr [esp]              # correct.
      vpacksswb xmm0, xmm0, xmm0
      vmovd eax, xmm0
      add esp, 20
      ret

I know that MMX is not used anymore, but I wonder why clang prefers a code path that is one instruction longer and performs two more memory accesses than the more straightforward 'punpckldq'.
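For anyone comparing alternative code sequences for this function, a scalar model of what fn() computes may be useful as a check. The sketch below is not part of the original report: the names fn_reference, sat8 and to_i16 are hypothetical, and it only assumes the documented per-lane signed saturation of PACKSSWB.

```c
#include <stdint.h>
#include <string.h>

/* Signed saturation of a 16-bit lane to 8 bits, as PACKSSWB does. */
static int8_t sat8(int16_t v) {
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Reinterpret a 16-bit pattern as a signed value without relying on
   implementation-defined narrowing conversions. */
static int16_t to_i16(uint16_t u) {
    return (u < 0x8000u) ? (int16_t)u : (int16_t)((int32_t)u - 0x10000);
}

/* Hypothetical scalar model of fn() from the report. */
int32_t fn_reference(int32_t x) {
    uint32_t ux = (uint32_t)x;

    /* _mm_set1_pi32(x): the 64-bit register holds the 16-bit lanes
       lo, hi, lo, hi. */
    int16_t lo = to_i16((uint16_t)(ux & 0xFFFFu));
    int16_t hi = to_i16((uint16_t)(ux >> 16));

    /* _mm_packs_pi16(mm, mm): every even byte is sat8(lo), every odd
       byte is sat8(hi). */
    uint8_t b0 = (uint8_t)sat8(lo);
    uint8_t b1 = (uint8_t)sat8(hi);

    /* _mm_movpi64_epi64 + _mm_packs_epi16(xmm, xmm): the low four 16-bit
       lanes all hold the byte pair (b1:b0); each saturates to one byte. */
    int16_t w = to_i16((uint16_t)(b0 | ((uint16_t)b1 << 8)));
    uint8_t t = (uint8_t)sat8(w);

    /* _mm_cvtsi128_si32: the low 32 bits are that byte repeated 4 times. */
    uint32_t bits = (uint32_t)t * 0x01010101u;
    int32_t result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

Any instruction sequence the compiler picks for fn(), including one built around 'punpckldq', should agree with this model for every input.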