http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54703
Bug #: 54703
Summary: [miscompilation] _mm_sub_pd is incorrectly substituted with vandnps
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: kr...@kde.org

The following testcase:

    #include <xmmintrin.h>

    __attribute__((aligned(16))) static const unsigned long long mask[2] = {
        0xffffffffff000000ull, 0xffffffffff000000ull
    };

    inline __m128d foo(__m128d v1)
    {
        const __m128d h1 = _mm_and_pd(v1, _mm_load_pd(reinterpret_cast<const double *>(&mask)));
        const __m128d l1 = _mm_sub_pd(v1, h1);
        return _mm_mul_pd(h1, l1);
    }

    __m128d test()
    {
        __m128d a = _mm_set1_pd(2.);
        return foo(foo(a));
    }

compiles to

    .cfi_startproc
    vmovaps _ZL4mask(%rip), %xmm0
    vandps  .LC0(%rip), %xmm0, %xmm2
    vandnps .LC0(%rip), %xmm0, %xmm1
    vmulpd  %xmm1, %xmm2, %xmm1
    vandps  %xmm0, %xmm1, %xmm0
    vsubpd  %xmm0, %xmm1, %xmm1
    vmulpd  %xmm1, %xmm0, %xmm0
    ret
    .cfi_endproc

The second foo call is compiled correctly: vandps and vsubpd are used. But the first call uses vandps and vandnps, i.e. the subtraction v1 - (v1 & mask) has been replaced by the bitwise operation v1 & ~mask. That substitution would be correct for integers, but is obviously wrong for floating-point numbers.
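To illustrate why the and-not form is only valid for integers, here is a minimal scalar sketch of one vector lane. It is not part of the original testcase: the helper names (kMask, to_bits, from_bits, high_part, low_by_sub, low_by_andnot, self_check) are hypothetical, and memcpy stands in for the bitwise SSE operations on a single double.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same value as one lane of `mask` in the testcase (hypothetical name).
static const std::uint64_t kMask = 0xffffffffff000000ull;

// Reinterpret a double as its 64-bit pattern and back, without UB.
static std::uint64_t to_bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

static double from_bits(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// h1 in the testcase: clear the low 24 mantissa bits.
static double high_part(double v) { return from_bits(to_bits(v) & kMask); }

// l1 as written in the testcase: a floating-point subtraction.
static double low_by_sub(double v) { return v - high_part(v); }

// l1 as the first foo call was compiled: a bitwise and-not.
static double low_by_andnot(double v) { return from_bits(to_bits(v) & ~kMask); }

static void self_check() {
    // For integers the two forms agree, because the masked and unmasked
    // bits are disjoint and sum to the original value.
    const std::uint64_t x = 0x0123456789abcdefull;
    assert(x - (x & kMask) == (x & ~kMask));

    // For doubles they do not: v = 1 + 2^-30 has one low mantissa bit set.
    const double v = 1.0 + 1.0 / (1 << 30);
    assert(high_part(v) == 1.0);
    assert(low_by_sub(v) == 1.0 / (1 << 30));          // exactly 2^-30
    assert(to_bits(low_by_andnot(v)) == 0x400000ull);  // sign and exponent bits cleared
    assert(low_by_andnot(v) != low_by_sub(v));
}
```

Calling self_check() exercises both cases. The and-not form discards the sign and exponent bits together with the high mantissa bits, so what remains is reinterpreted as a subnormal number rather than as the arithmetic difference.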