https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
__m128d h(__m128d x, __m128d y, __m128d z){
    __m128d tem = _mm_mul_pd (x,y);
    __m128d tem2 = tem + z;
    __m128d tem3 = tem - z;
    return __builtin_shuffle (tem2, tem3, (__m128i) {0, 3});
}

doesn't quite work (the combiner pattern for fmaddsub is missing).  Tried
{0, 2} as well.

h:
.LFB5021:
        .cfi_startproc
        vmovapd %xmm0, %xmm3
        vfmsub132pd     %xmm1, %xmm2, %xmm0
        vfmadd132pd     %xmm1, %xmm2, %xmm3
        vshufpd $2, %xmm0, %xmm3, %xmm0
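
For reference, a scalar sketch of which lanes the two masks tried in h() pick. It is only a model, not GCC code: vec2, shuffle2 and h_model are made-up names, and the multiply and add/sub are rounded separately here rather than fused. The index rule assumed is the documented one for two-operand __builtin_shuffle (indices 0..N-1 select from the first vector, N..2N-1 from the second, taken modulo 2*N).

```c
/* Hypothetical 2-lane stand-in for __m128d; not a GCC or Intel type. */
typedef struct { double lane[2]; } vec2;

/* Model of two-operand __builtin_shuffle for N = 2:
   index 0..1 selects from a, index 2..3 selects from b. */
static vec2 shuffle2(vec2 a, vec2 b, const int mask[2])
{
    vec2 r;
    for (int i = 0; i < 2; i++) {
        int idx = mask[i] & 3;              /* indices wrap modulo 2*N */
        r.lane[i] = (idx < 2) ? a.lane[idx] : b.lane[idx - 2];
    }
    return r;
}

/* Scalar model of h(): tem2 = x*y + z, tem3 = x*y - z, then shuffle.
   Unfused on purpose, so results can differ from a real FMA in the last bit. */
static vec2 h_model(vec2 x, vec2 y, vec2 z, const int mask[2])
{
    vec2 tem2, tem3;
    for (int i = 0; i < 2; i++) {
        double tem = x.lane[i] * y.lane[i];
        tem2.lane[i] = tem + z.lane[i];
        tem3.lane[i] = tem - z.lane[i];
    }
    return shuffle2(tem2, tem3, mask);
}
```

With x = {2, 3}, y = {4, 5}, z = {1, 10}, mask {0, 3} gives {x0*y0 + z0, x1*y1 - z1} = {9, 5}, which matches the vshufpd $2 in the output above (low lane from the vfmadd result, high lane from the vfmsub result). Mask {0, 2} gives {9, 7}, i.e. lane 0 of both the sum and the difference.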