https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-08-21
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, I think the issue is we see

f (__m128d x, __m128d y, __m128d z)
{
  vector(2) double _4;
  vector(2) double _6;

  <bb 2> [100.00%]:
  _4 = x_2(D) * y_3(D);
  _6 = __builtin_ia32_addsubpd (_4, z_5(D)); [tail call]
  return _6;

the vectorizer will implement addsub as

  _6 = _4 + z_5(D);
  _7 = _4 - z_5(D);
  _8 = __builtin_shuffle (_6, _7, {0, 1});
  return _8;

which would then end up as (if the non-single use allows)

  _6 = FMA <x_2, y_3, z_5(D)>
  _9 = -z_5(D);
  _7 = FMA <x_2, y_3, _9>
  _8 = __builtin_shuffle (_6, _7, {0, 1});
  return _8;

a bit interesting for combine to figure out but theoretically possible?
(I think we expand both FMAs properly).

Look at the addsub patterns.  That is, handling this requires open-coding
_mm_addsub_pd with add, sub and shuffle ...
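
For illustration only, the three forms discussed above can be modelled lane by lane in plain C. This is a rough sketch, not GCC code: the `v2df` struct and all function names are hypothetical stand-ins, and `x * y + z` only models the FMA (it ignores the single rounding of a real fused multiply-add, so the forms are guaranteed to agree only where no rounding occurs). The select step assumes the ADDSUBPD lane layout, subtract in lane 0 and add in lane 1, which is what the `{0, 1}` shuffle in the dumps is taken to stand for.

```c
/* Hypothetical scalar stand-in for a 2-lane double vector (__m128d). */
typedef struct { double lane[2]; } v2df;

/* Reference semantics of ADDSUBPD: lane 0 subtracts, lane 1 adds. */
v2df addsub_ref(v2df a, v2df b)
{
    v2df r = {{ a.lane[0] - b.lane[0], a.lane[1] + b.lane[1] }};
    return r;
}

/* Open-coded form: full-width add, full-width sub, then a lane select. */
v2df addsub_open(v2df a, v2df b)
{
    v2df add, sub, r;
    for (int i = 0; i < 2; i++) {
        add.lane[i] = a.lane[i] + b.lane[i];
        sub.lane[i] = a.lane[i] - b.lane[i];
    }
    r.lane[0] = sub.lane[0];   /* lane 0 comes from the subtraction */
    r.lane[1] = add.lane[1];   /* lane 1 comes from the addition */
    return r;
}

/* Form after folding the multiply into both halves:
   x*y + z and x*y + (-z), followed by the same lane select.
   The plain expressions below only model FMA <x, y, z>. */
v2df fmaddsub_open(v2df x, v2df y, v2df z)
{
    v2df add, sub, r;
    for (int i = 0; i < 2; i++) {
        double neg_z = -z.lane[i];
        add.lane[i] = x.lane[i] * y.lane[i] + z.lane[i];
        sub.lane[i] = x.lane[i] * y.lane[i] + neg_z;
    }
    r.lane[0] = sub.lane[0];
    r.lane[1] = add.lane[1];
    return r;
}

/* Returns 1 when all three forms give the same lanes for these inputs. */
int forms_agree(v2df x, v2df y, v2df z)
{
    v2df prod = {{ x.lane[0] * y.lane[0], x.lane[1] * y.lane[1] }};
    v2df a = addsub_ref(prod, z);
    v2df b = addsub_open(prod, z);
    v2df c = fmaddsub_open(x, y, z);
    return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1]
        && a.lane[0] == c.lane[0] && a.lane[1] == c.lane[1];
}
```

Calling `forms_agree` with small integer-valued lanes (so every intermediate is exact) should return 1. The point is only that the multiply can be pushed into both halves once addsub is split into add, sub and a select.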