https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-08-21
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, I think the issue is that we see

f (__m128d x, __m128d y, __m128d z)
{
  vector(2) double _4;
  vector(2) double _6;

  <bb 2> [100.00%]:
  _4 = x_2(D) * y_3(D);
  _6 = __builtin_ia32_addsubpd (_4, z_5(D)); [tail call]
  return _6;

}

The vectorizer will implement addsub as

  _6 = _4 + z_5(D);
  _7 = _4 - z_5(D);
  _8 = __builtin_shuffle (_6, _7, {2, 1});
  return _8;

which would then end up as (if _4 having two uses does not prevent FMA formation)

  _6 = FMA <x_2(D), y_3(D), z_5(D)>
  _9 = -z_5(D);
  _7 = FMA <x_2(D), y_3(D), _9>
  _8 = __builtin_shuffle (_6, _7, {2, 1});
  return _8;

A bit interesting for combine to figure out, but theoretically possible?
(I think we expand both FMAs properly.)
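A minimal scalar sketch in plain C of the lane bookkeeping (helper names are hypothetical, and two-element arrays stand in for `vector(2) double`). ADDSUBPD subtracts in lane 0 and adds in lane 1; a two-operand `__builtin_shuffle` indexes the concatenation of its two inputs, so taking lane 0 from `_7` and lane 1 from `_6` needs the mask `{2, 1}`. The same selection applies when `_6` and `_7` are the two FMA results.

```c
/* Scalar sketch: arrays stand in for vector(2) double. */

/* Hypothetical helper modelling a two-input __builtin_shuffle on
   2-lane vectors: mask indices address the concatenation
   {a[0], a[1], b[0], b[1]} and are taken modulo 4. */
static void shuffle2(const double a[2], const double b[2],
                     const long long mask[2], double out[2])
{
    for (int i = 0; i < 2; i++) {
        long long k = mask[i] & 3;
        out[i] = (k < 2) ? a[k] : b[k - 2];
    }
}

/* Reference semantics of ADDSUBPD: subtract in lane 0, add in lane 1. */
static void addsub_ref(const double a[2], const double b[2], double out[2])
{
    out[0] = a[0] - b[0];
    out[1] = a[1] + b[1];
}

/* Open-coded sequence: full add (_6), full sub (_7), then blend (_8). */
static void addsub_via_shuffle(const double a[2], const double b[2],
                               const long long mask[2], double out[2])
{
    double sum[2]  = { a[0] + b[0], a[1] + b[1] };
    double diff[2] = { a[0] - b[0], a[1] - b[1] };
    shuffle2(sum, diff, mask, out);
}
```

With the mask `{0, 1}` the blend simply returns the sum, so the subtracting lane is lost; `{2, 1}` reproduces `addsub_ref`.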

Look at the addsub patterns.

That is, handling this requires open-coding _mm_addsub_pd with add, sub
and shuffle ...
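A minimal sketch of such an open-coded `_mm_addsub_pd`, written with GCC's generic vector extension rather than the x86 builtin. The type and function names are hypothetical, and the `__clang__` branch exists only because Clang spells the two-input shuffle `__builtin_shufflevector` instead of `__builtin_shuffle`.

```c
/* Generic (target-independent) vector types: two doubles, plus a
   same-width integer vector for the shuffle mask. */
typedef double v2df __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* Hypothetical open-coded _mm_addsub_pd (a, b):
   lane 0 = a[0] - b[0], lane 1 = a[1] + b[1]. */
static inline v2df addsub_open_coded(v2df a, v2df b)
{
    v2df sum = a + b;   /* like _6 */
    v2df diff = a - b;  /* like _7 */
    /* Indices 0-1 select from sum, 2-3 from diff: take diff[0], sum[1]. */
#if defined(__clang__)
    return __builtin_shufflevector(sum, diff, 2, 1);
#else
    return __builtin_shuffle(sum, diff, (v2di){2, 1});
#endif
}

/* Hypothetical counterpart of the testcase's f(): the multiply now
   feeds a plain add and sub instead of an opaque builtin call. */
v2df f_open_coded(v2df x, v2df y, v2df z)
{
    return addsub_open_coded(x * y, z);
}
```

Whether a given GCC version turns `f_open_coded` into two FMAs plus a blend (or a single fused add/sub instruction) when FMA is enabled is exactly what this report is about, so the generated assembly should be checked rather than assumed.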
