https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93009

Matthias Hochsteger <matthias.hochsteger at tuwien dot ac.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #5 from Matthias Hochsteger <matthias.hochsteger at tuwien dot ac.at> ---
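Thanks for the fast replies. However, I think there was a misunderstanding. The
issue is not about the accuracy of fma vs. mult+add: with the extra optimization
flag the result is plainly wrong (3 3 instead of 5 6, see the output below).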

The attached code should clarify the issue (I still couldn't simplify it much
though). It basically boils down to a single call of multiplyAndAdd:

template <typename T>                                                           
T multiplyAndAdd(T a, T b, T c)
{
  return a*b+c;
}

  template <class S>
  __attribute__ ((__always_inline__)) inline S P1(S x) const 
  {
    cout << "a = " << S(coefsal[1][0]) << endl; 
    cout << "b = " << S(x) << endl; 
    cout << "c = " << S(coefsal[1][1]) << endl; 
    auto res = multiplyAndAdd (S(coefsal[1][0]),S(x),S(coefsal[1][1]));
    cout << "res: " << res << endl; 
    return res;
  }

The data type is "AutoDiffRec<3, SIMD<double, 2>>", which basically contains 4
__m128d values.

>$ g++ -std=c++17 -march=skylake-avx512 -O1 test_fma.ii && ./a.out
> a = 1 1, D = 0 0 0 0 0 0 
> b = 3 4, D = 0 0 0 0 0 0 
> c = 2 2, D = 0 0 0 0 0 0 
> res: 5 6, D = 0 0 0 0 0 0 


>$ g++ -std=c++17 -march=skylake-avx512 -O1 -fexpensive-optimizations test_fma.ii && ./a.out
> a = 1 1, D = 0 0 0 0 0 0 
> b = 3 4, D = 0 0 0 0 0 0 
> c = 2 2, D = 0 0 0 0 0 0 
> res: 3 3, D = 0 0 0 0 0 0
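
For reference, a minimal sketch of the arithmetic the two runs above should agree on. This is not the attached test case: Vec2 is a hypothetical two-lane stand-in for SIMD<double, 2> built from plain doubles (no intrinsics, no derivative part), and multiplyAndAddMatchesReference is a helper name made up for illustration. The inputs are small integers, so a fused and an unfused multiply-add give exactly the same values and an exact comparison is safe.

```cpp
#include <array>
#include <cassert>
#include <iostream>

// Hypothetical stand-in for SIMD<double, 2>: two lanes of plain doubles.
// The real type in the report wraps __m128d; this one does not.
struct Vec2 {
  std::array<double, 2> lane;
};

// Lane-wise multiplication.
inline Vec2 operator*(Vec2 a, Vec2 b) {
  return {{a.lane[0] * b.lane[0], a.lane[1] * b.lane[1]}};
}

// Lane-wise addition.
inline Vec2 operator+(Vec2 a, Vec2 b) {
  return {{a.lane[0] + b.lane[0], a.lane[1] + b.lane[1]}};
}

// Exact lane-wise comparison (fine here: all values are small integers).
inline bool operator==(Vec2 a, Vec2 b) {
  return a.lane == b.lane;
}

// Same shape as the function from the report.
template <typename T>
T multiplyAndAdd(T a, T b, T c) {
  return a * b + c;
}

// Hypothetical helper: checks a*b+c against the values printed in the
// first (correct) run: a = 1 1, b = 3 4, c = 2 2, expected res = 5 6.
inline bool multiplyAndAddMatchesReference() {
  Vec2 a{{1.0, 1.0}};
  Vec2 b{{3.0, 4.0}};
  Vec2 c{{2.0, 2.0}};
  Vec2 res = multiplyAndAdd(a, b, c);
  return res == Vec2{{5.0, 6.0}};
}
```

A check such as assert(multiplyAndAddMatchesReference()) placed in main would turn the wrong "3 3" result into a hard failure instead of something that has to be spotted in the printed output. Whether this reduced type still triggers the problem is not known; it only pins down the expected values.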
