https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93009

Matthias Hochsteger <matthias.hochsteger at tuwien dot ac.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #5 from Matthias Hochsteger <matthias.hochsteger at tuwien dot ac.at> ---
Thanks for the fast replies.
Anyway, I think there was a misunderstanding. The issue is not about the
accuracy of fma vs. mult+add.

The attached code should clarify the issue (I still couldn't simplify it much,
though). It basically boils down to a single call of multiplyAndAdd:

    template <typename T>
    T multiplyAndAdd(T a, T b, T c)
    {
        return a*b+c;
    }

    template <class S>
    __attribute__ ((__always_inline__)) inline S P1(S x) const
    {
        cout << "a = " << S(coefsal[1][0]) << endl;
        cout << "b = " << S(x) << endl;
        cout << "c = " << S(coefsal[1][1]) << endl;
        auto res = multiplyAndAdd (S(coefsal[1][0]),S(x),S(coefsal[1][1]));
        cout << "res: " << res << endl;
        return res;
    }

The data type is "AutoDiffRec<3, SIMD<double, 2>>", which basically contains
4 __m128d values.

>$ g++ -std=c++17 -march=skylake-avx512 -O1 test_fma.ii && ./a.out
> a = 1 1, D = 0 0 0 0 0 0
> b = 3 4, D = 0 0 0 0 0 0
> c = 2 2, D = 0 0 0 0 0 0
> res: 5 6, D = 0 0 0 0 0 0

>$ g++ -std=c++17 -march=skylake-avx512 -O1 -fexpensive-optimizations test_fma.ii && ./a.out
> a = 1 1, D = 0 0 0 0 0 0
> b = 3 4, D = 0 0 0 0 0 0
> c = 2 2, D = 0 0 0 0 0 0
> res: 3 3, D = 0 0 0 0 0 0
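
To illustrate why this cannot be a rounding difference, here is a minimal scalar sketch. It assumes plain double in place of AutoDiffRec<3, SIMD<double, 2>>, purely to stay self-contained, and checkReportedInputs is a hypothetical helper name that is not part of the attached test case. The values (a = 1, b = 3 and 4, c = 2) are the ones printed above. They are small integers, so both the separate multiply+add and a fused multiply-add are exact and must give 5 and 6.

```cpp
#include <cassert>
#include <cmath>

// Same helper as in the report: a plain multiply followed by an add.
// The compiler is free to contract this into an fma instruction.
template <typename T>
T multiplyAndAdd(T a, T b, T c)
{
    return a * b + c;
}

// Hypothetical helper (not from the attached test case): checks the two
// SIMD lanes from the report one at a time, using plain double as a
// stand-in for the vector type.
inline void checkReportedInputs()
{
    const double a = 1.0;
    const double c = 2.0;
    const double b[2] = {3.0, 4.0};
    const double expected[2] = {5.0, 6.0};

    for (int lane = 0; lane < 2; ++lane) {
        // Unfused (or compiler-contracted) form: exact for these inputs.
        assert(multiplyAndAdd(a, b[lane], c) == expected[lane]);
        // Explicitly fused form: also exact for these inputs.
        assert(std::fma(a, b[lane], c) == expected[lane]);
    }
}
```

Calling checkReportedInputs() from main passes in both forms, so "res: 3 3" is not something fma rounding could produce from these inputs.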