https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119405
Bug ID: 119405
Summary: Missed FMA optimization for C code present in C++
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: alexander.gr...@tu-dresden.de
Target Milestone: ---

In a third-party library, a test is failing on AMD Rome (Zen2). I ultimately traced the failure to a function doing a simple quaternion multiplication. Similar code is present in a C file and a C++ file that are later linked together. After reducing both to exactly the same code, I see FMA instructions in the output for C++ but not for C.

The reduced code is:

```
void mjuu_mulquat(double* res, const double* qa, const double* qb) {
    double tmp[4] = {
        qa[0]*qb[0] - qa[1]*qb[1] - qa[2]*qb[2] - qa[3]*qb[3],
        qa[0]*qb[1] + qa[1]*qb[0] + qa[2]*qb[3] - qa[3]*qb[2],
        qa[0]*qb[2] - qa[1]*qb[3] + qa[2]*qb[0] + qa[3]*qb[1],
        qa[0]*qb[3] + qa[1]*qb[2] - qa[2]*qb[1] + qa[3]*qb[0]
    };
    res[0] = tmp[0];
    res[1] = tmp[1];
    res[2] = tmp[2];
    res[3] = tmp[3];
}
```

Compiling it with `-O3 -std=c++11 -mavx2 -mfma` produces vfmadd132pd and vfnmadd132pd instructions; compiling it with `-O3 -std=c11 -mavx2 -mfma` produces neither.

Comparison: https://godbolt.org/z/TsYqcssnM

I don't see why the code would be optimized differently in the two cases. I'd expect the optimizer to get the same input from both front ends, which is seemingly not the case.