https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119405

            Bug ID: 119405
           Summary: Missed FMA optimization for C code present in C++
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.gr...@tu-dresden.de
  Target Milestone: ---

A test in a third-party library fails on AMD Rome (Zen 2). I ultimately traced
the failure to a function that performs a simple quaternion multiplication.

Similar code exists in a C file and in a C++ file that are later linked
together. After reducing both to exactly the same code, I see an FMA in the
C++ version but not in the C version.

The reduced code is:
```
void mjuu_mulquat(double* res, const double* qa, const double* qb) {
  double tmp[4] = {
    qa[0]*qb[0] - qa[1]*qb[1] - qa[2]*qb[2] - qa[3]*qb[3],
    qa[0]*qb[1] + qa[1]*qb[0] + qa[2]*qb[3] - qa[3]*qb[2],
    qa[0]*qb[2] - qa[1]*qb[3] + qa[2]*qb[0] + qa[3]*qb[1],
    qa[0]*qb[3] + qa[1]*qb[2] - qa[2]*qb[1] + qa[3]*qb[0]
  };
  res[0] = tmp[0];
  res[1] = tmp[1];
  res[2] = tmp[2];
  res[3] = tmp[3];
}
```
Compiling it with `-O3 -std=c++11 -mavx2 -mfma` (C++) and
`-O3 -std=c11 -mavx2 -mfma` (C) shows `vfmadd132pd` and `vfnmadd132pd`
instructions in the C++ output that are not present in the C output.
Comparison:
https://godbolt.org/z/TsYqcssnM

I don't see why the code is optimized differently in the two cases: I would
expect the optimizer to receive the same input, but that seemingly is not the
case.
