https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008
Bug ID: 103008
Summary: poor inlined builtin_fmod on x86_64
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: fx at gnu dot org
Target Milestone: ---
Target: x86_64

Created attachment 51706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51706&action=edit
ggl.f90

This is from looking at a Fortran benchmark set
<https://www.fortran.uk/fortran-compiler-comparisons/>, but presumably
isn't Fortran-specific.

One of the cases in that set (ac.f90) gets bottlenecked on a random
number routine (which may be rubbish, but it's there). It uses DMOD,
which gets compiled to __builtin_fmod according to the tree dump, and is
inlined. However, the benchmark performance is still 50% worse with
gfortran than Intel ifort, and if I replace DMOD with its definition,
gfortran is much closer to ifort.

I'll attach files ggl.f90, the original, and gglx.f90 which avoids the
call to the intrinsic, along with assembler from each. The assembler is
from GCC 11.2.0, run (on SKX) as

  gfortran -Ofast -march=native

(I note that the generated fmod isn't inlined with -O3, which looks to me
like a Fortran miss that I should report.)

I only take benchmarks too seriously for understanding the results but,
at least with PDO, GCC is pretty much on a par with ifort on the bottom
line of that set, despite also #40770, and another poor case. :-)