https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103008

            Bug ID: 103008
           Summary: poor inlined builtin_fmod on x86_64
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fx at gnu dot org
  Target Milestone: ---
            Target: x86_64

Created attachment 51706
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51706&action=edit
ggl.f90

This came up while looking at a Fortran benchmark set
<https://www.fortran.uk/fortran-compiler-comparisons/>, but the problem
presumably isn't Fortran-specific.

One of the cases in that set (ac.f90) gets bottlenecked on a random
number routine (which may be rubbish, but it's there).  It uses DMOD,
which gets compiled to __builtin_fmod according to the tree dump, and
is inlined.  However, the benchmark performance is still 50% worse
with gfortran than Intel ifort, and if I replace DMOD with its
definition, gfortran is much closer to ifort.

I'll attach files ggl.f90, the original, and gglx.f90 which avoids the
call to the intrinsic, along with assembler from each.  The assembler
is from GCC 11.2.0, run (on SKX) as

  gfortran -Ofast -march=native

(I note that the generated fmod isn't inlined with -O3, which looks to
me like a Fortran miss that I should report.)

I only take benchmarks seriously as far as understanding the results
goes but, at least with PDO, GCC is pretty much on a par with ifort on
the bottom line of that set, despite #40770 and another poor case. :-)
