https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115374

            Bug ID: 115374
           Summary: fmod() in x86_64 -O3 not using return value from the
                    glibc's implementation if x87 FPU fprem returns NaN
           Product: gcc
           Version: 14.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: k3x-devel at outlook dot com
  Target Milestone: ---

Created attachment 58371
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58371&action=edit
code to reproduce the problem, compile with -O3

I believe I have found a minor bug in GCC when -O3 optimization is enabled. Due
to the complexity of the GCC codebase, I am unable to check the relevant parts,
but I have a reproducible example and commented assembly that illustrate the
problem.

I was able to reproduce the same behavior on Arch Linux GCC 14.1.1 20240522 and
Gentoo's GCC (Gentoo 13.2.1_p20240210 p14) 13.2.1 20240210.

Apparently, on the x86_64 target with -O3 optimization enabled, GCC tries to
use the x87 FPU for the fmod implementation. If the resulting number is NaN, it
seems to fall back to glibc's implementation of fmod. The problem is that it
never appears to use the value returned from that call and instead reuses the
NaN from the previous FPU operation.

The NaN in the FPU can happen if some MMX instructions were used previously,
filling the FPU stack, without an EMMS instruction to bring the FPU back into a
usable state. This is how I found the bug. In such a case GCC could fall back
to glibc's implementation and actually use the resulting value, which was a
valid non-NaN number.

The commented assembly illustrating the problem:

00000000000011c0 <do_fmod>:
    11c0:       48 83 ec 28             sub    $0x28,%rsp
    11c4:       f2 0f 11 44 24 08       movsd  %xmm0,0x8(%rsp)
    11ca:       f2 0f 11 4c 24 10       movsd  %xmm1,0x10(%rsp)
    11d0:       dd 44 24 10             fldl   0x10(%rsp)           # load numbers from stack to FPU
    11d4:       dd 44 24 08             fldl   0x8(%rsp)
    11d8:       d9 f8                   fprem                       # do fp partial remainder
    11da:       df e0                   fnstsw %ax                  # read FPU SW into AX
    11dc:       f6 c4 04                test   $0x4,%ah             # test if C2 (incomplete reduction) is set
    11df:       75 f7                   jne    11d8 <do_fmod+0x18>  # jump to fprem again if C2 was set (not taken)
    11e1:       dd d9                   fstp   %st(1)               # pop st1 from the stack
    11e3:       dd 5c 24 18             fstpl  0x18(%rsp)           # store st0 into the stack and pop (stores NaN due to FPU stack fault and IE)
    11e7:       f2 0f 10 54 24 18       movsd  0x18(%rsp),%xmm2     # copy result to the %xmm2
    11ed:       66 0f 2e d2             ucomisd %xmm2,%xmm2         # check if %xmm2 holds NaN
    11f1:       7a 09                   jp     11fc <do_fmod+0x3c>  # jump if so (PF set, taken), jumps to A
    11f3:       66 0f 28 c2             movapd %xmm2,%xmm0          # B: copy %xmm2 to %xmm0
    11f7:       48 83 c4 28             add    $0x28,%rsp
    11fb:       c3                      ret                         # return from the procedure with NaN from %xmm0 as the result!
    11fc:       f2 0f 10 4c 24 10       movsd  0x10(%rsp),%xmm1     # A: load original fmod args into xmm0 and 1
    1202:       f2 0f 10 44 24 08       movsd  0x8(%rsp),%xmm0
    1208:       e8 33 fe ff ff          call   1040 <fmod@plt>      # call libc fmod
    120d:       f2 0f 10 54 24 18       movsd  0x18(%rsp),%xmm2     # <== GCC BUG? Ignores libc fmod result and copies previous NaN to xmm2
    1213:       eb de                   jmp    11f3 <do_fmod+0x33>  # jumps to B
    1215:       66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
    121c:       00 00 00 00
