https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115374
Bug ID: 115374
Summary: fmod() in x86_64 -O3 not using return value from the glibc's implementation if x87 FPU fprem returns NaN
Product: gcc
Version: 14.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: k3x-devel at outlook dot com
Target Milestone: ---

Created attachment 58371
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58371&action=edit
code to reproduce the problem, compile with -O3

I believe I have found a minor bug in GCC when -O3 optimization is enabled. Because of the complexity of the GCC codebase, I have not been able to check the relevant parts of the compiler, but I have a reproducible example and commented assembly that illustrate the problem. I reproduced the same behavior with Arch Linux GCC 14.1.1 20240522 and with Gentoo's GCC (Gentoo 13.2.1_p20240210 p14) 13.2.1 20240210.

Apparently, on the x86_64 target with -O3, GCC computes fmod inline with the x87 FPU (an fprem loop). If the result is NaN, it seems to fall back to glibc's implementation of fmod. The problem is that it never uses the value returned by that call; instead it reuses the NaN from the preceding FPU operation.

The FPU can produce such a NaN if MMX instructions were executed earlier: they leave the x87 register stack marked as full, and if nothing executes EMMS to return the FPU to a usable state, the next loads overflow the stack and yield NaN. This is how I found the bug. In that situation the glibc fallback returned a valid, non-NaN number, and GCC should have used that value instead of the stale NaN.
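The MMX/EMMS interaction described above can be observed without fmod at all. The following is a minimal sketch, assuming GCC or Clang on x86-64 Linux with the default x87 control word (all exceptions masked). The helpers touch_mmx, leave_mmx and x87_mul_after_mmx are hypothetical names, and a long double multiplication (long double is an x87 type on x86-64) stands in for the inline fprem sequence as the x87 consumer.

```c
#include <math.h>

#if !defined(__x86_64__)
#error "this sketch uses x86-64 inline assembly"
#endif

/* Hypothetical helper: executes one MMX instruction. Any MMX instruction
   other than EMMS sets the x87 top-of-stack to 0 and tags all eight x87
   registers as valid, so the register stack looks full. */
static void touch_mmx(void) {
    __asm__ volatile("pxor %%mm0, %%mm0" ::: "mm0", "memory");
}

/* EMMS tags all eight x87 registers as empty again. */
static void leave_mmx(void) {
    __asm__ volatile("emms" ::: "memory");
}

/* Hypothetical probe: multiplies a and b on the x87 unit right after an
   MMX instruction, optionally executing EMMS in between. The volatile
   objects keep the x87 loads and stores ordered after touch_mmx(). */
static double x87_mul_after_mmx(double a, double b, int use_emms) {
    volatile double va = a, vb = b;
    volatile long double product;
    volatile double result;

    touch_mmx();
    if (use_emms)
        leave_mmx();
    /* Without EMMS the first x87 load overflows the "full" stack; with the
       invalid-operation exception masked it loads a NaN instead of va. */
    product = (long double)va * (long double)vb;
    result = (double)product;
    leave_mmx(); /* hand a clean x87 state back to the caller either way */
    return result;
}
```

On hardware that follows the documented x87 behaviour, x87_mul_after_mmx(1.5, 2.0, 0) should yield a NaN, while x87_mul_after_mmx(1.5, 2.0, 1) yields 3.0. In application code the usual fix is to execute EMMS (for example via _mm_empty() from <mmintrin.h>) once the MMX code is done.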
The commented assembly illustrating the problem:

    00000000000011c0 <do_fmod>:
        11c0: 48 83 ec 28           sub     $0x28,%rsp
        11c4: f2 0f 11 44 24 08     movsd   %xmm0,0x8(%rsp)
        11ca: f2 0f 11 4c 24 10     movsd   %xmm1,0x10(%rsp)
        11d0: dd 44 24 10           fldl    0x10(%rsp)           # load numbers from stack to FPU (y first, so st0 = x, st1 = y)
        11d4: dd 44 24 08           fldl    0x8(%rsp)
        11d8: d9 f8                 fprem                        # do fp partial remainder
        11da: df e0                 fnstsw  %ax                  # read FPU SW into AX
        11dc: f6 c4 04              test    $0x4,%ah             # test if C2 (incomplete reduction) is set
        11df: 75 f7                 jne     11d8 <do_fmod+0x18>  # jump to fprem again if C2 was set (not taken)
        11e1: dd d9                 fstp    %st(1)               # copy st0 into st1 and pop: drops the divisor, remainder is now st0
        11e3: dd 5c 24 18           fstpl   0x18(%rsp)           # store st0 into the stack and pop (stores NaN due to FPU stack fault and IE)
        11e7: f2 0f 10 54 24 18     movsd   0x18(%rsp),%xmm2     # copy result to %xmm2
        11ed: 66 0f 2e d2           ucomisd %xmm2,%xmm2          # check if %xmm2 holds NaN
        11f1: 7a 09                 jp      11fc <do_fmod+0x3c>  # jump if so (PF set, taken), jumps to A
        11f3: 66 0f 28 c2           movapd  %xmm2,%xmm0          # B: copy %xmm2 to %xmm0
        11f7: 48 83 c4 28           add     $0x28,%rsp
        11fb: c3                    ret                          # return from the procedure with NaN from %xmm0 as the result!
        11fc: f2 0f 10 4c 24 10     movsd   0x10(%rsp),%xmm1     # A: load original fmod args into %xmm0 and %xmm1
        1202: f2 0f 10 44 24 08     movsd   0x8(%rsp),%xmm0
        1208: e8 33 fe ff ff        call    1040 <fmod@plt>      # call libc fmod
        120d: f2 0f 10 54 24 18     movsd   0x18(%rsp),%xmm2     # <== GCC BUG? Ignores libc fmod result and copies previous NaN to %xmm2
        1213: eb de                 jmp     11f3 <do_fmod+0x33>  # jumps to B
        1215: 66 66 2e 0f 1f 84 00  data16 cs nopw 0x0(%rax,%rax,1)
        121c: 00 00 00 00
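The data flow in the listing can be modelled in plain C. This is a hypothetical sketch, not GCC's code: observed_do_fmod and expected_do_fmod are illustrative names, x87_result stands for the value stored at 0x18(%rsp), and libm_result stands for what the fmod@plt call returns in %xmm0. The comments map each step to the instruction it imitates.

```c
#include <math.h>

/* What the emitted code does: the libm result is produced but never read. */
static double observed_do_fmod(double x87_result, double libm_result) {
    double spill = x87_result;     /* fstpl 0x18(%rsp) */
    double xmm2 = spill;           /* movsd 0x18(%rsp),%xmm2 */
    if (isnan(xmm2)) {             /* ucomisd %xmm2,%xmm2; jp A */
        double xmm0 = libm_result; /* A: call fmod@plt, result in %xmm0 */
        (void)xmm0;                /* ...which is then ignored */
        xmm2 = spill;              /* movsd 0x18(%rsp),%xmm2 reloads the stale NaN */
    }
    return xmm2;                   /* B: movapd %xmm2,%xmm0; ret */
}

/* What the report expects: use the libm result when the x87 path gave NaN. */
static double expected_do_fmod(double x87_result, double libm_result) {
    if (isnan(x87_result))
        return libm_result;
    return x87_result;
}
```

For example, with x87_result = NAN and libm_result = 1.5 (the value of fmod(7.5, 2.0)), observed_do_fmod returns NaN while expected_do_fmod returns 1.5; when the x87 path succeeds, both simply return x87_result.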