https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254911
--- Comment #3 from Dimitry Andric <d...@freebsd.org> ---
Hmm, it seems that we have a case here that is similar to what is described
here:

https://stackoverflow.com/questions/63125919/how-to-avoid-floating-point-exceptions-in-unused-simd-lanes

The gist is that clang indeed uses the vdivps (Divide Packed
Single-Precision) instruction by default, so the two calculations
((beta * rho * s) / denom and t / denom) are emitted as:

    #DEBUG_VALUE: ctanhf:denom <- $xmm2
            .loc    1 77 35 is_stmt 1       # src/lib/msun/src/s_ctanhf.c:77:35
            vmulss  %xmm1, %xmm3, %xmm1
            .loc    1 77 41 is_stmt 0       # src/lib/msun/src/s_ctanhf.c:77:41
            vmulss  %xmm1, %xmm0, %xmm0
            .loc    1 77 46                 # src/lib/msun/src/s_ctanhf.c:77:46
            vinsertps $16, -80(%rbp), %xmm0, %xmm0 # 16-byte Folded Reload
                                            # xmm0 = xmm0[0],mem[0],xmm0[2,3]
            vmovsldup %xmm2, %xmm1          # xmm1 = xmm2[0,0,2,2]
            vdivps  %xmm1, %xmm0, %xmm0

Now the problem with vdivps is apparently that the unused 'lanes' of the
SIMD registers can still cause floating point exception bits to be set,
such as FE_INVALID (in this case probably because the unused lanes have
zero in them, giving 0/0).

That stackoverflow article suggests using clang's
-ffp-exception-behavior=maytrap option (documented at
<https://releases.llvm.org/11.0.1/tools/clang/docs/UsersManual.html#cmdoption-ffp-exception-behavior>),
meaning "The compiler avoids transformations that may raise exceptions that
would not have been raised by the original code". It is supported from
clang 10 onwards.
In practice, this indeed avoids using vdivps, and uses vdivss (Divide
Scalar Single-Precision) instead, and the assembly for line 77 then looks
like:

    #DEBUG_VALUE: ctanhf:denom <- $xmm1
            .loc    1 77 35 is_stmt 1       # src/lib/msun/src/s_ctanhf.c:77:35
            vmulss  %xmm2, %xmm4, %xmm2
            .loc    1 77 41 is_stmt 0       # src/lib/msun/src/s_ctanhf.c:77:41
            vmulss  %xmm0, %xmm2, %xmm0
            .loc    1 77 46                 # src/lib/msun/src/s_ctanhf.c:77:46
            vdivss  %xmm1, %xmm0, %xmm2
            vmovss  -80(%rbp), %xmm0        # 4-byte Reload
                                            # xmm0 = mem[0],zero,zero,zero
    #DEBUG_VALUE: ctanhf:t <- $xmm0
            .loc    1 77 57                 # src/lib/msun/src/s_ctanhf.c:77:57
            vdivss  %xmm1, %xmm0, %xmm0

And indeed, in this case the FE_INVALID is gone, and the tests succeed.

I guess it may be good to use this -ffp-exception-behavior=maytrap flag for
the whole of lib/msun, as many of these functions rely on exception flags
not being raised spuriously. It does not seem to be required for gcc.