https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254911

--- Comment #3 from Dimitry Andric <d...@freebsd.org> ---
Hmm, it seems that we have a case here that is similar to what is described
here:

https://stackoverflow.com/questions/63125919/how-to-avoid-floating-point-exceptions-in-unused-simd-lanes

The gist being that clang indeed uses the vdivps (Divide Packed
Single-Precision) instruction by default, so the two calculations
((beta * rho * s) / denom and t / denom) are emitted as:

        #DEBUG_VALUE: ctanhf:denom <- $xmm2
        .loc    1 77 35 is_stmt 1               # src/lib/msun/src/s_ctanhf.c:77:35
        vmulss  %xmm1, %xmm3, %xmm1
        .loc    1 77 41 is_stmt 0               # src/lib/msun/src/s_ctanhf.c:77:41
        vmulss  %xmm1, %xmm0, %xmm0
        .loc    1 77 46                         # src/lib/msun/src/s_ctanhf.c:77:46
        vinsertps       $16, -80(%rbp), %xmm0, %xmm0 # 16-byte Folded Reload
                                        # xmm0 = xmm0[0],mem[0],xmm0[2,3]
        vmovsldup       %xmm2, %xmm1            # xmm1 = xmm2[0,0,2,2]
        vdivps  %xmm1, %xmm0, %xmm0

Now the problem with vdivps is apparently that the unused 'lanes' of the SIMD
registers can still result in floating point exception bits being set, such as
FE_INVALID (in this case probably because the unused lanes have zero in them,
giving 0/0).

That Stack Overflow question suggests using clang's
-ffp-exception-behavior=maytrap option (documented at
<https://releases.llvm.org/11.0.1/tools/clang/docs/UsersManual.html#cmdoption-ffp-exception-behavior>),
meaning "The compiler avoids transformations that may raise exceptions that
would not have been raised by the original code". It is supported from clang 10
onwards.

In practice, this indeed avoids using vdivps, and uses vdivss (Divide Scalar
Single-Precision) instead, and the assembly for line 77 then looks like:

        #DEBUG_VALUE: ctanhf:denom <- $xmm1
        .loc    1 77 35 is_stmt 1               # src/lib/msun/src/s_ctanhf.c:77:35
        vmulss  %xmm2, %xmm4, %xmm2
        .loc    1 77 41 is_stmt 0               # src/lib/msun/src/s_ctanhf.c:77:41
        vmulss  %xmm0, %xmm2, %xmm0
        .loc    1 77 46                         # src/lib/msun/src/s_ctanhf.c:77:46
        vdivss  %xmm1, %xmm0, %xmm2
        vmovss  -80(%rbp), %xmm0                # 4-byte Reload
                                        # xmm0 = mem[0],zero,zero,zero
        #DEBUG_VALUE: ctanhf:t <- $xmm0
        .loc    1 77 57                         # src/lib/msun/src/s_ctanhf.c:77:57
        vdivss  %xmm1, %xmm0, %xmm0

And indeed, in this case the FE_INVALID is gone, and the tests succeed.

I guess it may be good to use this -ffp-exception-behavior=maytrap flag for the
whole of lib/msun, as many of these functions rely on floating point exception
flags being raised only by the operations that actually appear in the source.
It does not seem to be required for gcc.
