On 7/21/23 14:45, Xi Ruoyao wrote:
On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:
My understanding is that this is not a hardware bug and that it's
specified that rounding does not happen on the multiply "sub-part" in
`FNMSUB`, but rounding happens on the `FMUL` that generates some input
to it.

AFAIK the C standard does only say "A floating *expression* may be
contracted".  I.e:

double r = a * b + c;

may be compiled to use FMA because "a * b + c" is a floating point
expression.  But

double t = a * b;
double r = t + c;

is not, because "a * b" and "t + c" are two separate floating point
expressions.

So a contraction across two functions is not allowed.  We now have
-ffp-contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
contractions.

Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
are building GCC 14 snapshot).  The default is "fast" (if no -std=
option is used), which allows some contractions disallowed by the
standard.

But GCC is in C++ and I'm not sure if the C++ standard has the same
definition for allowed contractions as C.


Thanks -- I'll look into whether `-ffp-contract=on` works.


It's possible that the test itself is flaky.  Can you provide some
detail about how it fails?


Sure -- the outline is that `timer::validate_phases` sees the sum of the sub-part timers as greater than the "overall" timer (beyond the tolerance factor of 1.000001). It then complains and hits `gcc_unreachable()`.

While I found it difficult to get enough information out of the test as it is run in the testsuite, I found that when passing an invalid argument to `cc1plus` all sub-parts would be zero, and sometimes the "total" would be negative.

This was due to the `times` syscall returning the same clock tick for the start and end of the "total" timer. Because FNMSUB does not round its multiply while FMUL does, the "elapsed time" can end up calculated as negative, depending on what that clock tick is.

I didn't prove it 100%, but I believe the same fundamental difference (with the opposite rounding error) could trigger the testsuite failure: if the "end" of one sub-phase timer is greater than the "start" of another sub-phase timer, then the sum of the parts could be greater than the total.

There is a "tolerance" in this test that I considered increasing. That would not fix the "invalid arguments" case, though: there the total is negative, so the multiplicative tolerance of 1.000001 would have to be supplemented by a positive offset. Hence I suggested avoiding the inlining instead.
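A small sketch of why a multiplicative tolerance cannot help once the total is negative. The function names and the shape of the comparison are assumptions made for illustration, not a copy of `timer::validate_phases`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the check: the sum of the phase timers
   must not exceed the total scaled by a multiplicative tolerance.  */
static bool phases_exceed_total (double phase_sum, double total)
{
  const double tolerance = 1.000001;
  return phase_sum > total * tolerance;
}

/* Hypothetical variant that also allows an absolute slack.  */
static bool phases_exceed_total_with_offset (double phase_sum,
                                             double total,
                                             double offset)
{
  const double tolerance = 1.000001;
  return phase_sum > total * tolerance + offset;
}
```

With all phases at zero and a slightly negative total, `phases_exceed_total (0.0, -1e-18)` still reports a failure however large the factor is, because scaling a negative total only makes it more negative. Only an added positive offset changes the outcome.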

W.r.t. the x86 bug that Alexander Monakov has pointed to, it's a very similar thing, but in this case the problem is not the bit-precision of values after the inlining but rather a difference between fused and non-fused operations after the inlining.

Agreed that using integral arithmetic is the more robust solution.
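As a rough sketch of what that could look like (the function name and the 0.01 scale, which assumes 100 clock ticks per second, are hypothetical and not taken from the GCC sources):

```c
#include <stdint.h>

/* Hypothetical scale factor: assumes 100 clock ticks per second.  */
static const double SEC_PER_TICK = 0.01;

/* Subtract the raw tick counts as integers first, then scale once.
   Equal ticks give exactly zero, and end >= start cannot give a
   negative result, since only one floating-point operation remains
   and there is nothing left to fuse.  */
static double elapsed_from_ticks (int64_t start_ticks, int64_t end_ticks)
{
  return (double) (end_ticks - start_ticks) * SEC_PER_TICK;
}
```

The subtraction is exact in integer arithmetic, so the result no longer depends on how the compiler contracts the floating-point code.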
