On 7/21/23 14:45, Xi Ruoyao wrote:
On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:
My understanding is that this is not a hardware bug and that it's
specified that rounding does not happen on the multiply "sub-part" in
`FNMSUB`, but rounding happens on the `FMUL` that generates some input
to it.
AFAIK the C standard only says "A floating *expression* may be
contracted". I.e:
double r = a * b + c;
may be compiled to use FMA because "a * b + c" is a floating point
expression. But
double t = a * b;
double r = t + c;
is not, because "a * b" and "t + c" are two separate floating point
expressions.
So a contraction across two functions is not allowed. We now have
-ffp-contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
contractions.
Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
are building a GCC 14 snapshot). The default is "fast" (if no -std=
option is used), which allows some contractions disallowed by the
standard.
But GCC is written in C++ and I'm not sure if the C++ standard has the
same definition for allowed contractions as C.
Thanks -- I'll look into whether `-ffp-contract=on` works.
It's possible that the test itself is flaky. Can you provide some
detail about how it fails?
Sure -- The outline is that `timer::validate_phases` sees the sum of
sub-part timers as greater than the timer for the "overall" time
(outside of a tolerance of 1.000001). It then complains and hits
`gcc_unreachable()`.
While I found it difficult to get enough information out of the test
that is run in the testsuite, I found that when passing an invalid
argument to `cc1plus`, all sub-parts would be zero, and sometimes the
"total" would be negative.
This was due to the `times` syscall returning the same clock tick for
the start and end of the "total" timer. The difference in rounding
between FNMSUB and FMUL means that, depending on what that clock tick
is, the "elapsed time" can end up calculated as negative.
I didn't prove it 100% but I believe the same fundamental difference
(but opposite rounding error) could trigger the testsuite failure -- if
the "end" of one sub-phase timer is greater than the "start" of another
sub-phase timer then the sum of parts could be greater than the total.
There is a "tolerance" in this test that I considered increasing, but
since that would not fix the "invalid arguments" case (where the
total is negative and hence the tolerance multiplication of 1.000001
would have to be supplemented by a positive offset) I suggested avoiding
the inline.
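A rough sketch of why a purely multiplicative tolerance cannot absorb a negative total. The function `phases_exceed_total` and its parameters are hypothetical and only mirror the shape of the check described above, not the real `timer::validate_phases` code.

```c
/* Returns nonzero when the sum of sub-phase times exceeds the total,
   allowing a multiplicative tolerance and an additive offset. */
static int phases_exceed_total(double phase_sum, double total,
                               double tolerance, double offset)
{
    return phase_sum > total * tolerance + offset;
}
```

With a phase sum of zero and a total of about -1.7e-18, scaling the total by 1.000001 (or any larger factor) keeps it negative, so the check still fires. Only a positive offset lets it pass.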
W.r.t. the x86 bug that Alexander Monakov has pointed to, it's a very
similar thing, but in this case the problem is not the bit-precision of
values after the inlining, but rather a difference between fused and
unfused operations after the inlining.
Agreed that using integral arithmetic is the more robust solution.
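One possible shape for that, as a sketch only: subtract the raw tick counts as integers first and convert the difference to seconds afterwards. The function name and the `ticks_per_sec` parameter are hypothetical and not taken from the existing timer code.

```c
#include <assert.h>

/* Subtract in integer arithmetic (exact), then convert once.
   Equal tick counts always give exactly 0.0, fused or not. */
static double elapsed_seconds(long start_ticks, long end_ticks,
                              long ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    long delta = end_ticks - start_ticks;
    return (double)delta / (double)ticks_per_sec;
}
```

Because the only floating-point operation is a single division of a non-negative integer difference, there is no multiply-subtract pair for the compiler to contract, and the result cannot be negative when `end_ticks >= start_ticks`.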