Hello all,
I'm coming back to the atomic floating-point fetch add/fetch sub
operations.
Andre and Wilco drew my attention to some issues around floating-point
exceptions that I had missed at the time.
First, it seems the relevant AArch64 instructions neither raise
floating-point exceptions nor set the floating-point status flags:
https://developer.arm.com/documentation/ddi0602/2025-12/SIMD-FP-Instructions/LDFADD--LDFADDA--LDFADDAL--LDFADDL--Atomic-floating-point-add-?lang=en
AIUI this also holds for the existing floating-point atomic operations
available on GPUs.
Looking back at the new C++ atomic<float>::fetch_add methods with this
in mind, it seems they explicitly relax the semantics around
floating-point exceptions (https://eel.is/c++draft/atomics#ref.float-8).
The quote there is "The floating-point environment ([cfenv]) for
atomic arithmetic operations on floating-point-type may be different
than the calling thread's floating-point environment".
Given this new information, I believe the new atomic intrinsics should
have the same semantics as the new libstdc++ methods -- i.e. the
floating-point environment of the operation may differ from that of
the calling thread.
The main reasons are that we want this new intrinsic so that compilers
can emit the most performant instructions for code using the new
libstdc++ method, and that C `_Atomic` is a language feature which
need not be tied to the behaviour of a builtin.
Does this seem reasonable to everyone?
One less obvious consequence concerns pattern matching a CAS loop
(which I originally planned to do, following the approach taken for
fetch_min/fetch_max). It seems that this would require a "replay"
operation around the atomic internal function in order to provide the
same floating-point exception semantics as the original loop.
An extra question about AArch64 floating-point semantics for Andre and
Wilco: the instruction description says it behaves as if `FPCR.AH` is 0
and `FPCR.DN` is 1. Is this the case for standard code?
If not, I guess that would mean that pattern matching a C-level CAS loop
to use the new atomic operations would leave incorrect values in memory?
Both of these points make me wonder whether pattern matching an existing
CAS loop to these new instructions is worthwhile (given the replay) and
feasible (given the FPCR differences). What are people's thoughts on that?
MM
On 9/19/24 22:38, Joseph Myers wrote:
On Thu, 19 Sep 2024, [email protected] wrote:
6) Anything special about floating point maths that I'm tripping up on?
Correct atomic operations with floating-point operands should ensure that
exceptions raised exactly correspond to the operands for which the
operation succeeded, and not to the operands for any previous attempts
where the compare-exchange failed. There is a lengthy note in the C
standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in
6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses
appropriate code sequences to achieve this. In GCC the implementation of
this is in c-typeck.cc:build_atomic_assign, which in turn calls
targetm.atomic_assign_expand_fenv (note that we have the complication for
C of not introducing libm dependencies in code that only uses standard
language features and not <math.h>, <fenv.h> or <complex.h>, so direct use
of <fenv.h> functions is inappropriate here).
I would expect such built-in functions to follow the same semantics for
floating-point exceptions as _Atomic compound assignment does. (Note that
_Atomic compound assignment is more general in the allowed operands,
because compound assignment is a heterogeneous operation - for example,
the special floating-point logic in build_atomic_assign includes the case
where the LHS of the compound assignment is of atomic integer type but the
RHS is of floating type. However, built-in functions allow memory orders
other than seq_cst to be used, whereas _Atomic compound assignment is
limited to the seq_cst case.)
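To illustrate both points, a hypothetical hand-written equivalent of the heterogeneous case (`_Atomic int` LHS, `double` RHS), with the memory order taken as a parameter in the way a built-in could allow. It deliberately omits the floating-point environment handling that build_atomic_assign adds, so exceptions from failed attempts are not discarded; the name `atomic_int_add_double` is made up for this sketch.

```c
#include <stdatomic.h>

/* Hypothetical helper: behaves like the compound assignment `*p += d` for an
   atomic int *p and a double d, except that the caller picks the memory
   order (compound assignment on _Atomic objects is always seq_cst).
   No fenv handling: failed attempts may leave spurious exception flags. */
int atomic_int_add_double(_Atomic int *p, double d, memory_order order)
{
    int old = atomic_load_explicit(p, memory_order_relaxed);
    int desired;
    do {
        /* old is converted to double, added to d, and the sum converted
           back to int, truncating toward zero. */
        desired = (int)(old + d);
    } while (!atomic_compare_exchange_weak_explicit(p, &old, desired, order,
                                                    memory_order_relaxed));
    return desired; /* a compound assignment yields the new value */
}
```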
So it would seem appropriate for the implementation of such built-in
functions to make use of targetm.atomic_assign_expand_fenv for
floating-point environment handling, and for testcases to include tests
analogous to c11-atomic-exec-5.c that exceptions are being handled
correctly.
Cf. N2329 which suggested such operations for C in <stdatomic.h> (but
tried to do too many things in one paper to be accepted into C); it didn't
go into the floating-point exceptions semantics but simple correctness
would indicate avoiding spurious exceptions from discarded computations.
--
Joseph S. Myers
[email protected]