Hello all,

I'm focussing back on the atomic floating point fetch add/fetch sub operations.

Andre and Wilco brought some things around floating point exceptions to my attention that I'd missed at the time.

First, it seems the relevant AArch64 instructions do not raise floating point exceptions and do not set floating point status flags (https://developer.arm.com/documentation/ddi0602/2025-12/SIMD-FP-Instructions/LDFADD--LDFADDA--LDFADDAL--LDFADDL--Atomic-floating-point-add-?lang=en). AIUI this also holds for the existing floating point atomic operations that are available on GPUs.

On looking back at the new C++ atomic<float>::fetch_add methods with this in mind, it seems they have explicitly relaxed semantics around floating point exceptions (https://eel.is/c++draft/atomics#ref.float-8). The quote there is "The floating-point environment ([cfenv]) for atomic arithmetic operations on floating-point-type may be different than the calling thread's floating-point environment".

Given this new information, I believe the new atomic intrinsics should have the semantics of the new libstdc++ methods, i.e. they should allow the floating point environment of the operation to differ from that of the calling thread. The main reason is that we want this new intrinsic so that compilers can emit the most performant instructions for code using this new libstdc++ method, and C `_Atomic` is a language feature that does not need to be tied to the behaviour of a builtin.

Does this seem reasonable to everyone?


One less-obvious consequence of this is around pattern matching a CAS loop (which I originally planned to do in order to match the approach taken for fetch_min/fetch_max). It seems that this would require a "replay" operation to be added around the atomic internal function in order to provide the same floating point exception semantics as the original loop.
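
For concreteness, here is a rough sketch of the kind of C-level CAS loop I mean. The name `cas_fetch_add` is made up for illustration and is not a proposed builtin; the loop deliberately does no floating point environment handling, so it only shows the shape a pattern matcher would be looking for.

```c
#include <stdatomic.h>

/* Hypothetical helper, for illustration only: a fetch-add on an
   _Atomic float written as a compare-exchange loop.  Returns the
   value that was in *p immediately before the successful update.  */
static float
cas_fetch_add (_Atomic float *p, float val)
{
  float old = atomic_load (p);
  float desired;

  do
    {
      /* This addition is ordinary code in the calling thread, so it
         uses that thread's rounding mode and can set its exception
         flags, including on attempts that are later discarded.  */
      desired = old + val;
    }
  /* On failure, `old` is reloaded with the current contents of *p
     and the addition is retried.  */
  while (!atomic_compare_exchange_weak (p, &old, desired));

  return old;
}
```

Since the addition here happens in the calling thread, any flags it raises are visible to that thread, which is what a single instruction that sets no flags would not reproduce without some extra step.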

An extra question about AArch64 floating point semantics for Andre and Wilco: The instruction mentions it behaves as if `FPCR.AH is 0` and `FPCR.DN is 1`: Is this the case for standard code? If not I guess that would mean that pattern matching a C-level CAS loop to use the new atomic operations would leave incorrect values in memory?

Both of these points make me wonder whether pattern-matching an existing CAS loop to these new instructions is worthwhile/feasible (respectively). What are people's thoughts on that?

MM

On 9/19/24 22:38, Joseph Myers wrote:

On Thu, 19 Sep 2024, [email protected] wrote:

6) Anything special about floating point maths that I'm tripping up on?

Correct atomic operations with floating-point operands should ensure that
exceptions raised exactly correspond to the operands for which the
operation succeeded, and not to the operands for any previous attempts
where the compare-exchange failed.  There is a lengthy note in the C
standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in
6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses
appropriate code sequences to achieve this.  In GCC the implementation of
this is in c-typeck.cc:build_atomic_assign, which in turn calls
targetm.atomic_assign_expand_fenv (note that we have the complication for
C of not introducing libm dependencies in code that only uses standard
language features and not <math.h>, <fenv.h> or <complex.h>, so direct use
of <fenv.h> functions is inappropriate here).

I would expect such built-in functions to follow the same semantics for
floating-point exceptions as _Atomic compound assignment does.  (Note that
_Atomic compound assignment is more general in the allowed operands,
because compound assignment is a heterogeneous operation - for example,
the special floating-point logic in build_atomic_assign includes the case
where the LHS of the compound assignment is of atomic integer type but the
RHS is of floating type.  However, built-in functions allow memory orders
other than seq_cst to be used, whereas _Atomic compound assignment is
limited to the seq_cst case.)

So it would seem appropriate for the implementation of such built-in
functions to make use of targetm.atomic_assign_expand_fenv for
floating-point environment handling, and for testcases to include tests
analogous to c11-atomic-exec-5.c that exceptions are being handled
correctly.

Cf. N2329 which suggested such operations for C in <stdatomic.h> (but
tried to do too many things in one paper to be accepted into C); it didn't
go into the floating-point exceptions semantics but simple correctness
would indicate avoiding spurious exceptions from discarded computations.

--
Joseph S. Myers
[email protected]

