Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

Matthew Malcomson Mon, 23 Feb 2026 04:01:11 -0800

Hello all,

I'm focussing back on the atomic floating point fetch add/fetch sub operations.

Andre and Wilco brought some things around floating point exceptions to my attention that I'd missed at the time.

First, it seems the relevant AArch64 instructions do not raise floating point exceptions and do not set floating point status flags:
https://developer.arm.com/documentation/ddi0602/2025-12/SIMD-FP-Instructions/LDFADD--LDFADDA--LDFADDAL--LDFADDL--Atomic-floating-point-add-?lang=en
AIUI this also holds for the existing floating point atomic operations that are available on GPUs.

On looking back at the new C++ atomic<float>::fetch_add methods with this in mind, it seems they have explicitly relaxed semantics around floating point exceptions (https://eel.is/c++draft/atomics#ref.float-8). The quote there is "The floating-point environment ([cfenv]) for atomic arithmetic operations on floating-point-type may be different than the calling thread's floating-point environment".

Given this new information I believe the new atomic intrinsics should have the semantics of the new libstdc++ methods -- i.e. allowing the floating point environment of the operation to be different to that of the calling thread.

The main reason is that we are interested in adding this new intrinsic in order to ensure compilers can emit the most performant instructions for code using this new libstdc++ method, and C `_Atomic` is a language feature that does not need to be tied to the behaviour of a builtin.


Does this seem reasonable to everyone?

One less-obvious consequence of this is around pattern matching a CAS loop (which I originally planned to do in order to match the approach taken for fetch_min/fetch_max). It seems that this would require a "replay" operation to be added around the atomic internal function in order to provide the same floating point exception semantics as the original loop.
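
To make the discussion concrete, the kind of source-level CAS loop being considered for pattern matching could look roughly like the sketch below. It is written in C++ with std::atomic<float>; the helper names `cas_fetch_add` and `cas_fetch_add_selfcheck` are hypothetical and are not part of the patch series.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: fetch-and-add on a float via a compare-exchange loop.
// Returns the value held in `target` before the addition.
inline float cas_fetch_add(std::atomic<float>& target, float operand,
                           std::memory_order order = std::memory_order_seq_cst)
{
    float expected = target.load(std::memory_order_relaxed);
    // Every iteration performs the addition in the calling thread's
    // floating point environment, so each attempt (including failed ones)
    // can set exception flags visible to the caller.
    while (!target.compare_exchange_weak(expected, expected + operand,
                                         order, std::memory_order_relaxed)) {
        // On failure `expected` has been refreshed with the current value,
        // so the next iteration recomputes the sum from it.
    }
    return expected;
}

// Small single-threaded self-check of the helper above.
inline void cas_fetch_add_selfcheck()
{
    std::atomic<float> value{1.5f};
    assert(cas_fetch_add(value, 2.25f) == 1.5f);
    assert(value.load() == 3.75f);
}
```

The addition inside the loop is an ordinary floating point operation in the calling thread, which is where the difference from an instruction that neither raises exceptions nor sets status flags comes from.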

An extra question about AArch64 floating point semantics for Andre and Wilco: The instruction description says it behaves as if `FPCR.AH is 0` and `FPCR.DN is 1`: Is this the case for standard code? If not I guess that would mean that pattern matching a C-level CAS loop to use the new atomic operations would leave incorrect values in memory?

Both of these points make me wonder whether pattern-matching an existing CAS loop to these new instructions is worthwhile/feasible (respectively). What are people's thoughts on that?


MM

On 9/19/24 22:38, Joseph Myers wrote:

On Thu, 19 Sep 2024, [email protected] wrote:

6) Anything special about floating point maths that I'm tripping up on?


Correct atomic operations with floating-point operands should ensure that
exceptions raised exactly correspond to the operands for which the
operation succeeded, and not to the operands for any previous attempts
where the compare-exchange failed.  There is a lengthy note in the C
standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in
6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses
appropriate code sequences to achieve this.  In GCC the implementation of
this is in c-typeck.cc:build_atomic_assign, which in turn calls
targetm.atomic_assign_expand_fenv (note that we have the complication for
C of not introducing libm dependencies in code that only uses standard
language features and not <math.h>, <fenv.h> or <complex.h>, so direct use
of <fenv.h> functions is inappropriate here).
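
As a rough illustration of the shape of the code sequence that the note in the C standard discusses, here is a source-level sketch in C++ using <cfenv>. The helper names `cas_fetch_add_fenv` and `cas_fetch_add_fenv_selfcheck` are hypothetical. This is not how GCC implements it: as described above, GCC goes through targetm.atomic_assign_expand_fenv rather than calling <fenv.h> functions directly. The sketch also assumes the compiler does not move the addition across the floating point environment calls.

```cpp
#include <atomic>
#include <cassert>
#include <cfenv>
#include <limits>

// Hypothetical helper: CAS-loop fetch-and-add that keeps only the
// floating point exceptions raised by the attempt that succeeded.
inline float cas_fetch_add_fenv(std::atomic<float>& target, float operand)
{
    std::fenv_t saved;
    std::feholdexcept(&saved);      // save the environment and clear the flags
    float expected = target.load();
    for (;;) {
        float desired = expected + operand;
        if (target.compare_exchange_strong(expected, desired))
            break;
        // The exchange failed, so `desired` is discarded; drop any flags
        // its computation raised before retrying with the refreshed value.
        std::feclearexcept(FE_ALL_EXCEPT);
    }
    // Restore the saved environment and re-raise the flags that the
    // successful attempt set.
    std::feupdateenv(&saved);
    return expected;
}

// Small single-threaded self-check using exactly representable values.
inline void cas_fetch_add_fenv_selfcheck()
{
    std::atomic<float> value{1.5f};
    assert(cas_fetch_add_fenv(value, 2.25f) == 1.5f);
    assert(value.load() == 3.75f);
}
```

In a single-threaded call the first attempt always succeeds, so the flags seen by the caller afterwards are those of that one addition, merged with whatever was already set before the call.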

I would expect such built-in functions to follow the same semantics for
floating-point exceptions as _Atomic compound assignment does.  (Note that
_Atomic compound assignment is more general in the allowed operands,
because compound assignment is a heterogeneous operation - for example,
the special floating-point logic in build_atomic_assign includes the case
where the LHS of the compound assignment is of atomic integer type but the
RHS is of floating type.  However, built-in functions allow memory orders
other than seq_cst to be used, whereas _Atomic compound assignment is
limited to the seq_cst case.)

So it would seem appropriate for the implementation of such built-in
functions to make use of targetm.atomic_assign_expand_fenv for
floating-point environment handling, and for testcases to include tests
analogous to c11-atomic-exec-5.c that exceptions are being handled
correctly.

Cf. N2329 which suggested such operations for C in <stdatomic.h> (but
tried to do too many things in one paper to be accepted into C); it didn't
go into the floating-point exceptions semantics but simple correctness
would indicate avoiding spurious exceptions from discarded computations.

--
Joseph S. Myers
[email protected]
