Re: [PATCH 0/8] [RFC] Introduce floating point fetch_add builtins

Matthew Malcomson Fri, 27 Feb 2026 03:20:58 -0800

Hello,

First I'd like to ping for others' opinions on my previous design
questions around floating point atomic operations.
- Any contrary opinions on the builtin semantics matching the libstdc++
  fetch_add semantics?
- Do people agree with my conclusion that pattern-matching an FP CAS loop
  to use a builtin with these relaxed FP semantics is awkward enough to
  not be worthwhile?


------------------------------
Second: while working on expanding these floating point atomic
operations in the mid-end (after IPA) I've hit a few tricky points.

If we are defining the floating point exception semantics of these
builtins to be the same as the libstdc++ floating point fetch_add
methods (i.e. can be executed in a different floating point environment
from that of the calling thread), then we still have the question of
whether the builtin can *pollute* the calling thread's floating point
environment.

As it stands the CAS implementation in libstdc++ could go around the
compare exchange loop multiple times and set floating point
status/exception flags according to one of the operations that didn't
end up getting stored in memory.
While the C++ specification of this operation allows the operation to be
performed in a "different" floating point environment to the calling
thread, I would not read that as allowing such pollution of the
environment.
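
To illustrate the concern, here is a minimal sketch in C of the kind of
compare-exchange loop under discussion.  The helper name
`fetch_add_float_cas` is made up for this example and this is not the
libstdc++ code; it only shows where a discarded addition can still leave
flags behind.

```c
#include <stdatomic.h>

/* Hypothetical helper, for illustration only.  */
static float
fetch_add_float_cas (_Atomic float *obj, float arg, memory_order order)
{
  float old = atomic_load_explicit (obj, memory_order_relaxed);
  float desired;
  do
    {
      /* This addition may set status flags (inexact, overflow, ...)
         in the calling thread even when the compare-exchange below
         fails and `desired` is thrown away.  */
      desired = old + arg;
    }
  while (!atomic_compare_exchange_weak_explicit (obj, &old, desired,
                                                 order,
                                                 memory_order_relaxed));
  return old;
}
```

If another thread changes `*obj` between the load and the
compare-exchange, the loop retries with the new `old`, and whatever flags
the first addition raised stay set.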

Since I'm hoping these builtins become the standard approach to
performing the floating point fetch_add/sub operations, I expect that it
would be best if they do not do such "polluting" of the calling thread's
environment.

For the AArch64 instruction recently defined this is not a problem (the
instruction doesn't set any flags).  However when expanding to a CAS
loop in the mid-end when no optab is defined I believe I have to add
some floating point exception handling code.

Questions:
1) Am I missing anything?  Is there some reason that this is not
   necessary?
2) Since the builtin will be defined to not necessarily set floating
   point exceptions or status flags, I would guess that something
   implementing `feholdexcept(&fenv); <CAS loop>; fesetenv(&fenv)`
   would be better than the full handling as is done for the C `_Atomic`
   feature.  Do others agree?
3) W.r.t. implementation -- it seems there would be two feasible
   approaches:
   - Add new sync-builtins along the lines of `__atomic_feraiseexcept`
     so that the mid-end can emit calls to these directly.
   - Add a new target hook similar to `atomic_assign_expand_fenv` but
     that returns GIMPLE and provides `fesetenv` instead of
     `feupdateenv` and `feclearexcept`.
   I'm guessing that the second option (new target hook) would be the
   best since it means we can inline the operations.
   Do others agree?

Thanks,
Matthew


On 2/23/26 17:05, Jonathan Wakely wrote:


On Mon, 23 Feb 2026 at 12:01, Matthew Malcomson <[email protected]> wrote:

Hello all,

I'm focussing back on the atomic floating point fetch add/fetch sub
operations.

Andre and Wilco brought some things around floating point exceptions to
my attention that I'd missed at the time.

First, it seems the relevant AArch64 instructions do not raise floating
point exceptions and do not set floating point status flags.
https://developer.arm.com/documentation/ddi0602/2025-12/SIMD-FP-Instructions/LDFADD--LDFADDA--LDFADDAL--LDFADDL--Atomic-floating-point-add-?lang=en
AIUI this also holds for the existing floating point atomic operations
that are available on GPUs.

On looking back at the new C++ atomic<float>::fetch_add methods with
this in mind, it seems they have explicitly relaxed semantics around
floating point exceptions (https://eel.is/c++draft/atomics#ref.float-8).
The quote there is "The floating-point environment ([cfenv]) for
atomic arithmetic operations on floating-point-type may be different
than the calling thread's floating-point environment".

Given this new information I believe the new atomic intrinsics should
have the semantics of the new libstdc++ methods -- i.e. allowing the
floating point environment of the operation to be different to that of
the calling thread.
The main reason for this being that we are interested in adding this new
intrinsic in order to ensure compilers can emit the most performant
instructions for code using this new libstdc++ method, and C `_Atomic`
is a language feature that does not need to be tied to the behaviour of
a builtin.

Does this seem reasonable to everyone?


I am selfishly in favour of the builtins matching the semantics that
libstdc++ wants :-)

But yes, I agree that if the standard allows the atomic ops to ignore
the current FP env, and some targets can emit more efficient code by
taking advantage of that permission, then it makes sense to do that.



One less-obvious consequence of this is around pattern matching a CAS
loop (which I originally planned to do in order to match the approach
taken for fetch_min/fetch_max).  It seems that this would require a
"replay" operation to be added around the atomic internal function in
order to provide the same floating point exception semantics as the
original loop.

An extra question about AArch64 floating point semantics for Andre and
Wilco:  The instruction mentions it behaves as if `FPCR.AH is 0` and
`FPCR.DN is 1`:  Is this the case for standard code?
If not I guess that would mean that pattern matching a C-level CAS loop
to use the new atomic operations would leave incorrect values in memory?

Both of these points make me wonder whether pattern-matching an existing
CAS loop to these new instructions is worthwhile/feasible
(respectively).  What are people's thoughts on that?

MM

On 9/19/24 22:38, Joseph Myers wrote:


On Thu, 19 Sep 2024, [email protected] wrote:

6) Anything special about floating point maths that I'm tripping up on?


Correct atomic operations with floating-point operands should ensure that
exceptions raised exactly correspond to the operands for which the
operation succeeded, and not to the operands for any previous attempts
where the compare-exchange failed.  There is a lengthy note in the C
standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in
6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses
appropriate code sequences to achieve this.  In GCC the implementation of
this is in c-typeck.cc:build_atomic_assign, which in turn calls
targetm.atomic_assign_expand_fenv (note that we have the complication for
C of not introducing libm dependencies in code that only uses standard
language features and not <math.h>, <fenv.h> or <complex.h>, so direct use
of <fenv.h> functions is inappropriate here).

I would expect such built-in functions to follow the same semantics for
floating-point exceptions as _Atomic compound assignment does.  (Note that
_Atomic compound assignment is more general in the allowed operands,
because compound assignment is a heterogeneous operation - for example,
the special floating-point logic in build_atomic_assign includes the case
where the LHS of the compound assignment is of atomic integer type but the
RHS is of floating type.  However, built-in functions allow memory orders
other than seq_cst to be used, whereas _Atomic compound assignment is
limited to the seq_cst case.)

So it would seem appropriate for the implementation of such built-in
functions to make use of targetm.atomic_assign_expand_fenv for
floating-point environment handling, and for testcases to include tests
analogous to c11-atomic-exec-5.c that exceptions are being handled
correctly.

Cf. N2329 which suggested such operations for C in <stdatomic.h> (but
tried to do too many things in one paper to be accepted into C); it didn't
go into the floating-point exceptions semantics but simple correctness
would indicate avoiding spurious exceptions from discarded computations.

--
Joseph S. Myers
[email protected]
