Ping on these design questions.

On 2/27/26 11:20, Matthew Malcomson wrote:
Hello,

First I'd like to ping for others' opinions on my previous design questions around floating point atomic operations.
- Any contrary opinions on the builtin semantics matching the libstdc++ fetch_add semantics?
- Do people agree with my conclusion that pattern-matching an FP CAS loop to use a builtin with these relaxed FP semantics is awkward enough to not be worthwhile?

------------------------------

Second: while working on expanding these floating point atomic operations in the mid-end (after IPA) I've hit a few tricky points.

If we are defining the floating point exception semantics of these builtins to be the same as the libstdc++ floating point fetch_add methods (i.e. can be executed in a different floating point environment from the calling thread), then we still have the question of whether the builtin can *pollute* the calling thread's floating point environment.

As it stands the CAS implementation in libstdc++ could go around the compare exchange loop multiple times and set floating point status/exception flags according to one of the operations that didn't end up getting stored in memory. While the C++ specification of this operation allows the operation to be performed in a "different" floating point environment to the calling thread, I would not read that as allowing such pollution of the environment.

Since I'm hoping these builtins become the standard approach to performing the floating point fetch_add/sub operations, I expect that it would be best if they do not do such "polluting" of the calling thread's environment.

For the AArch64 instruction recently defined this is not a problem (the instruction doesn't set any flags). However, when expanding to a CAS loop in the mid-end when no optab is defined, I believe I have to add some floating point exception handling code.

Questions:
1) Am I missing anything? Is there some reason that this is not necessary?
2) Since the builtin will be defined to not necessarily set floating point exceptions or status flags, I would guess that something implementing `feholdexcept(&fenv); <CAS loop>; fesetenv(&fenv)` would be better than the full handling as is done for the C `_Atomic` feature. Do others agree?
3) W.r.t. implementation -- it seems there would be two feasible approaches:
   - Add new sync-builtins along the lines of `__atomic_feraiseexcept` so that the mid-end can emit calls to these directly.
   - Add a new target hook similar to `atomic_assign_expand_fenv` but that returns GIMPLE and provides `fesetenv` instead of `feupdateenv` and `feclearexcept`.
   I'm guessing that the second option (new target hook) would be the best since it means we can inline the operations. Do others agree?

Thanks,
Matthew

On 2/23/26 17:05, Jonathan Wakely wrote:
On Mon, 23 Feb 2026 at 12:01, Matthew Malcomson <[email protected]> wrote:

Hello all,

I'm focussing back on the atomic floating point fetch add/fetch sub operations. Andre and Wilco brought some things around floating point exceptions to my attention that I'd missed at the time.

First, it seems the relevant AArch64 instructions do not raise floating point exceptions and do not set floating point status flags.
https://developer.arm.com/documentation/ddi0602/2025-12/SIMD-FP-Instructions/LDFADD--LDFADDA--LDFADDAL--LDFADDL--Atomic-floating-point-add-?lang=en
AIUI this also holds for the existing floating point atomic operations that are available on GPUs.

On looking back at the new C++ atomic<float>::fetch_add methods with this in mind, it seems they have explicitly relaxed semantics around floating point exceptions (https://eel.is/c++draft/atomics#ref.float-8). The quote there is "The floating-point environment ([cfenv]) for atomic arithmetic operations on floating-point-type may be different than the calling thread's floating-point environment".
Given this new information I believe the new atomic intrinsics should have the semantics of the new libstdc++ methods -- i.e. allowing the floating point environment of the operation to be different to that of the calling thread. The main reason for this is that we are interested in adding this new intrinsic in order to ensure compilers can emit the most performant instructions for code using this new libstdc++ method, and C `_Atomic` is a language feature that does not need to be tied to the behaviour of a builtin.

Does this seem reasonable to everyone?

I am selfishly in favour of the builtins matching the semantics that libstdc++ wants :-)

But yes, I agree that if the standard allows the atomic ops to ignore the current FP env, and some targets can emit more efficient code by taking advantage of that permission, then it makes sense to do that.

One less-obvious consequence of this is around pattern matching a CAS loop (which I originally planned to do in order to match the approach taken for fetch_min/fetch_max). It seems that this would require a "replay" operation to be added around the atomic internal function in order to provide the same floating point exception semantics as the original loop.

An extra question about AArch64 floating point semantics for Andre and Wilco: The instruction mentions it behaves as if `FPCR.AH is 0` and `FPCR.DN is 1`: Is this the case for standard code? If not I guess that would mean that pattern matching a C-level CAS loop to use the new atomic operations would leave incorrect values in memory?

Both of these points make me wonder whether pattern-matching an existing CAS loop to these new instructions is worthwhile/feasible (respectively). What are people's thoughts on that?
MM

On 9/19/24 22:38, Joseph Myers wrote:
On Thu, 19 Sep 2024, [email protected] wrote:

6) Anything special about floating point maths that I'm tripping up on?

Correct atomic operations with floating-point operands should ensure that exceptions raised exactly correspond to the operands for which the operation succeeded, and not to the operands for any previous attempts where the compare-exchange failed. There is a lengthy note in the C standard (in C11 it's a footnote in 6.5.16.2, in C17 it's a Note in 6.5.16.2 and in C23 that subclause has become 6.5.17.3) that discusses appropriate code sequences to achieve this. In GCC the implementation of this is in c-typeck.cc:build_atomic_assign, which in turn calls targetm.atomic_assign_expand_fenv (note that we have the complication for C of not introducing libm dependencies in code that only uses standard language features and not <math.h>, <fenv.h> or <complex.h>, so direct use of <fenv.h> functions is inappropriate here).

I would expect such built-in functions to follow the same semantics for floating-point exceptions as _Atomic compound assignment does. (Note that _Atomic compound assignment is more general in the allowed operands, because compound assignment is a heterogeneous operation - for example, the special floating-point logic in build_atomic_assign includes the case where the LHS of the compound assignment is of atomic integer type but the RHS is of floating type. However, built-in functions allow memory orders other than seq_cst to be used, whereas _Atomic compound assignment is limited to the seq_cst case.)

So it would seem appropriate for the implementation of such built-in functions to make use of targetm.atomic_assign_expand_fenv for floating-point environment handling, and for testcases to include tests analogous to c11-atomic-exec-5.c that exceptions are being handled correctly.

Cf.
N2329 which suggested such operations for C in <stdatomic.h> (but tried to do too many things in one paper to be accepted into C); it didn't go into the floating-point exceptions semantics but simple correctness would indicate avoiding spurious exceptions from discarded computations.

--
Joseph S. Myers
[email protected]
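
For readers following the thread, the two exception-handling shapes discussed above can be sketched at user level in C++. This is only an illustration under stated assumptions, not the proposed builtin or the GCC expansion. The function names `fetch_add_discard_flags` and `fetch_add_exact_flags` are hypothetical. The first follows the `feholdexcept(&fenv); <CAS loop>; fesetenv(&fenv)` shape from question 2; the second follows the sequence in the C standard note that Joseph Myers refers to, where flags from failed attempts are cleared and flags from the successful attempt are re-raised. Unlike the real C `_Atomic` expansion, the sketch calls the `<cfenv>` functions directly, and it assumes the compiler does not move the floating point add across those calls.

```cpp
#include <atomic>
#include <cassert>
#include <cfenv>

// Hypothetical helper (not a GCC builtin or libstdc++ function): CAS-loop
// fetch_add that discards every FP exception flag raised inside the loop,
// so the caller's floating point environment is left unpolluted.
inline float fetch_add_discard_flags(
    std::atomic<float>& obj, float arg,
    std::memory_order order = std::memory_order_seq_cst)
{
    std::fenv_t saved;
    // Save the caller's environment, clear the flags, enter non-stop mode.
    [[maybe_unused]] const int held = std::feholdexcept(&saved);
    assert(held == 0);

    float expected = obj.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `expected`, so the sum is
    // recomputed from the fresh value on the next iteration.
    while (!obj.compare_exchange_weak(expected, expected + arg, order,
                                      std::memory_order_relaxed)) {
    }

    // Restore the saved environment as it was: flags from all attempts,
    // including the successful one, are dropped.
    std::fesetenv(&saved);
    return expected;  // value held before the successful add
}

// Hypothetical helper following the C standard note for _Atomic compound
// assignment: only flags from the attempt that was stored are reported.
inline float fetch_add_exact_flags(
    std::atomic<float>& obj, float arg,
    std::memory_order order = std::memory_order_seq_cst)
{
    std::fenv_t saved;
    [[maybe_unused]] const int held = std::feholdexcept(&saved);
    assert(held == 0);

    float expected = obj.load(std::memory_order_relaxed);
    for (;;) {
        const float desired = expected + arg;
        if (obj.compare_exchange_weak(expected, desired, order,
                                      std::memory_order_relaxed))
            break;
        // The attempt was discarded, so forget any flags it raised.
        std::feclearexcept(FE_ALL_EXCEPT);
    }

    // Restore the saved environment, then re-raise the flags that the
    // successful attempt left set.
    std::feupdateenv(&saved);
    return expected;
}
```

Both helpers return the previous value, like `fetch_add`. They differ only in what the caller can observe afterwards: an overflowing add leaves `FE_OVERFLOW` set after `fetch_add_exact_flags` but not after `fetch_add_discard_flags`, and in both cases flags that were already set before the call are preserved.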
