Hi I have read multiple bug reports (84522, 80878, 70490), and the past decision regarding GCC change to redirect double-width (128-bit) atomics for x86-64 and arm64 to libatomic. Below I mention major concerns as well as the response from C11 (WG14) regarding DR 459 which, most likely, triggered this change in more recent GCC releases in the first place. If I understand correctly, the redirection to libatomic was made for 2 reasons: 1. cmpxchg16b is not available on early amd64 processors. (However, mcx16 flag already specifies that you use CPUs that have this instruction, so it should not be a concern when the flag is specified.) 2. atomic_load on read-only memory. DR 459 now requires to have 'const' qualifiers for atomic_load which probably resulted in the interpretation that read-only memory must be supported. However, per response from C11/WG14 (see below), it does not seem to be the case at all. Therefore, previously filed bug 70490 does not seem to be valid. There are several concerns with current GCC behavior:
1. Not consistent with clang/llvm which completely supports double-width atomics for arm32, arm64, x86 and x86-64 making it possible to write portable code (w/o specific extensions or assembly code) across all these architectures (which is finally possible with C11!).The behavior of clang: if mxc16 is specified, cmpxchg16b is generated for x86-64 (without any calls to libatomic), otherwise -- redirection to libatomic. For arm64, ldaxp/staxp are always generated. In my opinion, this is very logical and non-confusing. 2. Oftentimes you want to have strict guarantees (by specifying mcx16 flag for x86-64) that the generated code is lock-free, otherwise it is useless. Double-width atomics are often used in lock-free algorithms that use tags (stamps) for pointers to resolve the ABA problem. So, it is very useful to have corresponding support in the compiler. 3. The behavior is inconsistent even within GCC. Older (and more limited, less portable, etc) __sync builtins still use cmpxchg16b directly, newer __atomic and C11 -- do not. Moreover, __sync builtins are probably less suitable for arm/arm64. 4. atomic_load can be implemented using read-modify-write as it is the only option for x86-64 and arm64 (see below). For these reasons, it may be a good idea if GCC folks reconsider past decision. And just to clarify: if mcx16 (x86-64) is not specified during compilation, it is totally OK to redirect to libatomic, and there make the final decision if target CPU supports a given instruction or not. But if it is specified, it makes sense for performance reasons and lock-freedom guarantees to always generate it directly. -- Ruslan Response from the WG14 (C11) Convener regarding DR 459: (I asked for a permission to publish this response here.) Ruslan, Thank you for your comments. There is no normative requirement that const objects be suitable for read-only memory. An example and a footnote refer to read-only memory as a way to illustrate a point, but examples and footnotes are not normative. The actual nature of read-only memory and how it can be used are outside the scope of the standard, so there is nothing to prevent atomic_load from being implemented as a read-modify-write operation. David My original email: Dear David Keaton, After reviewing the proposed change DR 459 for C11: http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_459 ,I identified that adding const qualifier to atomic_load (C11 implements its without it) may actually be harmful in some cases. Particularly, for double-width (128-bit) atomics found in x86-64 (cmpxchg16b instruction), arm64 (ldaxp/staxp instructions), it is currently only possible to implement atomic_load for 128 bit using corresponding read-modify-write instructions (i.e., potentially rewriting memory with the same value, but, in essence, not changing it). But these implementations will not work on read-only memory. Similar concerns apply to some extent to x86 and arm32 for double-width (64-bit) atomics. Otherwise, there is no obstacle to implement all C11 atomics for corresponding types in these architectures. Moreover, a well-known clang/llvm compiler already implements all double-width operations for x86, x86-64, arm32 and arm64 (atomic_load is implemented using corresponding read-modify-write instructions). Double-width atomics are often used in data structures that need tagging for pointers to avoid the ABA problem (e.g., in lock-free stacks and queues). It is my understanding that C11 aimed to make atomics more or less portable across different microarchitectures, while at the same time provide an ability for a compiler to optimize code well and utilize all potential of the corresponding microarchitecture. If now it is required to support read-only memory (i.e., const qualifier) for atomic_load, 128-bit atomics are likely be impossible to implement in any meaningful and portable way. Thus, anyone who wants to use them will have to go with assembly fallbacks (or compiler extensions), thus, partially defeating the purpose of C11 atomics. One way to address this concern would be to state that atomic_load on read-only memory is implementation-defined and may not be supported for all types. That would also mean to go with the previous C11 definition (i.e., without the const qualifier) to implement atomic_load rather than what was proposed in the DR 459 change. I am ready to submit a more formal proposal if this is something that can be considered by the committee.