Hi
I have read multiple bug reports (84522, 80878, 70490) and the past decision 
to change GCC so that double-width (128-bit) atomics on x86-64 and arm64 are 
redirected to libatomic. Below I list my major concerns, as well as the 
response from the C11 (WG14) Convener regarding DR 459, which most likely 
triggered this change in recent GCC releases in the first place.
If I understand correctly, the redirection to libatomic was made for two reasons:
1. cmpxchg16b is not available on early amd64 processors. (However, the -mcx16 
flag already states that the target CPUs have this instruction, so this should 
not be a concern when the flag is specified.)
2. atomic_load on read-only memory. DR 459 now requires a 'const' qualifier on 
the argument of atomic_load, which probably led to the interpretation that 
read-only memory must be supported. However, per the response from WG14 (see 
below), this does not seem to be the case at all. Therefore, the previously 
filed bug 70490 does not seem to be valid.
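To make reason 2 concrete, here is a minimal sketch of a 128-bit load built from a read-modify-write. It assumes a 64-bit GCC or Clang target (for `unsigned __int128`), primarily x86-64. When the build predefines `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` (as -mcx16 does), the 16-byte `__sync` builtin is expanded inline to `lock cmpxchg16b`; otherwise a hypothetical spinlock fallback stands in for the lock-based path a runtime library would take. The helper name `load128` and the lock are illustrative only, not part of any API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 64-bit GCC/Clang target where unsigned __int128 exists. */
typedef unsigned __int128 u128;

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
/* Built with -mcx16 (x86-64): the 16-byte __sync builtin becomes
   lock cmpxchg16b, with no call into libatomic. */
static u128 load128(const volatile u128 *obj)
{
    /* cmpxchg16b faults on operands that are not 16-byte aligned. */
    assert(((uintptr_t)obj & 15u) == 0);
    /* The const-qualified signature mirrors atomic_load after DR 459, but
       the only 16-byte primitive is a read-modify-write, so const is cast
       away here. */
    volatile u128 *p = (volatile u128 *)obj;
    /* Compare with 0 and "replace" with 0: if *p == 0 the same value is
       stored back, otherwise the current value is returned unchanged.
       Either way the instruction needs write access, so this faults if
       *obj lives in a read-only page. */
    return __sync_val_compare_and_swap(p, (u128)0, (u128)0);
}
#else
/* Hypothetical fallback: one global spinlock. Only correct if every writer
   of the object takes the same lock. */
static atomic_flag load128_lock = ATOMIC_FLAG_INIT;

static u128 load128(const volatile u128 *obj)
{
    while (atomic_flag_test_and_set_explicit(&load128_lock, memory_order_acquire))
        ; /* spin until the lock is free */
    u128 v = *obj;
    atomic_flag_clear_explicit(&load128_lock, memory_order_release);
    return v;
}
#endif
```

Because the destination of cmpxchg16b receives a write cycle regardless of the comparison result, the const in the signature is cosmetic for the first branch: the object must still be writable.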
There are several concerns with the current GCC behavior:

1. It is not consistent with clang/llvm, which fully supports double-width 
atomics for arm32, arm64, x86 and x86-64, making it possible to write portable 
code (without specific extensions or assembly code) across all these 
architectures (which is finally possible with C11!). The behavior of clang: if 
-mcx16 is specified, cmpxchg16b is generated for x86-64 (without any calls to 
libatomic); otherwise, the operation is redirected to libatomic. For arm64, 
ldaxp/stlxp are always generated. In my opinion, this is very logical and 
non-confusing.

2. In many cases you need a strict guarantee (by specifying -mcx16 for x86-64) 
that the generated code is lock-free; otherwise it is useless. Double-width 
atomics are often used in lock-free algorithms that attach tags (stamps) to 
pointers to resolve the ABA problem. So it is very useful to have 
corresponding support in the compiler.

3. The behavior is inconsistent even within GCC. The older (and more limited, 
less portable, etc.) __sync builtins still use cmpxchg16b directly, while the 
newer __atomic builtins and C11 atomics do not. Moreover, the __sync builtins 
are probably less suitable for arm/arm64.

4. atomic_load can be implemented using a read-modify-write operation, as this 
is the only option for 128-bit atomics on x86-64 and arm64 (see below).
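To illustrate concerns 1 and 2, here is a minimal sketch of pointer tagging against ABA in plain C11 (a Treiber-style stack). Assumption: to keep it runnable without libatomic, it swaps in a "tagged index" variant that packs a 32-bit node index and a 32-bit tag into one 64-bit word. The 128-bit form discussed in this message would instead make the head an `_Atomic` struct holding a node pointer and a `uintptr_t` tag, which is exactly the operation GCC currently routes through libatomic. The pool, names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 16u
#define NIL UINT32_MAX              /* "no node" marker */

/* Hypothetical node pool; nodes are addressed by index and never freed. */
struct node {
    _Atomic uint32_t next;          /* atomic so the racy read in pop() is defined */
    int value;
};
static struct node pool[POOL_SIZE];

/* Head word: high 32 bits = tag, low 32 bits = node index. */
typedef struct { _Atomic uint64_t head; } stack;

static uint64_t pack(uint32_t tag, uint32_t idx) { return ((uint64_t)tag << 32) | idx; }
static uint32_t idx_of(uint64_t w) { return (uint32_t)w; }
static uint32_t tag_of(uint64_t w) { return (uint32_t)(w >> 32); }

static void stack_init(stack *s) { atomic_init(&s->head, pack(0, NIL)); }

/* Caller must pass an index < POOL_SIZE that is not currently on the stack. */
static void push(stack *s, uint32_t i)
{
    uint64_t old = atomic_load(&s->head);
    do {
        atomic_store_explicit(&pool[i].next, idx_of(old), memory_order_relaxed);
        /* Bumping the tag makes the new head word differ from recent ones
           (until the 32-bit tag wraps around). */
    } while (!atomic_compare_exchange_weak(&s->head, &old, pack(tag_of(old) + 1, i)));
}

static uint32_t pop(stack *s)
{
    uint64_t old = atomic_load(&s->head);
    for (;;) {
        uint32_t i = idx_of(old);
        if (i == NIL)
            return NIL;
        uint32_t next = atomic_load_explicit(&pool[i].next, memory_order_relaxed);
        /* If node i was popped and pushed again meanwhile (ABA), the tag has
           changed, so this CAS fails instead of installing a stale 'next'. */
        if (atomic_compare_exchange_weak(&s->head, &old, pack(tag_of(old) + 1, next)))
            return i;
    }
}
```

On 32-bit x86 the 64-bit head word used here is itself a double-width CAS (cmpxchg8b).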

For these reasons, it may be a good idea if GCC folks reconsider the past 
decision. And just to clarify: if -mcx16 (x86-64) is not specified during 
compilation, it is totally OK to redirect to libatomic and make the final 
decision there on whether the target CPU supports a given instruction. But if 
it is specified, it makes sense, for performance reasons and lock-freedom 
guarantees, to always generate the instruction directly.
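The policy in the previous paragraph can be expressed as a compile-time switch. This sketch assumes GCC or Clang, both of which predefine `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` when a 16-byte compare-and-swap may be assumed (for example, with -mcx16 on x86-64). The names `HAVE_INLINE_CAS16`, `cas16_strategy` and `compiler_says_lock_free16` are hypothetical; the code only reports which path a build would take rather than performing the operation.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
#define HAVE_INLINE_CAS16 1         /* e.g. -mcx16 was given */
#else
#define HAVE_INLINE_CAS16 0
#endif

enum cas16_path { CAS16_INLINE, CAS16_LIBATOMIC };

static enum cas16_path cas16_strategy(void)
{
    /* Flag given: emit cmpxchg16b directly. Otherwise: call into libatomic,
       which can decide at run time whether the CPU supports it. */
    return HAVE_INLINE_CAS16 ? CAS16_INLINE : CAS16_LIBATOMIC;
}

/* What the compiler itself reports for 16-byte objects. For the same flags
   this may differ between compilers, which is the inconsistency raised in
   this message. */
static bool compiler_says_lock_free16(void)
{
    return __atomic_always_lock_free(16, 0);
}
```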

-- Ruslan

Response from the WG14 (C11) Convener regarding DR 459 (I asked for 
permission to publish this response here):
Ruslan,

     Thank you for your comments.  There is no normative requirement that const 
objects be suitable for read-only memory.  An example and a footnote refer to 
read-only memory as a way to illustrate a point, but examples and footnotes are 
not normative.  The actual nature of read-only memory and how it can be used 
are outside the scope of the standard, so there is nothing to prevent 
atomic_load from being implemented as a read-modify-write operation.

                                        David
My original email:

Dear David Keaton,
After reviewing the proposed change in DR 459 for C11 
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_459), I 
found that adding a const qualifier to atomic_load (C11 currently specifies it 
without one) may actually be harmful in some cases.
In particular, for double-width (128-bit) atomics found in x86-64 (the 
cmpxchg16b instruction) and arm64 (the ldaxp/stlxp instructions), it is 
currently only possible to implement a 128-bit atomic_load using the 
corresponding read-modify-write instructions (i.e., potentially rewriting 
memory with the same value but, in essence, not changing it). However, these 
implementations will not work on read-only memory. Similar concerns apply, to 
some extent, to x86 and arm32 for double-width (64-bit) atomics. Otherwise, 
there is no obstacle to implementing all C11 atomics for the corresponding 
types on these architectures. Moreover, the well-known clang/llvm compiler 
already implements all double-width operations for x86, x86-64, arm32 and 
arm64 (atomic_load is implemented using the corresponding read-modify-write 
instructions). Double-width atomics are often used in data structures that 
need tagged pointers to avoid the ABA problem (e.g., in lock-free stacks and 
queues).
It is my understanding that C11 aimed to make atomics more or less portable 
across different microarchitectures, while at the same time giving a compiler 
the ability to optimize code well and exploit the full potential of the 
corresponding microarchitecture.
If it is now required to support read-only memory (i.e., the const qualifier) 
for atomic_load, 128-bit atomics are likely to be impossible to implement in 
any meaningful and portable way. Thus, anyone who wants to use them will have 
to resort to assembly fallbacks (or compiler extensions), partially defeating 
the purpose of C11 atomics. One way to address this concern would be to state 
that atomic_load on read-only memory is implementation-defined and may not be 
supported for all types. That would also mean keeping the previous C11 
definition of atomic_load (i.e., without the const qualifier) rather than 
adopting what was proposed in DR 459.
I am ready to submit a more formal proposal if this is something that can be 
considered by the committee.

