Hello,

Although I wouldn't like to fight defending GCC's design change here, let me
offer a couple of corrections/additions so everyone is on the same page:

On Mon, 26 Feb 2018, Ruslan Nikolaev via gcc wrote:
>
> 1. Not consistent with clang/llvm which completely supports double-width
> atomics for arm32, arm64, x86 and x86-64 making it possible to write portable
> code (w/o specific extensions or assembly code) across all these architectures
> (which is finally possible with C11!). The behavior of clang: if mxc16 is
> specified, cmpxchg16b is generated for x86-64 (without any calls to
> libatomic), otherwise -- redirection to libatomic. For arm64, ldaxp/staxp are
> always generated. In my opinion, this is very logical and non-confusing.

Note that there are more issues to that than just behavior on readonly memory:
you need to ensure that the whole program, including all static and shared
libraries, is compiled with -mcx16 (and currently there's no ld.so/ld-level
support to ensure that), or you'd need to be sure that it's safe to mix code
compiled with different -mcx16 settings because it never happens to interop on
wide atomic objects.

(If you mix -mcx16 and -mno-cx16 code operating on the same 128-bit object, you
get wrong code that will appear to work >99% of the time.)

> 3. The behavior is inconsistent even within GCC. Older (and more limited, less
> portable, etc) __sync builtins still use cmpxchg16b directly, newer __atomic
> and C11 -- do not. Moreover, __sync builtins are probably less suitable for
> arm/arm64.

Note that there's no "load" function in the __sync family, so the original
concern about operations on readonly memory does not apply.

> For these reasons, it may be a good idea if GCC folks reconsider past
> decision. And just to clarify: if mcx16 (x86-64) is not specified during
> compilation, it is totally OK to redirect to libatomic, and there make the
> final decision if target CPU supports a given instruction or not. But if it is
> specified, it makes sense for performance reasons and lock-freedom guarantees
> to always generate it directly.

You don't mention it directly, so just to make it clear for readers: on systems
where the GNU IFUNC extension is available (i.e. on Glibc), libatomic tries to
do exactly that: test for cmpxchg16b availability and redirect 128-bit atomics
to lock-free RMW implementations if so.

(I don't like this solution.)

Thanks.
Alexander
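
P.S. For readers who have not seen this kind of dispatch, here is a minimal
sketch of the idea: test for cmpxchg16b once, then route every 128-bit
compare-and-swap through the chosen implementation. It is not libatomic's
code. It uses a plain function pointer where libatomic uses IFUNC, and the
names wide_t, cas16_fn, cas16_locked, cas16_cx16, resolve_cas16 and load16 are
all made up for illustration. The fallback is a single global spinlock, which
a real library would presumably replace with something finer-grained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical 128-bit value; cmpxchg16b needs 16-byte alignment. */
typedef struct {
    _Alignas(16) uint64_t lo;
    uint64_t hi;
} wide_t;

/* Strong CAS: on failure, *expected receives the current value. */
typedef bool (*cas16_fn)(wide_t *p, wide_t *expected, wide_t desired);

/* Fallback path: one global spinlock (illustration only). */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

static bool cas16_locked(wide_t *p, wide_t *expected, wide_t desired)
{
    while (atomic_flag_test_and_set_explicit(&fallback_lock,
                                             memory_order_acquire))
        ; /* spin */
    bool ok = (p->lo == expected->lo && p->hi == expected->hi);
    if (ok)
        *p = desired;
    else
        *expected = *p;
    atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
    return ok;
}

#if defined(__x86_64__)
/* Lock-free path: rdx:rax = expected, rcx:rbx = desired. */
static bool cas16_cx16(wide_t *p, wide_t *expected, wide_t desired)
{
    bool ok;
    assert(((uintptr_t)p & 15) == 0);
    __asm__ __volatile__("lock cmpxchg16b %1\n\t"
                         "sete %0"
                         : "=q"(ok), "+m"(*p),
                           "+a"(expected->lo), "+d"(expected->hi)
                         : "b"(desired.lo), "c"(desired.hi)
                         : "memory", "cc");
    return ok;
}

static bool cpu_has_cx16(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & (1u << 13)) != 0; /* CPUID.1:ECX bit 13 = CMPXCHG16B */
}
#endif

/* Pick one implementation for the whole process. */
static cas16_fn resolve_cas16(void)
{
#if defined(__x86_64__)
    if (cpu_has_cx16())
        return cas16_cx16;
#endif
    return cas16_locked;
}

/* With only a CAS primitive, a "load" has to be emulated by a CAS.
   On the cmpxchg16b path this always performs a write cycle, so it
   faults on read-only memory. */
static wide_t load16(cas16_fn cas, wide_t *p)
{
    wide_t v = {0, 0};
    cas(p, &v, v); /* success: value was 0; failure: v = current value */
    return v;
}
```

A caller would do `cas16_fn cas = resolve_cas16();` once and use `cas` for
every access to a given object. The sketch is only correct if all code touching
that object goes through the same resolved pointer: one caller on cas16_locked
and another on cas16_cx16 do not exclude each other, which is the same hazard
as mixing -mcx16 and -mno-cx16 code described above.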