Hello,

Although I'd rather not get into a fight defending GCC's design change here,
let me offer a couple of corrections/additions so everyone is on the same page:

On Mon, 26 Feb 2018, Ruslan Nikolaev via gcc wrote:
> 
> 1. Not consistent with clang/llvm which completely supports double-width
> atomics for arm32, arm64, x86 and x86-64 making it possible to write portable
> code (w/o specific extensions or assembly code) across all these architectures
> (which is finally possible with C11!). The behavior of clang: if mcx16 is
> specified, cmpxchg16b is generated for x86-64 (without any calls to
> libatomic), otherwise -- redirection to libatomic. For arm64, ldaxp/staxp are
> always generated. In my opinion, this is very logical and non-confusing.

Note that there are more issues here than just behavior on readonly memory:
you need to ensure that the whole program, including all static and shared
libraries, is compiled with -mcx16 (and currently there's no ld.so/ld-level
support to ensure that), or you'd need to be sure that it's safe to mix code
compiled with different -mcx16 settings because such code never interoperates
on wide atomic objects.

(If you mix -mcx16 and -mno-cx16 code operating on the same 128-bit object,
you get wrong code that will appear to work >99% of the time.)
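
To make the "whole program" requirement a bit more concrete: GCC predefines
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 on x86-64 when -mcx16 is in effect, so a
translation unit can at least record which setting it was built with. The
following is only a minimal sketch under that assumption; the helper names
unit_built_with_cx16 and settings_agree are hypothetical and not part of any
existing interface.

```c
/* Hypothetical helper: reports how *this* translation unit was compiled.
   __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 is predefined by the compiler when
   -mcx16 is in effect. */
static int unit_built_with_cx16(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical check: a library would export its own flag, and the
   application would compare it with its own before sharing a 128-bit
   atomic object with that library. */
static int settings_agree(int ours, int theirs)
{
    return ours == theirs;
}
```

Such a check would have to be done by hand for every library involved, which
is exactly the kind of thing ld/ld.so do not do for you today.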

> 3. The behavior is inconsistent even within GCC. Older (and more limited, less
> portable, etc) __sync builtins still use cmpxchg16b directly, newer __atomic
> and C11 -- do not. Moreover, __sync builtins are probably less suitable for
> arm/arm64.

Note that there's no "load" function in the __sync family, so the original
concern about operations on readonly memory does not apply.
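
For illustration, the usual way to get a "load" out of the __sync family is
to issue a compare-and-swap that changes nothing. The sketch below assumes
that idiom and uses a 64-bit object so that it builds without -mcx16; the
128-bit analogue would use unsigned __int128 and require -mcx16. The name
load_via_cas is hypothetical.

```c
#include <stdint.h>

/* Emulate an atomic load with a __sync compare-and-swap: if *p is 0 it is
   "replaced" by 0, otherwise nothing changes; either way the previous
   value is returned.  This is a read-modify-write operation, so it needs
   write access to *p. */
static uint64_t load_via_cas(uint64_t *p)
{
    return __sync_val_compare_and_swap(p, (uint64_t)0, (uint64_t)0);
}
```

Since the caller has explicitly asked for a read-modify-write operation, a
fault on a read-only mapping is not a surprise here, unlike with a plain
atomic load.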

> For these reasons, it may be a good idea if GCC folks reconsider past
> decision. And just to clarify: if mcx16 (x86-64) is not specified during
> compilation, it is totally OK to redirect to libatomic, and there make the
> final decision if target CPU supports a given instruction or not. But if it is
> specified, it makes sense for performance reasons and lock-freedom guarantees
> to always generate it directly. 

You don't mention it directly, so just to make it clear for readers: on systems
where the GNU IFUNC extension is available (i.e. with glibc), libatomic tries
to do exactly that: it tests for cmpxchg16b availability and, if the
instruction is present, redirects 128-bit atomics to lock-free RMW
implementations.  (I don't like this solution.)
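
The availability test itself is simple. The following is not libatomic's
actual code, just a minimal sketch of such a runtime check, assuming the
<cpuid.h> header shipped with GCC and clang; the function name
cpu_has_cmpxchg16b is hypothetical. An IFUNC resolver would run a check of
this kind once, when the symbol is resolved, and pick the implementation
accordingly.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if the CPU reports CMPXCHG16B (CPUID leaf 1, ECX bit 13),
   and 0 otherwise, including on non-x86 targets. */
static int cpu_has_cmpxchg16b(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 13) & 1;
#else
    return 0;
#endif
}
```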

Thanks.
Alexander
