On Fri, 2016-12-02 at 12:13 +0100, Gabriel Paubert wrote:
> On Thu, Dec 01, 2016 at 11:13:37AM -0800, Bin Fan at Work wrote:
> > Hi Szabolcs,
> >
> > > On Nov 29, 2016, at 3:11 AM, Szabolcs Nagy <szabolcs.n...@arm.com> wrote:
> > >
> > > On 17/11/16 20:12, Bin Fan wrote:
> > >>
> > >> Although this ABI specification specifies that 16-byte properly aligned
> > >> atomics are inlineable on platforms supporting cmpxchg16b, we document
> > >> the caveats here for further discussion. If we decide to change the
> > >> inlineable attribute for those atomics, then this ABI, the compiler and
> > >> the runtime implementation should be updated together at the same time.
> > >>
> > >>
> > >> The compiler and runtime need to check the availability of cmpxchg16b to
> > >> implement this ABI specification. Here is how it would work: The compiler
> > >> can get the information either from the compiler flags or by inquiring
> > >> the hardware capabilities. When the information is not available, the
> > >> compiler should assume that cmpxchg16b instruction is not supported. The
> > >> runtime library implementation can also query the hardware compatibility
> > >> and choose the implementation at runtime. Assuming the user provides
> > >> correct compiler options
> > >
> > > with this abi the runtime implementation *must* query the hardware
> > > (because there might be inlined cmpxchg16b in use in another module
> > > on a hardware that supports it and the runtime must be able to sync
> > > with it).
> >
> > Thanks for the comment. Yes, the ABI requires libatomic must query the
> > hardware. This is necessary if we want the compiler to generate inlined
> > code for 16-byte atomics. Note that this particular issue only affects
> > x86.
>
> Why? Power (at least recent ones) has 128 bit atomic instructions
> (lqarx/stqcx.) and Z has 128 bit compare and swap.
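The runtime hardware query discussed in the thread could look roughly like the following on x86. This is a minimal sketch, not libatomic's actual code: the helper name cpu_has_cmpxchg16b is hypothetical, and it assumes a GCC- or Clang-compatible compiler whose <cpuid.h> provides __get_cpuid and bit_CMPXCHG16B (CPUID leaf 1, ECX bit 13).

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: returns true if the CPU advertises cmpxchg16b.
 * CPUID leaf 1 reports the feature in ECX bit 13 (bit_CMPXCHG16B in
 * GCC/Clang's <cpuid.h>). On non-x86 targets this sketch answers "no". */
static bool cpu_has_cmpxchg16b(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false; /* no information: assume unsupported */
    return (ecx & bit_CMPXCHG16B) != 0;
#else
    return false;
#endif
}
```

A runtime library would presumably run such a check once, for example at load time, and pick its 16-byte code path from the result, falling back to a non-inlined (e.g. lock-based) implementation when the answer is "no", in line with the "assume not supported" default described above.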
That's not the only factor affecting whether cmpxchg16b or a similar instruction is used for atomics. If the HW offers only a wide CAS but no wide atomic load, then even an atomic load is not truly just a load: it has to be implemented as a CAS, which is a store to memory. That breaks (1) atomic loads on read-only mapped memory and (2) volatile atomic loads (unless we claim that an idempotent store is like a load, which is quite a stretch for volatile, I think).
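To make the read-only-memory point concrete, here is a minimal sketch of a load built from a CAS alone. It assumes C11 <stdatomic.h> and uses a 64-bit object as a stand-in for the 16-byte case so it builds without -mcx16 or libatomic; load_via_cas is a hypothetical name. The CAS compares against an arbitrary guess and, if the guess happens to match, writes the same value back.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch: an "atomic load" when only a CAS is available.
 * A 64-bit _Atomic stands in for the 16-byte cmpxchg16b case. */
static uint64_t load_via_cas(_Atomic uint64_t *p)
{
    uint64_t expected = 0;
    /* If *p == 0, the CAS succeeds and stores 0 back (an idempotent
     * store). Otherwise it fails and copies the current value into
     * 'expected'. Either way 'expected' ends up holding the value that
     * *p had at the moment of the CAS. Note that the instruction is a
     * write access in both cases. */
    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}
```

The logical value never changes, but from the MMU's point of view this is still a write: for x86's CMPXCHG family, the Intel SDM states that the destination receives a write cycle regardless of the comparison result. So running this on a page mapped read-only would fault, and on a volatile object it performs an access the program never asked for.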