On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
> > > > +Andrea, in case he has time to look at the memory model / ABI
> > > > issues.
>
> > +Jeff, who was offering to help when the threads got crossed.  I'd punted on
> > a lot of this in the hope Andrea could help out, as I'm not really a memory
> > model guy and this is pretty far down the rabbit hole.  Happy to have the
> > help if you're offering, though, as what's there is likely a pretty big
> > performance issue for anyone with a reasonable memory system.
>
> Thanks for linking me to the discussion and the remarks, Palmer.  I'm
> happy to help (and synchronized with Jeff/the community) as possible,
> building a better understanding of the 'issues' at stake.

Summarizing here some findings from looking at the currently-implemented
and the proposed [1] mappings:

  - Current mapping is missing synchronization, notably

	atomic_compare_exchange_weak_explicit(-, -, -,
					      memory_order_release,
					      memory_order_relaxed);

    is unable to provide the (required) release ordering guarantees; for
    reference, I've reported a litmus test illustrating it at the bottom
    of this email, cf. c-cmpxchg.

  - [1] addressed the "memory_order_release" problem/bug mentioned above
    (as well as other quirks of the current mapping I won't detail here),
    but it doesn't address other problems present in the current mapping;
    in particular, both mappings translate the following

	atomic_compare_exchange_weak_explicit(-, -, -,
					      memory_order_acquire,
					      memory_order_relaxed);

    to a sequence

	lr.w
	bne
	sc.w.aq

    (without any other synchronization/fences), which contrasts with the
    Unprivileged Spec, Section 10.2 "Load-Reserved/Store-Conditional
    Instructions":

      "Software should not set the 'rl' bit on an LR instruction unless
      the 'aq' bit is also set, nor should software set the 'aq' bit on
      an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
      instructions are not guaranteed to provide any stronger ordering
      than those with both bits clear [...]"

Thanks,
  Andrea

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html


C c-cmpxchg

{}

P0(atomic_int *x, atomic_int *y, int *z)
{
	int r0;

	atomic_store_explicit(x, 1, memory_order_relaxed);
	r0 = atomic_compare_exchange_weak_explicit(y, z, 1, memory_order_release, memory_order_relaxed);
}

P1(atomic_int *x, atomic_int *y)
{
	int r1;
	int r2;

	r1 = atomic_load_explicit(y, memory_order_acquire);
	r2 = atomic_load_explicit(x, memory_order_relaxed);
}

exists (1:r1=1 /\ 1:r2=0)
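
For anyone who wants to poke at c-cmpxchg on real hardware rather than in
herd7, below is a rough sketch of the same test as a native C11 + pthreads
stress loop.  The names p0, p1 and count_forbidden are made up for this
example (nothing from the patch or from GCC), and P0 retries the weak
cmpxchg until it succeeds, which the litmus test itself does not do.  A run
that never hits the "exists" outcome proves nothing about the mapping; a
run that does hit it (r1=1 and r2=0) would point at missing release
ordering.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared locations and observed registers; names mirror the litmus test. */
static atomic_int x, y;
static int r1, r2;

/* P0: relaxed store to x, then release cmpxchg on y (0 -> 1). */
static void *p0(void *arg)
{
	int z = 0;	/* expected value of y */

	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	/*
	 * Assumption of this sketch: retry on spurious failure.  Only P0
	 * writes y, so a failure leaves z == 0 and the loop just retries.
	 */
	while (!atomic_compare_exchange_weak_explicit(&y, &z, 1,
						      memory_order_release,
						      memory_order_relaxed))
		;
	return NULL;
}

/* P1: acquire load of y, then relaxed load of x. */
static void *p1(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&y, memory_order_acquire);
	r2 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Run the test 'iterations' times and count how often the outcome
 * forbidden by C11 (r1 == 1 && r2 == 0) was observed.
 * Returns -1 if a thread could not be created.
 */
int count_forbidden(int iterations)
{
	int forbidden = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t t0, t1;

		atomic_store(&x, 0);
		atomic_store(&y, 0);

		if (pthread_create(&t0, NULL, p0, NULL))
			return -1;
		if (pthread_create(&t1, NULL, p1, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		/* The joins order the threads' writes to r1/r2 before the reads below. */
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);

		if (r1 == 1 && r2 == 0)
			forbidden++;
	}
	return forbidden;
}
```

Calling count_forbidden(N) from a small main() and building with -pthread
should be enough to try it.  Creating the threads afresh on every iteration
keeps the sketch short but narrows the race window considerably, so a
serious harness would want long-lived threads and some form of
synchronized start.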