On Wed, Oct 12, 2022 at 07:16:20PM +0200, Andrea Parri wrote:
> > > >     +Andrea, in case he has time to look at the memory model / ABI
> > > >     issues.
> 
> > +Jeff, who was offering to help when the threads got crossed.  I'd punted on
> > a lot of this in the hope Andrea could help out, as I'm not really a memory
> > model guy and this is pretty far down the rabbit hole.  Happy to have the
> > help if you're offering, though, as what's there is likely a pretty big
> > performance issue for anyone with a reasonable memory system.
> 
> Thanks for linking me to the discussion and the remarks, Palmer.  I'm
> happy to help (and synchronized with Jeff/the community) as possible,
> building a better understanding of the 'issues' at stake.

Summarizing here some findings from looking at the currently-implemented
and the proposed [1] mappings:

  - Current mapping is missing synchronization, notably

        atomic_compare_exchange_weak_explicit(-, -, -,
                                              memory_order_release,
                                              memory_order_relaxed);

    is unable to provide the (required) release ordering guarantees; for
    reference, I've reported a litmus test illustrating it at the bottom
    of this email, cf. c-cmpxchg.

  - [1] addresses the "memory_order_release" problem/bug mentioned above
    (as well as other quirks of the current mapping I won't detail here),
    but it doesn't address other problems present in the current mapping;
    in particular, both mappings translate the following

        atomic_compare_exchange_weak_explicit(-, -, -,
                                              memory_order_acquire,
                                              memory_order_relaxed);

    to a sequence 

        lr.w
        bne
        sc.w.aq

    (without any other synchronization/fences), which contrasts with
    the Unprivileged Spec, Section 10.2 "Load-Reserved / Store-Conditional
    Instructions":

      "Software should not set the 'rl' bit on an LR instruction unless
      the 'aq' bit is also set, nor should software set the 'aq' bit on
      an SC instruction unless the 'rl' bit is also set.  LR.rl and SC.aq
      instructions are not guaranteed to provide any stronger ordering
      than those with both bits clear [...]"
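
To make the acquire case concrete, here is a minimal sketch of the kind of
caller that depends on that ordering: a try-lock built on the acquire/relaxed
compare-exchange quoted above. The names try_acquire() and release_lock(), and
the lock-word convention (0 = free, 1 = held), are made up for illustration and
are not part of either mapping.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical try-lock for illustration: 0 means free, 1 means held. */
bool try_acquire(atomic_int *lock)
{
        int expected = 0;

        /*
         * On success, accesses that follow this call in program order must
         * not become visible before it.  Per the discussion above, both
         * mappings emit lr.w / bne / sc.w.aq for this call, and the spec
         * text quoted above does not guarantee that SC.aq alone provides
         * any ordering.
         */
        return atomic_compare_exchange_weak_explicit(lock, &expected, 1,
                                                     memory_order_acquire,
                                                     memory_order_relaxed);
}

/* Hypothetical unlock: a release store pairs with the acquire above. */
void release_lock(atomic_int *lock)
{
        atomic_store_explicit(lock, 0, memory_order_release);
}
```

Since the weak form may fail spuriously, a caller that needs the lock would
normally retry, e.g. `while (!try_acquire(&l)) ;`.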

Thanks,
  Andrea

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html


C c-cmpxchg

{}

P0(atomic_int *x, atomic_int *y, int *z)
{
        int r0;

        atomic_store_explicit(x, 1, memory_order_relaxed);
        r0 = atomic_compare_exchange_weak_explicit(y, z, 1,
                                                   memory_order_release,
                                                   memory_order_relaxed);
}

P1(atomic_int *x, atomic_int *y)
{
        int r1;
        int r2;

        r1 = atomic_load_explicit(y, memory_order_acquire);
        r2 = atomic_load_explicit(x, memory_order_relaxed);
}

exists (1:r1=1 /\ 1:r2=0)
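
The same shape can also be approximated as a native stress harness, for anyone
wanting to try it on hardware rather than under a model checker. What follows
is only a rough sketch under stated assumptions: it uses POSIX threads (build
with -pthread), the helper name count_forbidden() is invented here, and unlike
the litmus test P0 retries the weak compare-exchange until it succeeds, since
a spurious failure would make the round uninteresting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared locations, named after the litmus test. */
static atomic_int x, y;

/* Values observed by P1 in the current round. */
static int r1, r2;

/* P0: relaxed store to x, then release cmpxchg on y (0 -> 1). */
static void *p0(void *arg)
{
        int expected;

        (void)arg;
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        do {
                /* Assumption: retry on spurious failure (not in the litmus test). */
                expected = 0;
        } while (!atomic_compare_exchange_weak_explicit(&y, &expected, 1,
                                                        memory_order_release,
                                                        memory_order_relaxed));
        return NULL;
}

/* P1: acquire load of y, then relaxed load of x. */
static void *p1(void *arg)
{
        (void)arg;
        r1 = atomic_load_explicit(&y, memory_order_acquire);
        r2 = atomic_load_explicit(&x, memory_order_relaxed);
        return NULL;
}

/* Run the test `rounds` times; return how often r1=1 and r2=0 was seen. */
int count_forbidden(int rounds)
{
        int forbidden = 0;

        for (int i = 0; i < rounds; i++) {
                pthread_t t0, t1;
                int rc;

                atomic_store(&x, 0);
                atomic_store(&y, 0);

                rc = pthread_create(&t0, NULL, p0, NULL);
                assert(rc == 0);
                rc = pthread_create(&t1, NULL, p1, NULL);
                assert(rc == 0);
                (void)rc;

                /* Joining both threads makes r1 and r2 safe to read here. */
                pthread_join(t0, NULL);
                pthread_join(t1, NULL);

                if (r1 == 1 && r2 == 0)
                        forbidden++;
        }
        return forbidden;
}
```

A conforming toolchain should make count_forbidden() return 0 for any number
of rounds. A nonzero result would show the missing release ordering, but a
zero result proves nothing: creating fresh threads every round is a very weak
way to provoke the reordering.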
