On 2020-05-18, Linus Torvalds <torva...@linux-foundation.org> wrote:
>>                 smp_mb(); /* LMM(data_push_tail:C) */
>>
>> +               if (atomic_long_try_cmpxchg_relaxed(&data_ring->tail_lpos,
>> +                               &tail_lpos,
>> +                               next_lpos)) { /* LMM(data_push_tail:D) */
>> +                       break;
>> +               }
>
> Doing an "smp_mb()" followed by a "cmpxchg_relaxed" seems all kinds of
> odd and pointless, and is very much non-optimal on x86 for example.,
>
> Just remove the smp_mb(), and use the non-relaxed form of cmpxchg.
> It's defined to be fully ordered if it succeeds (and if the cmpxchg
> doesn't succeed, it's a no-op and the memory barrier shouldn't make
> any difference).
>
> Otherwise you'll do two memory ordering operations on x86 (and
> probably some other architectures), since the cmpxchg is always
> ordered on x86 and there exists no "relaxed" form of it.

ACK.

All three smp_mb()'s and both smp_wmb()'s sit directly next to
cmpxchg_relaxed() calls. Having explicit memory barriers was helpful for
identifying, proving, and testing a minimal set of pairs (on arm64), but
all will be folded into full cmpxchg()'s for the next version.

John Ogness

Reply via email to