Hi Stephen,

The patch we submitted is safe; we have verified that it indeed fixes the bug. Besides the rmem and genmc model checkers, we verified the bugfix with the herd7 tool, which is the official tool ARM provides to check such pieces of code against their memory model.
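In case it helps to see the intent in one place, below is a minimal sketch of the acquisition path with the release barrier the patch adds. This is simplified illustration code (the struct and function names are made up), not the actual rte_mcslock.h source:

struct mcs_node {
        struct mcs_node *next;
        int locked;
};

static void
mcs_lock(struct mcs_node **tail, struct mcs_node *me)
{
        struct mcs_node *prev;

        /* Initialize our node before it can become visible to others. */
        __atomic_store_n(&me->locked, 1, __ATOMIC_RELAXED);
        __atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);

        /* Swap ourselves in as the new tail of the queue. */
        prev = __atomic_exchange_n(tail, me, __ATOMIC_ACQ_REL);
        if (prev == NULL)
                return; /* queue was empty, lock taken */

        /* Announce the node to the predecessor. RELEASE (instead of
         * RELAXED) is the fix: it forces the me->locked = 1 store
         * above to become visible before prev->next = me, so the
         * unlocker cannot clear a locked flag that has not been set.
         */
        __atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);

        /* Wait for the predecessor to hand the lock over. */
        while (__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
                ; /* busy-wait; the real code relaxes the CPU here */
}

The real rte_mcslock_lock() additionally keeps a fence between announcing the node and the spin loop (see the comment quoted at the end of the diff below); the sketch leaves that out, as it addresses a different reordering than the one this patch fixes.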
Actually, I think there are other barriers in this MCS lock implementation that could be weakened without causing problems, but that would be an optimization. This patch is just to fix the bug.

Best regards,
-Diogo

-----Original Message-----
From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
Sent: Wednesday, November 25, 2020 5:51 AM
To: Stephen Hemminger <step...@networkplumber.org>
Cc: Diogo Behrens <diogo.behr...@huawei.com>; tho...@monjalon.net; david.march...@redhat.com; dev@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>
Subject: RE: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory

<snip>

> > > The initialization me->locked=1 in lock() must happen before
> > > next->locked=0 in unlock(), otherwise a thread may hang forever,
> > > waiting me->locked become 0. On weak memory systems (such as ARMv8),
> > > the current implementation allows me->locked=1 to be reordered with
> > > announcing the node (pred->next=me) and, consequently, to be
> > > reordered with next->locked=0 in unlock().
> > >
> > > This fix adds a release barrier to pred->next=me, forcing
> > > me->locked=1 to happen before this operation.
> > >
> > > Signed-off-by: Diogo Behrens <diogo.behr...@huawei.com>
> > The change looks fine to me. I have tested this on few x86 and Arm
> > machines.
> > Acked-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
>
> Maybe a simpler alternative would be as fast and safer.
Why is this safer?

> By using compare_exchange you can get same effect in one operation.
> Like the following UNTESTED.
>
> diff --git a/lib/librte_eal/include/generic/rte_mcslock.h b/lib/librte_eal/include/generic/rte_mcslock.h
> index 78b0df295e2d..9c537ce577e6 100644
> --- a/lib/librte_eal/include/generic/rte_mcslock.h
> +++ b/lib/librte_eal/include/generic/rte_mcslock.h
> @@ -48,23 +48,23 @@ rte_mcslock_lock(rte_mcslock_t **msl, rte_mcslock_t *me)
>          rte_mcslock_t *prev;
>
>          /* Init me node */
> -        __atomic_store_n(&me->locked, 1, __ATOMIC_RELAXED);
> -        __atomic_store_n(&me->next, NULL, __ATOMIC_RELAXED);
> +        me->locked = 1;
>
> -        /* If the queue is empty, the exchange operation is enough to acquire
> -         * the lock. Hence, the exchange operation requires acquire semantics.
> -         * The store to me->next above should complete before the node is
> -         * visible to other CPUs/threads. Hence, the exchange operation requires
> -         * release semantics as well.
> +        /*
> +         * Atomic insert into single linked list
>           */
> -        prev = __atomic_exchange_n(msl, me, __ATOMIC_ACQ_REL);
> +        do {
> +                prev = __atomic_load_n(msl, __ATOMIC_RELAXED);
> +                me->next = prev;
This needs to be __atomic_store_n(__ATOMIC_RELEASE) as it can sink below the following line.

> +        } while (!__atomic_compare_exchange_n(&msl, me, prev,
> +                                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
> +
>          if (likely(prev == NULL)) {
>                  /* Queue was empty, no further action required,
>                   * proceed with lock taken.
>                   */
>                  return;
>          }
> -        __atomic_store_n(&prev->next, me, __ATOMIC_RELAXED);
>
>          /* The while-load of me->locked should not move above the previous
>           * store to prev->next. Otherwise it will cause a deadlock. Need a
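For completeness, here is how the hand-over on the unlock side, the next->locked=0 store mentioned in the commit message above, could be sketched (again with the simplified names from the sketch earlier in this mail, not the actual rte_mcslock_unlock() source):

static void
mcs_unlock(struct mcs_node **tail, struct mcs_node *me)
{
        struct mcs_node *next = __atomic_load_n(&me->next, __ATOMIC_ACQUIRE);

        if (next == NULL) {
                /* No visible successor: try to swing the tail back to NULL. */
                struct mcs_node *expected = me;
                if (__atomic_compare_exchange_n(tail, &expected, NULL,
                                0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))
                        return; /* queue is empty, lock released */

                /* A successor is enqueueing itself; wait until it
                 * announces its node via our next pointer.
                 */
                while ((next = __atomic_load_n(&me->next,
                                __ATOMIC_ACQUIRE)) == NULL)
                        ; /* busy-wait */
        }

        /* The hand-over. If the successor's me->locked = 1 store could
         * still become visible after this store (the reordering the
         * patch rules out), the 0 would be overwritten and the
         * successor would spin forever: the hang described above.
         */
        __atomic_store_n(&next->locked, 0, __ATOMIC_RELEASE);
}

Together with the release on prev->next in the lock path, the acquire load of me->next here is sufficient to give the "me->locked=1 happens before next->locked=0" ordering that the commit message asks for.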