Hi Thomas,

we are still waiting for the comments from Honnappa. In our understanding, the missing barrier is a bug according to the model. We reproduced the scenario in herd7, which represents the authoritative memory model:
https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
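To show where the barrier in question sits, here is a minimal C11 sketch of an MCS lock. It is not the librte_eal code: the names mcs_node, mcs_tail, mcs_lock and mcs_unlock are made up for illustration, and it assumes plain C11 <stdatomic.h> rather than the DPDK helpers. The store that links the new node into its predecessor is the one that needs release ordering.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type for illustration; not the librte_eal definition. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_int locked;
};

typedef _Atomic(struct mcs_node *) mcs_tail;

static void mcs_lock(mcs_tail *tail, struct mcs_node *me)
{
    /* Initialise our own node with relaxed stores. */
    atomic_store_explicit(&me->locked, 1, memory_order_relaxed);
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);

    /* Swap ourselves in as the new tail of the queue. */
    struct mcs_node *prev =
        atomic_exchange_explicit(tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return; /* the lock was free, we own it now */

    /* Link into the predecessor. Release ordering makes the
     * initialisation of me->locked visible before this store. */
    atomic_store_explicit(&prev->next, me, memory_order_release);

    /* Wait until the predecessor hands the lock over. */
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ; /* spin */
}

static void mcs_unlock(mcs_tail *tail, struct mcs_node *me)
{
    struct mcs_node *next =
        atomic_load_explicit(&me->next, memory_order_acquire);
    if (next == NULL) {
        /* No successor seen yet: try to reset the tail to empty. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                tail, &expected, NULL,
                memory_order_release, memory_order_relaxed))
            return;
        /* A waiter swapped the tail but has not linked in yet. */
        while ((next = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ; /* spin */
    }
    /* Hand the lock to the successor. */
    atomic_store_explicit(&next->locked, 0, memory_order_release);
}
```

In this sketch, changing the store to prev->next from memory_order_release to memory_order_relaxed gives the variant we consider buggy: the initialisation of me->locked is then no longer ordered before the node becomes reachable by the predecessor.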
Here is a litmus test showing that the XCHG (when compiled to LDAXR and STLXR) is not atomic with respect to memory updates to other locations:

-----
AArch64 XCHG-nonatomic
{
0:X1=locked; 0:X3=next;
1:X1=locked; 1:X3=next; 1:X5=tail;
}
 P0           | P1                 ;
 LDR W0, [X3] | MOV W0, #1         ;
 CBZ W0, end  | STR W0, [X1]       ; (* init locked *)
 MOV W2, #2   | MOV W2, #0         ;
 STR W2, [X1] | xchg:              ;
 end:         | LDAXR W6, [X5]     ;
 NOP          | STLXR W4, W0, [X5] ;
 NOP          | CBNZ W4, xchg      ;
 NOP          | STR W0, [X3]       ; (* set next *)
exists (0:X2=2 /\ locked=1)
-----
(web version of herd7: http://diy.inria.fr/www/?record=aarch64)

P1 is trying to acquire the lock:
- initializes locked
- does the xchg on the tail of the mcslock
- sets the next

P0 is releasing the lock:
- if next is not set, just terminates
- if next is set, stores 2 in locked

The initialization of locked should never overwrite the store of 2 to locked, but it does. To prevent that reordering, the last store of P1 needs "release" semantics, i.e., it should be an STLR.

This is equivalent to the reordering occurring in the mcslock of librte_eal.

Best regards,
-Diogo

-----Original Message-----
From: Thomas Monjalon [mailto:tho...@monjalon.net]
Sent: Tuesday, October 6, 2020 11:50 PM
To: Phil Yang <phil.y...@arm.com>; Diogo Behrens <diogo.behr...@huawei.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
Cc: dev@dpdk.org; nd <n...@arm.com>
Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak memory

31/08/2020 20:45, Honnappa Nagarahalli:
>
> Hi Diogo,
>
> Thanks for your explanation.
>
> As documented in https://developer.arm.com/documentation/ddi0487/fc B2.9.5
> Load-Exclusive and Store-Exclusive instruction usage restrictions:
> " Between the Load-Exclusive and the Store-Exclusive, there are no
> explicit memory accesses, preloads, direct or indirect System register
> writes, address translation instructions, cache or TLB maintenance
> instructions, exception generating instructions, exception returns, or
> indirect branches."
> [Honnappa] This is a requirement on the software, not on the
> micro-architecture.
> We are having few discussions internally, will get back soon.
>
> So it is not allowed to insert (1) & (4) between (2, 3). The cmpxchg
> operation is atomic.

Please what is the conclusion?