> > > > > On 07.03.2019 9:45, gavin hu wrote:
> > > > > > In weak memory models, like arm64, reading the {prod,cons}.tail
> > > > > > may get reordered after reading or writing the ring slots, which
> > > > > > corrupts the ring and stale data is observed.
> > > > > >
> > > > > > This issue was reported by NXP on 8-A72 DPAA2 board. The problem
> > > > > > is most likely caused by missing the acquire semantics when
> > > > > > reading cons.tail (in SP enqueue) or prod.tail (in SC dequeue),
> > > > > > which makes it possible to read a stale value from the ring
> > > > > > slots.
> > > > > >
> > > > > > For the MP (and MC) case, rte_atomic32_cmpset() already provides
> > > > > > the required ordering. This patch prevents reads and writes of
> > > > > > the ring slots from getting reordered before the read of
> > > > > > {prod,cons}.tail for the SP (and SC) case.
> > > > >
> > > > > A read barrier, rte_smp_rmb(), is OK to prevent reads of the ring
> > > > > from getting reordered before the read of the tail. However, to
> > > > > prevent *writes* to the ring from getting reordered *before
> > > > > reading* the tail, you need a full memory barrier, i.e.
> > > > > rte_smp_mb().
> > > >
> > > > On arm64, rte_smp_rmb() is DMB(ishld), which orders LD/LD and LD/ST,
> > > > while WMB (the ST option) orders only ST/ST. For more details,
> > > > please refer to Table B2-1, "Encoding of the DMB and DSB <option>
> > > > parameter", in
> > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > >
> > > I see. But then you have to change the rte_smp_rmb() function
> > > definition in lib/librte_eal/common/include/generic/rte_atomic.h and
> > > ensure that all other architectures follow the same rules. Otherwise,
> > > this change is logically wrong, because a read barrier, under its
> > > current definition, cannot be used to order a Load with a Store.
> >
> > Good points, let me re-think how to handle the other architectures.
> > A full MB is required for the other architectures (x86? PPC?), but for
> > arm a read barrier (load/load and load/store) is enough.
>
> Hi Ilya,
>
> I would expand the rmb definition to cover load/store, in addition to
> load/load.
>
> For x86, with its strong memory ordering model, rmb is actually
> equivalent to mb and is implemented as a compiler barrier,
> rte_compiler_barrier(); arm32 is also a case where rmb is equivalent
> to mb.
>
> For PPC, both 32- and 64-bit, rmb = wmb = mb: lwsync orders load/load,
> load/store, and store/store, and sync additionally orders store/load;
> see the table on this page:
> https://www.ibm.com/developerworks/systems/articles/powerpc.html
>
> In summary, are we safe to expand this definition for all the
> architectures DPDK supports?

Essentially, it is a documentation bug, i.e. the current implementation
of rte_smp_rmb() already behaves as a load/load and load/store barrier.
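To back this up, here is a per-architecture sketch of the current
barrier mapping (illustrative only, not verbatim DPDK code; the
authoritative definitions live in
lib/librte_eal/common/include/arch/*/rte_atomic*.h):

/* Illustrative sketch -- check the per-arch headers for the real code. */
#if defined(RTE_ARCH_X86)
/* x86 is strongly ordered: a load passes neither a later load nor a
 * later store, so preventing compiler reordering is sufficient. */
#define rte_smp_rmb() rte_compiler_barrier()
#elif defined(RTE_ARCH_ARM64)
/* DMB ISHLD orders prior loads against later loads and later stores
 * (Table B2-1, ARMv8 Architecture Reference Manual). */
#define rte_smp_rmb() asm volatile("dmb ishld" : : : "memory")
#elif defined(RTE_ARCH_PPC_64)
/* On PPC, DPDK's rmb is a full "sync", which orders all four
 * load/store combinations, so load/store ordering comes for free. */
#define rte_smp_rmb() asm volatile("sync" : : : "memory")
#endif

If any architecture's rmb were weaker than load/load plus load/store,
the documentation change alone would be wrong for it, which is what the
objection above is about.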
> Any comments are welcome!
>
> BR. Gavin
> <snip>
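To make the reordering under discussion concrete, below is a minimal
single-producer enqueue sketch (illustrative only: the struct, the
field names, and sp_enqueue_one() are made up for this example and are
not the actual rte_ring code):

#include <errno.h>
#include <stdint.h>
#include <rte_atomic.h>

struct fake_ring {
        uint32_t capacity;
        uint32_t mask;                 /* size - 1, size a power of two */
        volatile uint32_t prod_head;
        volatile uint32_t cons_tail;   /* written by the consumer */
        void **slots;
};

static int
sp_enqueue_one(struct fake_ring *r, void *obj)
{
        /* Read the consumer's tail to learn how much space is free. */
        uint32_t cons_tail = r->cons_tail;
        uint32_t prod_head = r->prod_head;

        if (r->capacity + cons_tail - prod_head < 1)
                return -ENOBUFS;

        /*
         * On a weakly ordered CPU the slot store below may be issued
         * before the cons_tail load above completes, i.e. before we
         * know the consumer has finished with that slot, so the
         * producer can overwrite an entry the consumer has not read
         * yet. rte_smp_rmb() (DMB ISHLD on arm64) orders the prior
         * load against both later loads and later stores.
         */
        rte_smp_rmb();

        r->slots[prod_head & r->mask] = obj;  /* write the ring slot */
        r->prod_head = prod_head + 1;
        /* The tail update with release semantics is omitted here. */
        return 0;
}

With the barrier in place, the load of cons_tail is guaranteed to
complete before the slot store can become visible.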