> > > > > >>> In weak memory models, like arm64, reading the {prod,cons}.tail
> > > > > >>> may get reordered after reading or writing the ring slots, which
> > > > > >>> corrupts the ring and stale data is observed.
> > > > > >>>
> > > > > >>> This issue was reported by NXP on 8-A72 DPAA2 board. The problem is
> > > > > >>> most likely caused by missing the acquire semantics when reading
> > > > > >>> cons.tail (in SP enqueue) or prod.tail (in SC dequeue) which
> > > > > >>> makes it possible to read a stale value from the ring slots.
> > > > > >>>
> > > > > >>> For MP (and MC) case, rte_atomic32_cmpset() already provides the
> > > > > >>> required ordering. This patch is to prevent reading and writing the
> > > > > >>> ring slots get reordered before reading {prod,cons}.tail for SP (and
> > > > > >>> SC) case.
> > > > > >>
> > > > > >> Read barrier rte_smp_rmb() is OK to prevent reading the ring get
> > > > > >> reordered before reading the tail. However, to prevent *writing*
> > > > > >> the ring get reordered *before reading* the tail you need a full
> > > > > >> memory barrier, i.e. rte_smp_mb().
> > > > > >
> > > > > > ISHLD (rte_smp_rmb is DMB(ishld)) orders LD/LD and LD/ST, while
> > > > > > WMB (ST option) orders ST/ST.
> > > > > > For more details, please refer to: Table B2-1 Encoding of the DMB
> > > > > > and DSB <option> parameter in
> > > > > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > > > >
> > > > > I see. But you have to change the rte_smp_rmb() function definition
> > > > > in lib/librte_eal/common/include/generic/rte_atomic.h and assure
> > > > > that all other architectures follows same rules.
> > > > > Otherwise, this change is logically wrong, because read barrier in
> > > > > current definition could not be used to order Load with Store.
> > > > >
> > > >
> > > > Good points, let me re-think how to handle for other architectures.
> > > > Full MB is required for other architectures (x86? Ppc?), but for arm,
> > > > read barrier (load/store and load/load) is enough.
> > >
> > > For x86, I don't think you need any barrier here, as with IA memory model:
> > > - Reads are not reordered with other reads.
> > > - Writes are not reordered with older reads.
> > Agree
> > I understand herein no instruction level barriers are required for IA, but
> > how about the compiler barrier: rte_compiler_barrier?
> >
> > >
> > > BTW, could you explain a bit more why barrier is necessary even on arm
> > > here?
> > > As I can see, there is a data dependency between the tail value and
> > > subsequent address calculations for ring writes/reads.
> > > Isn't that sufficient to prevent re-ordering even for weak memory model?
> > The tail value affects 'n'. But, the value of 'n' can be speculated because
> > of the following 'if' statement:
> >
> > if (unlikely(n > *free_entries))
> >         n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *free_entries;
> >
> > The address calculations for actual ring writes/reads do not depend on the
> > tail value.

Ok, agree I formulated it wrongly, only limit value is dependent on cons.tail.
Address is not.

> > Since 'n' can be speculated, the writes/reads can be moved up
> > before the load of the tail value.

For my curiosity: ok, I understand that 'n' value can be speculated, and
speculative stores could start before n is calculated properly...
But are you saying that such speculative store results might be visible to
the other observers (different cpu)?

>
> Good explanation. The address calculations does not depend on tail/n, only the
> limit/last one depends on it, while it can be speculated.
>
> > > > Konstantin
> > > > > > > > <snip>
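
For illustration only, here is a minimal sketch of the ordering discussed in
this thread, written with C11 atomics rather than the rte_smp_*() barriers.
It is not the DPDK rte_ring code: the sketch_* names, the fixed ring size and
the one-object-at-a-time interface are all assumptions made to keep it short.
The point it tries to show is that an acquire load of the opposite tail keeps
the following slot read or write from being moved before that load, which is
the LD/LD and LD/ST ordering attributed to DMB ISHLD above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names and sizes; this is not the rte_ring layout. */
#define SKETCH_RING_SIZE 8u
#define SKETCH_RING_MASK (SKETCH_RING_SIZE - 1u)
static_assert((SKETCH_RING_SIZE & SKETCH_RING_MASK) == 0,
              "ring size must be a power of two");

struct sketch_ring {
    _Atomic uint32_t prod_tail;   /* written only by the single producer */
    _Atomic uint32_t cons_tail;   /* written only by the single consumer */
    void *slots[SKETCH_RING_SIZE];
};

static void sketch_ring_init(struct sketch_ring *r)
{
    atomic_init(&r->prod_tail, 0);
    atomic_init(&r->cons_tail, 0);
}

/* Single-producer enqueue of one object: 1 on success, 0 if the ring is full. */
static int sketch_sp_enqueue(struct sketch_ring *r, void *obj)
{
    uint32_t head = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);
    /* Acquire: the slot store below cannot be reordered before this load. */
    uint32_t cons = atomic_load_explicit(&r->cons_tail, memory_order_acquire);
    uint32_t free_entries = SKETCH_RING_SIZE + cons - head;

    if (free_entries == 0)
        return 0;
    r->slots[head & SKETCH_RING_MASK] = obj;
    /* Release: the slot store is visible before the new producer tail. */
    atomic_store_explicit(&r->prod_tail, head + 1, memory_order_release);
    return 1;
}

/* Single-consumer dequeue of one object: 1 on success, 0 if the ring is empty. */
static int sketch_sc_dequeue(struct sketch_ring *r, void **obj)
{
    uint32_t head = atomic_load_explicit(&r->cons_tail, memory_order_relaxed);
    /* Acquire: the slot load below cannot be reordered before this load. */
    uint32_t prod = atomic_load_explicit(&r->prod_tail, memory_order_acquire);

    if (prod == head)
        return 0;
    *obj = r->slots[head & SKETCH_RING_MASK];
    /* Release: the slot load completes before the slot is handed back. */
    atomic_store_explicit(&r->cons_tail, head + 1, memory_order_release);
    return 1;
}
```

With acquire/release expressed this way the compiler picks the barrier for
each architecture, so on x86 this would be expected to need no fence
instruction while still constraining compiler reordering.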