> -----Original Message-----
> From: Jia He [mailto:hejia...@gmail.com]
> Sent: Friday, November 10, 2017 1:51 AM
> To: jerin.ja...@caviumnetworks.com; dev@dpdk.org; olivier.m...@6wind.com
> Cc: Ananyev, Konstantin <konstantin.anan...@intel.com>; Richardson, Bruce
> <bruce.richard...@intel.com>; jianbo....@arm.com; hemant.agra...@nxp.com;
> Jia He <hejia...@gmail.com>; Jia He <jia...@hxt-semitech.com>;
> jie2....@hxt-semitech.com; bing.zhao@hxt-semitech.com
> Subject: [PATCH v5 1/1] ring: guarantee load/load order in enqueue and dequeue
>
> We observed an rte panic of mbuf_autotest on our Qualcomm ARM64 server,
> in __rte_ring_move_cons_head():
> ...
> do {
>         /* Restore n as it may change every loop */
>         n = max;
>
>         *old_head = r->cons.head;                 // 1st load
>         const uint32_t prod_tail = r->prod.tail;  // 2nd load
>
> cpu1(producer)          cpu2(consumer)          cpu3(consumer)
>                         load r->prod.tail
> in enqueue:
> load r->cons.tail
> load r->prod.head
>
> store r->prod.tail
>
>                                                 load r->cons.head
>                                                 load r->prod.tail
>                                                 ...
>                                                 store r->cons.{head,tail}
>                         load r->cons.head
>
> On weak memory order architectures (PowerPC, ARM), the 2nd load might be
> reordered before the 1st load, which makes *entries larger than intended.
> This reordering breaks enqueue/dequeue: r->cons.head can end up larger
> than prod_tail, so the unsigned subtraction yields a huge *entries and
> the consumer moves forward incorrectly.
>
> After this patch, even with the above context switches, the stale
> cons.head is recalculated after rte_atomic32_cmpset() fails, so no race
> condition remains.
>
> There is no such issue on x86, because x86 has a strong memory order
> model. But rte_smp_rmb() has no runtime performance impact on x86, so
> keep the same code without architecture-specific concerns.
>
> Signed-off-by: Jia He <jia...@hxt-semitech.com>
> Signed-off-by: jie2....@hxt-semitech.com
> Signed-off-by: bing.z...@hxt-semitech.com
> ---
>  lib/librte_ring/rte_ring.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 5e9b3b7..3e8085a 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -409,6 +409,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
>  		n = max;
>
>  		*old_head = r->prod.head;
> +
> +		/* Add a rmb barrier to avoid load/load reordering in a
> +		 * weak memory model. It is a no-op on x86. */
> +		rte_smp_rmb();
> +
>  		const uint32_t cons_tail = r->cons.tail;
>  		/*
>  		 * The subtraction is done between two unsigned 32bits value
> @@ -517,6 +522,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
>  		n = max;
>
>  		*old_head = r->cons.head;
> +
> +		/* Add a rmb barrier to avoid load/load reordering in a
> +		 * weak memory model. It is a no-op on x86. */
> +		rte_smp_rmb();
> +
>  		const uint32_t prod_tail = r->prod.tail;
>  		/* The subtraction is done between two unsigned 32bits value
>  		 * (the result is always modulo 32 bits even if we have
> --
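
To make the failure mode concrete, here is a minimal, self-contained
sketch of the wraparound (illustrative values, not DPDK code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Assume the consumer's load of r->prod.tail was reordered
         * ahead of its load of r->cons.head and returned a stale
         * tail; meanwhile another consumer advanced cons.head. */
        uint32_t prod_tail = 100;       /* stale, loaded too early */
        uint32_t cons_head = 105;       /* fresh, loaded later */

        /* The same unsigned subtraction the ring code performs:
         * instead of a small count, the result wraps around. */
        uint32_t entries = prod_tail - cons_head;

        printf("entries = %u\n", entries);      /* 4294967291 */
        return 0;
}

With rte_smp_rmb() between the two loads, prod.tail is guaranteed to be
read after cons.head, so the consumer never pairs a stale tail with a
newer head and the subtraction stays in range.
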
Acked-by: Konstantin Ananyev <konstantin.anan...@intel.com>

> 2.7.4
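
As background on why the barrier is free on x86 but required on ARM64:
x86's TSO model never reorders loads with other loads, so a read
barrier only needs to constrain the compiler, while ARM64 needs a real
fence instruction. A rough sketch (my_smp_rmb() is a hypothetical
stand-in, paraphrasing rather than quoting DPDK's per-arch rte_atomic
headers):

#ifdef __aarch64__
/* ARM64: emit a real load/load barrier instruction. */
#define my_smp_rmb() asm volatile("dmb ishld" : : : "memory")
#else
/* x86 (TSO): an empty asm with a "memory" clobber is enough to
 * stop the compiler from reordering the two loads. */
#define my_smp_rmb() asm volatile("" : : : "memory")
#endif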