> -----Original Message----- > From: Jia He [mailto:[email protected]] > Sent: Friday, November 10, 2017 1:51 AM > To: [email protected]; [email protected]; [email protected] > Cc: Ananyev, Konstantin <[email protected]>; Richardson, Bruce > <[email protected]>; [email protected]; > [email protected]; Jia He <[email protected]>; Jia He > <[email protected]>; [email protected]; bing.zhao@hxt- > semitech.com > Subject: [PATCH v5 1/1] ring: guarantee load/load order in enqueue and dequeue > > We watched a rte panic of mbuf_autotest in our qualcomm arm64 server. > In __rte_ring_move_cons_head() > ... > do { > /* Restore n as it may change every loop */ > n = max; > > *old_head = r->cons.head; //1st load > const uint32_t prod_tail = r->prod.tail; //2nd load > > cpu1(producer) cpu2(consumer) cpu3(consumer) > load r->prod.tail > in enqueue: > load r->cons.tail > load r->prod.head > > store r->prod.tail > > load r->cons.head > load r->prod.tail > ... > store r->cons.{head,tail} > load r->cons.head > > In weak memory order architectures(powerpc,arm), the 2nd load might be > reodered before the 1st load, that makes *entries is bigger than we > wanted. This nasty reording messed enque/deque up. Then, r->cons.head > will be bigger than prod_tail, then make *entries very big and the > consumer will go forward incorrectly. > > After this patch, even with above context switches, the old cons.head > will be recaculated after failure of rte_atomic32_cmpset. So no race > conditions left. > > There is no such issue on X86, because X86 is strong memory order model. > But rte_smp_rmb() doesn't have impact on runtime performance on X86, so > keep the same code without architectures specific concerns. > > Signed-off-by: Jia He <[email protected]> > Signed-off-by: [email protected] > Signed-off-by: [email protected] > --- > lib/librte_ring/rte_ring.h | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > index 5e9b3b7..3e8085a 100644 > --- a/lib/librte_ring/rte_ring.h > +++ b/lib/librte_ring/rte_ring.h > @@ -409,6 +409,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp, > n = max; > > *old_head = r->prod.head; > + > + /* add rmb barrier to avoid load/load reorder in weak > + * memory model. It is noop on x86 */ > + rte_smp_rmb(); > + > const uint32_t cons_tail = r->cons.tail; > /* > * The subtraction is done between two unsigned 32bits value > @@ -517,6 +522,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc, > n = max; > > *old_head = r->cons.head; > + > + /* add rmb barrier to avoid load/load reorder in weak > + * memory model. It is noop on x86 */ > + rte_smp_rmb(); > + > const uint32_t prod_tail = r->prod.tail; > /* The subtraction is done between two unsigned 32bits value > * (the result is always modulo 32 bits even if we have > --
Acked-by: Konstantin Ananyev <[email protected]> > 2.7.4

