Before this patch:
In __rte_ring_move_cons_head()
...
        do {
                /* Restore n as it may change every loop */
                n = max;
                *old_head = r->cons.head;                 //1st load
                const uint32_t prod_tail = r->prod.tail;  //2nd load

On weakly ordered architectures (PowerPC, ARM), the 2nd load might be
reordered before the 1st load, which makes *entries bigger than
intended. This nasty reordering messes enqueue/dequeue up:

cpu1(producer)          cpu2(consumer)          cpu3(consumer)
                        load r->prod.tail
in enqueue:
load r->cons.tail
load r->prod.head
store r->prod.tail
                                                load r->cons.head
                                                load r->prod.tail
                                                ...
                                                store r->cons.{head,tail}
                        load r->cons.head

Then the r->cons.head loaded on cpu2 will be bigger than the prod_tail
it loaded earlier, which makes *entries very big.

After this patch, the old cons.head will be recalculated after a
failure of rte_atomic32_cmpset().

There is no such issue on x86, because x86 has a strong memory order
model.

Signed-off-by: Jia He <hejia...@gmail.com>
Signed-off-by: jia...@hxt-semitech.com
Signed-off-by: jie2....@hxt-semitech.com
Signed-off-by: bing.z...@hxt-semitech.com
---
 lib/librte_ring/rte_ring.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 5e9b3b7..15c72e2 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -409,6 +409,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
 		n = max;
 
 		*old_head = r->prod.head;
+
+		/* load of cons.tail can't be reordered before prod.head */
+		rte_smp_rmb();
+
 		const uint32_t cons_tail = r->cons.tail;
 		/*
 		 * The subtraction is done between two unsigned 32bits value
@@ -517,6 +521,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		n = max;
 
 		*old_head = r->cons.head;
+
+		/* load of prod.tail can't be reordered before cons.head */
+		rte_smp_rmb();
+
 		const uint32_t prod_tail = r->prod.tail;
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
-- 
2.7.4
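
Not part of the patch, but for reviewers who want to see the consumer-side
ordering requirement outside of librte_ring, below is a minimal standalone
sketch using C11 atomics in place of rte_smp_rmb(). struct toy_ring,
toy_ring_count() and the values in main() are invented for illustration
only and are not DPDK APIs.

/*
 * Minimal standalone sketch (not DPDK code) of the consumer-side
 * load/load ordering requirement, using a C11 acquire fence in place
 * of rte_smp_rmb().
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct toy_ring {
	_Atomic uint32_t cons_head;
	_Atomic uint32_t prod_tail;
};

/*
 * Number of entries a consumer may dequeue. If the prod_tail load is
 * allowed to drift before the cons_head load, a stale prod_tail paired
 * with a newer cons_head makes the unsigned subtraction wrap around to
 * a huge value, which is the *entries blow-up described above.
 */
static uint32_t toy_ring_count(struct toy_ring *r)
{
	uint32_t old_head = atomic_load_explicit(&r->cons_head,
						 memory_order_relaxed);

	/* Same intent as the rte_smp_rmb() added by this patch: keep the
	 * prod_tail load below from being reordered before the cons_head
	 * load above on weakly ordered CPUs.
	 */
	atomic_thread_fence(memory_order_acquire);

	uint32_t prod_tail = atomic_load_explicit(&r->prod_tail,
						  memory_order_relaxed);

	/* Modulo-2^32 arithmetic, as in __rte_ring_move_cons_head(). */
	return prod_tail - old_head;
}

int main(void)
{
	struct toy_ring r;

	atomic_init(&r.cons_head, 0);
	atomic_init(&r.prod_tail, 4);

	printf("entries available: %u\n", toy_ring_count(&r));
	return 0;
}

Built with a plain cc -std=c11 this prints 4; dropping the fence is still
fine on x86 because of its strong ordering, but it opens the window shown
in the cpu1/cpu2/cpu3 interleaving above on PowerPC/ARM.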