On Thu, 12 Oct 2017 17:53:51 +0200 Olivier MATZ <olivier.m...@6wind.com> wrote:
> Hi,
>
> On Tue, Oct 10, 2017 at 05:56:36PM +0800, Jia He wrote:
> > Before this patch:
> > In __rte_ring_move_cons_head()
> > ...
> >         do {
> >                 /* Restore n as it may change every loop */
> >                 n = max;
> >
> >                 *old_head = r->cons.head;                 // 1st load
> >                 const uint32_t prod_tail = r->prod.tail;  // 2nd load
> >
> > On weakly ordered architectures (PowerPC, ARM), the 2nd load can be
> > reordered before the 1st load, which makes *entries bigger than
> > intended. This nasty reordering breaks enqueue/dequeue:
> >
> > cpu1(producer)          cpu2(consumer)          cpu3(consumer)
> >                         load r->prod.tail
> > in enqueue:
> > load r->cons.tail
> > load r->prod.head
> >
> > store r->prod.tail
> >
> >                                                 load r->cons.head
> >                                                 load r->prod.tail
> >                                                 ...
> >                                                 store r->cons.{head,tail}
> >                         load r->cons.head
> >
> > Then r->cons.head is bigger than prod_tail, which makes *entries
> > very big.
> >
> > After this patch, the old cons.head is recalculated after
> > rte_atomic32_cmpset() fails.
> >
> > There is no such issue on x86 CPUs, because x86 uses a strong memory
> > order model.
> >
> > Signed-off-by: Jia He <hejia...@gmail.com>
> > Signed-off-by: jia...@hxt-semitech.com
> > Signed-off-by: jie2....@hxt-semitech.com
> > Signed-off-by: bing.z...@hxt-semitech.com
> >
> > ---
> >  lib/librte_ring/rte_ring.h | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 5e9b3b7..15c72e2 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -409,6 +409,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
> >  		n = max;
> >
> >  		*old_head = r->prod.head;
> > +
> > +		/* load of cons.tail can't be reordered before prod.head */
> > +		rte_smp_rmb();
> > +
> >  		const uint32_t cons_tail = r->cons.tail;
> >  		/*
> >  		 * The subtraction is done between two unsigned 32bits value
> > @@ -517,6 +521,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
> >  		n = max;
> >
> >  		*old_head = r->cons.head;
> > +
> > +		/* load of prod.tail can't be reordered before cons.head */
> > +		rte_smp_rmb();
> > +
> >  		const uint32_t prod_tail = r->prod.tail;
> >  		/* The subtraction is done between two unsigned 32bits value
> >  		 * (the result is always modulo 32 bits even if we have
> > --
> > 2.7.4
> >
>
> The explanation convinces me.
>
> However, since it's in a critical path, it would be good to have other
> opinions. This patch reminds me of a discussion that was also related
> to a memory barrier, but at another place:
> http://dpdk.org/ml/archives/dev/2016-July/043765.html
> It led to this patch: http://dpdk.org/browse/dpdk/commit/?id=ecc7d10e448e
> But it was finally reverted: http://dpdk.org/browse/dpdk/commit/?id=c3acd92746c3
>
> Konstantin, Jerin, do you have any comment?

On strongly ordered architectures like x86, rte_smp_rmb() turns into a
compiler barrier (rte_compiler_barrier()). Compiler barriers generate no
instructions; they just tell the compiler not to do data-flow
optimization across them. Since the pointers involved are mostly
volatile, it shouldn't even change the generated code.
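
To make the cost concrete, here is a rough sketch of how rte_smp_rmb()
expands on each side, paraphrased from DPDK's per-architecture
rte_atomic headers. This is illustrative and from memory, not a verbatim
copy -- check the tree for the exact macro definitions:

    /* Generic compiler barrier: emits no instruction; it only stops
     * the compiler from moving memory accesses across this point. */
    #define rte_compiler_barrier() do {		\
            asm volatile ("" : : : "memory");	\
    } while (0)

    /* x86 (strongly ordered): the hardware does not reorder loads
     * with earlier loads, so restraining the compiler is enough. */
    #define rte_smp_rmb() rte_compiler_barrier()

    /* armv8 (weakly ordered): a real load-load barrier instruction
     * is needed to keep the two loads in program order. */
    #define rte_smp_rmb() asm volatile ("dmb ishld" : : : "memory")

So on x86 the patch only pins the order in which the compiler emits the
two loads -- an ordering the hardware already guarantees -- while on arm
the fix costs a real barrier instruction in the critical path, which is
why more opinions on this change are worth having.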