Hi Jerin

On 10/13/2017 1:23 AM, Jerin Jacob wrote:
-----Original Message-----
Date: Thu, 12 Oct 2017 17:05:50 +0000
From: "Ananyev, Konstantin" <konstantin.anan...@intel.com>
To: Olivier MATZ <olivier.m...@6wind.com>, Jia He <hejia...@gmail.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, "jia...@hxt-semitech.com"
  <jia...@hxt-semitech.com>, "jie2....@hxt-semitech.com"
  <jie2....@hxt-semitech.com>, "bing.z...@hxt-semitech.com"
  <bing.z...@hxt-semitech.com>, "jerin.ja...@caviumnetworks.com"
  <jerin.ja...@caviumnetworks.com>
Subject: RE: [PATCH] ring: guarantee ordering of cons/prod loading when
  doing enqueue/dequeue

Hi guys,

-----Original Message-----
From: Olivier MATZ [mailto:olivier.m...@6wind.com]
Sent: Thursday, October 12, 2017 4:54 PM
To: Jia He <hejia...@gmail.com>
Cc: dev@dpdk.org; jia...@hxt-semitech.com; jie2....@hxt-semitech.com; 
bing.z...@hxt-semitech.com; Ananyev, Konstantin
<konstantin.anan...@intel.com>; jerin.ja...@caviumnetworks.com
Subject: Re: [PATCH] ring: guarantee ordering of cons/prod loading when doing 
enqueue/dequeue

Hi,

On Tue, Oct 10, 2017 at 05:56:36PM +0800, Jia He wrote:
Before this patch:
In __rte_ring_move_cons_head()
...
         do {
                 /* Restore n as it may change every loop */
                 n = max;

                 *old_head = r->cons.head;                //1st load
                 const uint32_t prod_tail = r->prod.tail; //2nd load

On weak memory order architectures (powerpc, arm), the 2nd load might be
reordered before the 1st load, which makes *entries bigger than we wanted.
This nasty reordering messes up enqueue/dequeue.

cpu1(producer)          cpu2(consumer)          cpu3(consumer)
                         load r->prod.tail
in enqueue:
load r->cons.tail
load r->prod.head

store r->prod.tail

                                                 load r->cons.head
                                                 load r->prod.tail
                                                 ...
                                                 store r->cons.{head,tail}
                         load r->cons.head

Then r->cons.head will be bigger than prod_tail, which makes *entries very big.

After this patch, the old cons.head will be recalculated after a failure of
rte_atomic32_cmpset.

There is no such issue on x86 CPUs, because x86 has a strong memory ordering model.

Signed-off-by: Jia He <hejia...@gmail.com>
Signed-off-by: jia...@hxt-semitech.com
Signed-off-by: jie2....@hxt-semitech.com
Signed-off-by: bing.z...@hxt-semitech.com

---
  lib/librte_ring/rte_ring.h | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 5e9b3b7..15c72e2 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -409,6 +409,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, int is_sp,
                n = max;

                *old_head = r->prod.head;
+
+               /* load of prod.tail can't be reordered before cons.head */
+               rte_smp_rmb();
+
                const uint32_t cons_tail = r->cons.tail;
                /*
                 *  The subtraction is done between two unsigned 32bits value
@@ -517,6 +521,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
                n = max;

                *old_head = r->cons.head;
+
+               /* load of prod.tail can't be reordered before cons.head */
+               rte_smp_rmb();
+
                const uint32_t prod_tail = r->prod.tail;
                /* The subtraction is done between two unsigned 32bits value
                 * (the result is always modulo 32 bits even if we have
--
2.7.4

The explanation convinces me.

However, since it's in a critical path, it would be good to have other
opinions. This patch reminds me of a discussion that was also related to
memory barriers, but at another place:
http://dpdk.org/ml/archives/dev/2016-July/043765.html
It led to this patch: http://dpdk.org/browse/dpdk/commit/?id=ecc7d10e448e
which was finally reverted: http://dpdk.org/browse/dpdk/commit/?id=c3acd92746c3

Konstantin, Jerin, do you have any comment?
For IA, as rte_smp_rmb() is just a compiler barrier, that patch shouldn't make
any difference, but I can't see how read reordering would screw things up
here... Probably it's just me, and the arm or ppc guys could explain what the
problem would be if, let's say, cons.tail is read before prod.head in
__rte_ring_move_prod_head().
Is there a simple test-case to reproduce that problem (on arm or ppc)?
Probably a new test-case for the rte_ring autotest is needed, or is it
possible to reproduce it with an existing one?
On the same lines,

Jia He, jie2.liu, bing.zhao,

Is this patch based on code review, or did you see this issue on any
arm/ppc target? arm64 will have a performance impact with this change.
With mbuf_autotest, rte_panic is invoked within seconds.

PANIC in test_refcnt_iter():
(lcore=0, iter=0): after 10s only 61 of 64 mbufs left free
1: [./test(rte_dump_stack+0x38) [0x58d868]]
Aborted (core dumped)

Cheers,
Jia


Konstantin
