-----Original Message-----
> Date: Fri, 13 Oct 2017 02:12:58 +0000
> From: "Zhao, Bing" <bing.z...@hxt-semitech.com>
> To: "Liu, Jie2" <jie2....@hxt-semitech.com>, "Ananyev, Konstantin"
>  <konstantin.anan...@intel.com>, Olivier MATZ <olivier.m...@6wind.com>,
>  "dev@dpdk.org" <dev@dpdk.org>, "jerin.ja...@caviumnetworks.com"
>  <jerin.ja...@caviumnetworks.com>
> CC: "He, Jia" <jia...@hxt-semitech.com>, Jia He <hejia...@gmail.com>
> Subject: RE: [PATCH] ring: guarantee ordering of cons/prod loading when
>  doing enqueue/dequeue
> 
> Hi Guys,

Hi Bing,

> If needed, we can provide a simplified UT test case and ring patch to 
> reproduce this. And then we can catch the information to prove that in a 
> SP+MC scenario, when the tail and head pointers' number are smaller than the 
> ring size, some lcore thread will have a fault judge and then make the CH 
> over the PT.

Yes. Please provide simplified UT test case and ring patch.

> 
> Thanks
> 
> BR. Bing
> 
> -----Original Message-----
> From: Liu, Jie2
> Sent: 2017年10月13日 8:25
> To: Ananyev, Konstantin <konstantin.anan...@intel.com>; Olivier MATZ 
> <olivier.m...@6wind.com>; dev@dpdk.org; jerin.ja...@caviumnetworks.com
> Cc: He, Jia <jia...@hxt-semitech.com>; Zhao, Bing 
> <bing.z...@hxt-semitech.com>; Jia He <hejia...@gmail.com>
> Subject: RE: [PATCH] ring: guarantee ordering of cons/prod loading when doing 
> enqueue/dequeue
> 
> Hi guys,
> We found this issue when we run mbuf_autotest. It failed on a aarch64 
> platform. I am not sure if it can be reproduced on other platforms.
> Regards,
> Jie Liu
> 
> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
> Sent: 2017年10月13日 1:06
> To: Olivier MATZ <olivier.m...@6wind.com>; Jia He <hejia...@gmail.com>
> Cc: dev@dpdk.org; He, Jia <jia...@hxt-semitech.com>; Liu, Jie2 
> <jie2....@hxt-semitech.com>; Zhao, Bing <bing.z...@hxt-semitech.com>; 
> jerin.ja...@caviumnetworks.com
> Subject: RE: [PATCH] ring: guarantee ordering of cons/prod loading when doing 
> enqueue/dequeue
> 
> Hi guys,
> 
> > -----Original Message-----
> > From: Olivier MATZ [mailto:olivier.m...@6wind.com]
> > Sent: Thursday, October 12, 2017 4:54 PM
> > To: Jia He <hejia...@gmail.com>
> > Cc: dev@dpdk.org; jia...@hxt-semitech.com; jie2....@hxt-semitech.com;
> > bing.z...@hxt-semitech.com; Ananyev, Konstantin
> > <konstantin.anan...@intel.com>; jerin.ja...@caviumnetworks.com
> > Subject: Re: [PATCH] ring: guarantee ordering of cons/prod loading
> > when doing enqueue/dequeue
> >
> > Hi,
> >
> > On Tue, Oct 10, 2017 at 05:56:36PM +0800, Jia He wrote:
> > > Before this patch:
> > > In __rte_ring_move_cons_head()
> > > ...
> > >         do {
> > >                 /* Restore n as it may change every loop */
> > >                 n = max;
> > >
> > >                 *old_head = r->cons.head;                //1st load
> > >                 const uint32_t prod_tail = r->prod.tail; //2nd load
> > >
> > > In weak memory order architectures(powerpc,arm), the 2nd load might
> > > be reodered before the 1st load, that makes *entries is bigger than we 
> > > wanted.
> > > This nasty reording messed enque/deque up.
> > >
> > > cpu1(producer)          cpu2(consumer)          cpu3(consumer)
> > >                         load r->prod.tail in enqueue:
> > > load r->cons.tail
> > > load r->prod.head
> > >
> > > store r->prod.tail
> > >
> > >                                                 load r->cons.head
> > >                                                 load r->prod.tail
> > >                                                 ...
> > >                                                 store r->cons.{head,tail}
> > >                         load r->cons.head
> > >
> > > THEN,r->cons.head will be bigger than prod_tail, then make *entries
> > > very big
> > >
> > > After this patch, the old cons.head will be recaculated after
> > > failure of rte_atomic32_cmpset
> > >
> > > There is no such issue in X86 cpu, because X86 is strong memory
> > > order model
> > >
> > > Signed-off-by: Jia He <hejia...@gmail.com>
> > > Signed-off-by: jia...@hxt-semitech.com
> > > Signed-off-by: jie2....@hxt-semitech.com
> > > Signed-off-by: bing.z...@hxt-semitech.com
> > >
> > > ---
> > >  lib/librte_ring/rte_ring.h | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 5e9b3b7..15c72e2 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -409,6 +409,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, int 
> > > is_sp,
> > >  n = max;
> > >
> > >  *old_head = r->prod.head;
> > > +
> > > +/* load of prod.tail can't be reordered before cons.head */
> > > +rte_smp_rmb();
> > > +
> > >  const uint32_t cons_tail = r->cons.tail;
> > >  /*
> > >   *  The subtraction is done between two unsigned 32bits value @@
> > > -517,6 +521,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
> > >  n = max;
> > >
> > >  *old_head = r->cons.head;
> > > +
> > > +/* load of prod.tail can't be reordered before cons.head */
> > > +rte_smp_rmb();
> > > +
> > >  const uint32_t prod_tail = r->prod.tail;
> > >  /* The subtraction is done between two unsigned 32bits value
> > >   * (the result is always modulo 32 bits even if we have
> > > --
> > > 2.7.4
> > >
> >
> > The explanation convinces me.
> >
> > However, since it's in a critical path, it would be good to have other
> > opinions. This patch reminds me this discussion, that was also related
> > to memory barrier, but at another place:
> > http://dpdk.org/ml/archives/dev/2016-July/043765.html
> > Lead to that patch:
> > http://dpdk.org/browse/dpdk/commit/?id=ecc7d10e448e
> > But finally reverted:
> > http://dpdk.org/browse/dpdk/commit/?id=c3acd92746c3
> >
> > Konstatin, Jerin, do you have any comment?
> 
> For IA, as rte_smp_rmb() is just a compiler_barrier, that patch shouldn't 
> make any difference, but  I can't see how read reordering would screw things 
> up here...
> Probably just me and arm or ppc guys could explain what will be the problem 
> if let say cons.tail will be read before prod.head in 
> __rte_ring_move_prod_head().
> I wonder Is there a simple test-case to reproduce that problem (on arm or 
> ppc)?
> Probably new test-case for rte_ring autotest is needed, or is it possible to 
> reproduce it with existing one?
> Konstantin
> 
> 
> 
> This email is intended only for the named addressee. It may contain 
> information that is confidential/private, legally privileged, or 
> copyright-protected, and you should handle it accordingly. If you are not the 
> intended recipient, you do not have legal rights to retain, copy, or 
> distribute this email or its contents, and should promptly delete the email 
> and all electronic copies in your system; do not retain copies in any media. 
> If you have received this email in error, please notify the sender promptly. 
> Thank you.
> 
> 

Reply via email to