Re: [dpdk-dev] [PATCH v2] ring: guarantee ordering of cons/prod loading when doing

Jerin Jacob Thu, 02 Nov 2017 10:01:38 -0700

-----Original Message-----
> Date: Thu, 2 Nov 2017 16:16:33 +0000
> From: "Ananyev, Konstantin" <[email protected]>
> To: Jia He <[email protected]>, "[email protected]"
>  <[email protected]>, "[email protected]" <[email protected]>,
>  "[email protected]" <[email protected]>
> CC: "Richardson, Bruce" <[email protected]>, "[email protected]"
>  <[email protected]>, "[email protected]" <[email protected]>,
>  "[email protected]" <[email protected]>,
>  "[email protected]" <[email protected]>,
>  "[email protected]" <[email protected]>
> Subject: RE: [PATCH v2] ring: guarantee ordering of cons/prod loading when
>  doing
> 
> 
> 
> > -----Original Message-----
> > From: Jia He [mailto:[email protected]]
> > Sent: Thursday, November 2, 2017 3:43 PM
> > To: Ananyev, Konstantin <[email protected]>; 
> > [email protected]; [email protected]; [email protected]
> > Cc: Richardson, Bruce <[email protected]>; [email protected]; 
> > [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [PATCH v2] ring: guarantee ordering of cons/prod loading when 
> > doing
> > 
> > Hi Ananyev
> > 
> > 
> > On 11/2/2017 9:26 PM, Ananyev, Konstantin Wrote:
> > > Hi Jia,
> > >
> > >> -----Original Message-----
> > >> From: Jia He [mailto:[email protected]]
> > >> Sent: Thursday, November 2, 2017 8:44 AM
> > >> To: [email protected]; [email protected]; [email protected]
> > >> Cc: Ananyev, Konstantin <[email protected]>; Richardson, 
> > >> Bruce <[email protected]>; [email protected];
> > >> [email protected]; Jia He <[email protected]>; 
> > >> [email protected]; [email protected]; jia.he@hxt-
> > >> semitech.com
> > >> Subject: [PATCH v2] ring: guarantee ordering of cons/prod loading when 
> > >> doing
> > >>
> > >> We watched a rte panic of mbuf_autotest in our qualcomm arm64 server.
> > >> As for the possible race condition, please refer to [1].
> > >>
> > >> Furthermore, there are 2 options as suggested by Jerin:
> > >> 1. use rte_smp_rmb
> > >> 2. use load_acquire/store_release(refer to [2]).
> > >> CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER is provided, and by
> > >> default it is n;
> > >>
> > >> The reason why providing 2 options is due to the performance benchmark
> > >> difference in different arm machines, please refer to [3].
> > >>
> > >> Already fuctionally tested on the machines as follows:
> > >> on X86(passed the compilation)
> > >> on arm64 with CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER=y
> > >> on arm64 with CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER=n
> > >>
> > >> [1] http://dpdk.org/ml/archives/dev/2017-October/078366.html
> > >> [2] 
> > >> https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L170
> > >> [3] http://dpdk.org/ml/archives/dev/2017-October/080861.html
> > >>
> > >> ---
> > >> Changelog:
> > >> V2: let users choose whether using load_acquire/store_release
> > >> V1: rte_smp_rmb() between 2 loads
> > >>
> > >> Signed-off-by: Jia He <[email protected]>
> > >> Signed-off-by: [email protected]
> > >> Signed-off-by: [email protected]
> > >> Signed-off-by: [email protected]
> > >> Suggested-by: [email protected]
> > >> ---
> > >>   lib/librte_ring/Makefile           |  4 +++-
> > >>   lib/librte_ring/rte_ring.h         | 38 ++++++++++++++++++++++++------
> > >>   lib/librte_ring/rte_ring_arm64.h   | 48 
> > >> ++++++++++++++++++++++++++++++++++++++
> > >>   lib/librte_ring/rte_ring_generic.h | 45 
> > >> +++++++++++++++++++++++++++++++++++
> > >>   4 files changed, 127 insertions(+), 8 deletions(-)
> > >>   create mode 100644 lib/librte_ring/rte_ring_arm64.h
> > >>   create mode 100644 lib/librte_ring/rte_ring_generic.h
> > >>
> > >> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > >> index e34d9d9..fa57a86 100644
> > >> --- a/lib/librte_ring/Makefile
> > >> +++ b/lib/librte_ring/Makefile
> > >> @@ -45,6 +45,8 @@ LIBABIVER := 1
> > >>   SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
> > >>
> > >>   # install includes
> > >> -SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h
> > >> +SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \
> > >> +                                        rte_ring_arm64.h \
> > >> +                                        rte_ring_generic.h
> > >>
> > >>   include $(RTE_SDK)/mk/rte.lib.mk
> > >> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > >> index 5e9b3b7..943b1f9 100644
> > >> --- a/lib/librte_ring/rte_ring.h
> > >> +++ b/lib/librte_ring/rte_ring.h
> > >> @@ -103,6 +103,18 @@ extern "C" {
> > >>   #include <rte_memzone.h>
> > >>   #include <rte_pause.h>
> > >>
> > >> +/* In those strong memory models (e.g. x86), there is no need to add 
> > >> rmb()
> > >> + * between load and load.
> > >> + * In those weak models(powerpc/arm), there are 2 choices for the users
> > >> + * 1.use rmb() memory barrier
> > >> + * 2.use one-direcion load_acquire/store_release barrier
> > >> + * It depends on performance test results. */
> > >> +#ifdef RTE_ARCH_ARM64
> > >> +#include "rte_ring_arm64.h"
> > >> +#else
> > >> +#include "rte_ring_generic.h"
> > >> +#endif
> > >> +
> > >>   #define RTE_TAILQ_RING_NAME "RTE_RING"
> > >>
> > >>   enum rte_ring_queue_behavior {
> > >> @@ -368,7 +380,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t 
> > >> old_val, uint32_t new_val,
> > >>                  while (unlikely(ht->tail != old_val))
> > >>                          rte_pause();
> > >>
> > >> -        ht->tail = new_val;
> > >> +        arch_rte_atomic_store(&ht->tail, new_val, __ATOMIC_RELEASE);
> > >>   }
> > >>
> > >>   /**
> > >> @@ -408,7 +420,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, int 
> > >> is_sp,
> > >>                  /* Reset n to the initial burst count */
> > >>                  n = max;
> > >>
> > >> -                *old_head = r->prod.head;
> > >> +                *old_head = arch_rte_atomic_load(&r->prod.head,
> > >> +                                                __ATOMIC_ACQUIRE);
> > >>                  const uint32_t cons_tail = r->cons.tail;
> > > The code starts to look a bit messy with all these arch specific macros...
> > > So I wonder wouldn't it be more cleaner to:
> > >
> > > 1. move existing 
> > > __rte_ring_move_prod_head/__rte_ring_move_cons_head/update_tail
> > > into rte_ring_generic.h
> > > 2. Add rte_smp_rmb into generic 
> > > __rte_ring_move_prod_head/__rte_ring_move_cons_head
> > > (as was in v1 of your patch).
> > > 3. Introduce ARM specific versions of 
> > > __rte_ring_move_prod_head/__rte_ring_move_cons_head/update_tail
> > > in the rte_ring_arm64.h
> > >
> > > That way we will keep ogneric code simple and clean, while still allowing 
> > > arch specific optimizations.
> > Thanks for your review.
> > But as per your suggestion, there will be at least 2 copies of
> > __rte_ring_move_prod_head/__rte_ring_move_cons_head/update_tail.
> > Thus, if there are any bugs in the future, both 2 copies have to be
> > changed, right?
> 
> Yes, there would be some code duplication.
> Though generic code would be much cleaner and simpler to read in that case.
> Which is worth some dups I think.


I agree with Konstantin here.

Re: [dpdk-dev] [PATCH v2] ring: guarantee ordering of cons/prod loading when doing

Reply via email to