-----Original Message----- > Date: Thu, 2 Nov 2017 16:16:33 +0000 > From: "Ananyev, Konstantin" <konstantin.anan...@intel.com> > To: Jia He <hejia...@gmail.com>, "jerin.ja...@caviumnetworks.com" > <jerin.ja...@caviumnetworks.com>, "dev@dpdk.org" <dev@dpdk.org>, > "olivier.m...@6wind.com" <olivier.m...@6wind.com> > CC: "Richardson, Bruce" <bruce.richard...@intel.com>, "jianbo....@arm.com" > <jianbo....@arm.com>, "hemant.agra...@nxp.com" <hemant.agra...@nxp.com>, > "jie2....@hxt-semitech.com" <jie2....@hxt-semitech.com>, > "bing.z...@hxt-semitech.com" <bing.z...@hxt-semitech.com>, > "jia...@hxt-semitech.com" <jia...@hxt-semitech.com> > Subject: RE: [PATCH v2] ring: guarantee ordering of cons/prod loading when > doing > > > > > -----Original Message----- > > From: Jia He [mailto:hejia...@gmail.com] > > Sent: Thursday, November 2, 2017 3:43 PM > > To: Ananyev, Konstantin <konstantin.anan...@intel.com>; > > jerin.ja...@caviumnetworks.com; dev@dpdk.org; olivier.m...@6wind.com > > Cc: Richardson, Bruce <bruce.richard...@intel.com>; jianbo....@arm.com; > > hemant.agra...@nxp.com; jie2....@hxt-semitech.com; > > bing.z...@hxt-semitech.com; jia...@hxt-semitech.com > > Subject: Re: [PATCH v2] ring: guarantee ordering of cons/prod loading when > > doing > > > > Hi Ananyev > > > > > > On 11/2/2017 9:26 PM, Ananyev, Konstantin Wrote: > > > Hi Jia, > > > > > >> -----Original Message----- > > >> From: Jia He [mailto:hejia...@gmail.com] > > >> Sent: Thursday, November 2, 2017 8:44 AM > > >> To: jerin.ja...@caviumnetworks.com; dev@dpdk.org; olivier.m...@6wind.com > > >> Cc: Ananyev, Konstantin <konstantin.anan...@intel.com>; Richardson, > > >> Bruce <bruce.richard...@intel.com>; jianbo....@arm.com; > > >> hemant.agra...@nxp.com; Jia He <hejia...@gmail.com>; > > >> jie2....@hxt-semitech.com; bing.z...@hxt-semitech.com; jia.he@hxt- > > >> semitech.com > > >> Subject: [PATCH v2] ring: guarantee ordering of cons/prod loading when > > >> doing > > >> > > >> We watched a rte panic of mbuf_autotest in our qualcomm arm64 server. > > >> As for the possible race condition, please refer to [1]. > > >> > > >> Furthermore, there are 2 options as suggested by Jerin: > > >> 1. use rte_smp_rmb > > >> 2. use load_acquire/store_release(refer to [2]). > > >> CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER is provided, and by > > >> default it is n; > > >> > > >> The reason why providing 2 options is due to the performance benchmark > > >> difference in different arm machines, please refer to [3]. > > >> > > >> Already fuctionally tested on the machines as follows: > > >> on X86(passed the compilation) > > >> on arm64 with CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER=y > > >> on arm64 with CONFIG_RTE_ATOMIC_ACQUIRE_RELEASE_BARRIER_PREFER=n > > >> > > >> [1] http://dpdk.org/ml/archives/dev/2017-October/078366.html > > >> [2] > > >> https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L170 > > >> [3] http://dpdk.org/ml/archives/dev/2017-October/080861.html > > >> > > >> --- > > >> Changelog: > > >> V2: let users choose whether using load_acquire/store_release > > >> V1: rte_smp_rmb() between 2 loads > > >> > > >> Signed-off-by: Jia He <hejia...@gmail.com> > > >> Signed-off-by: jie2....@hxt-semitech.com > > >> Signed-off-by: bing.z...@hxt-semitech.com > > >> Signed-off-by: jia...@hxt-semitech.com > > >> Suggested-by: jerin.ja...@caviumnetworks.com > > >> --- > > >> lib/librte_ring/Makefile | 4 +++- > > >> lib/librte_ring/rte_ring.h | 38 ++++++++++++++++++++++++------ > > >> lib/librte_ring/rte_ring_arm64.h | 48 > > >> ++++++++++++++++++++++++++++++++++++++ > > >> lib/librte_ring/rte_ring_generic.h | 45 > > >> +++++++++++++++++++++++++++++++++++ > > >> 4 files changed, 127 insertions(+), 8 deletions(-) > > >> create mode 100644 lib/librte_ring/rte_ring_arm64.h > > >> create mode 100644 lib/librte_ring/rte_ring_generic.h > > >> > > >> diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile > > >> index e34d9d9..fa57a86 100644 > > >> --- a/lib/librte_ring/Makefile > > >> +++ b/lib/librte_ring/Makefile > > >> @@ -45,6 +45,8 @@ LIBABIVER := 1 > > >> SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c > > >> > > >> # install includes > > >> -SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h > > >> +SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h \ > > >> + rte_ring_arm64.h \ > > >> + rte_ring_generic.h > > >> > > >> include $(RTE_SDK)/mk/rte.lib.mk > > >> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > > >> index 5e9b3b7..943b1f9 100644 > > >> --- a/lib/librte_ring/rte_ring.h > > >> +++ b/lib/librte_ring/rte_ring.h > > >> @@ -103,6 +103,18 @@ extern "C" { > > >> #include <rte_memzone.h> > > >> #include <rte_pause.h> > > >> > > >> +/* In those strong memory models (e.g. x86), there is no need to add > > >> rmb() > > >> + * between load and load. > > >> + * In those weak models(powerpc/arm), there are 2 choices for the users > > >> + * 1.use rmb() memory barrier > > >> + * 2.use one-direcion load_acquire/store_release barrier > > >> + * It depends on performance test results. */ > > >> +#ifdef RTE_ARCH_ARM64 > > >> +#include "rte_ring_arm64.h" > > >> +#else > > >> +#include "rte_ring_generic.h" > > >> +#endif > > >> + > > >> #define RTE_TAILQ_RING_NAME "RTE_RING" > > >> > > >> enum rte_ring_queue_behavior { > > >> @@ -368,7 +380,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t > > >> old_val, uint32_t new_val, > > >> while (unlikely(ht->tail != old_val)) > > >> rte_pause(); > > >> > > >> - ht->tail = new_val; > > >> + arch_rte_atomic_store(&ht->tail, new_val, __ATOMIC_RELEASE); > > >> } > > >> > > >> /** > > >> @@ -408,7 +420,8 @@ __rte_ring_move_prod_head(struct rte_ring *r, int > > >> is_sp, > > >> /* Reset n to the initial burst count */ > > >> n = max; > > >> > > >> - *old_head = r->prod.head; > > >> + *old_head = arch_rte_atomic_load(&r->prod.head, > > >> + __ATOMIC_ACQUIRE); > > >> const uint32_t cons_tail = r->cons.tail; > > > The code starts to look a bit messy with all these arch specific macros... > > > So I wonder wouldn't it be more cleaner to: > > > > > > 1. move existing > > > __rte_ring_move_prod_head/__rte_ring_move_cons_head/update_tail > > > into rte_ring_generic.h > > > 2. Add rte_smp_rmb into generic > > > __rte_ring_move_prod_head/__rte_ring_move_cons_head > > > (as was in v1 of your patch). > > > 3. Introduce ARM specific versions of > > > __rte_ring_move_prod_head/__rte_ring_move_cons_head/update_tail > > > in the rte_ring_arm64.h > > > > > > That way we will keep ogneric code simple and clean, while still allowing > > > arch specific optimizations. > > Thanks for your review. > > But as per your suggestion, there will be at least 2 copies of > > __rte_ring_move_prod_head/__rte_ring_move_cons_head/update_tail. > > Thus, if there are any bugs in the future, both 2 copies have to be > > changed, right? > > Yes, there would be some code duplication. > Though generic code would be much cleaner and simpler to read in that case. > Which is worth some dups I think.
I agree with Konstantin here.