On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> > -----Original Message-----
> > From: Ola Liljedahl [mailto:ola.liljed...@arm.com]
> > Sent: Wednesday, January 23, 2019 4:16 AM
> > To: Eads, Gage <gage.e...@intel.com>; dev@dpdk.org
> > Cc: olivier.m...@6wind.com; step...@networkplumber.org; nd
> > <n...@arm.com>; Richardson, Bruce <bruce.richard...@intel.com>;
> > arybche...@solarflare.com; Ananyev, Konstantin
> > <konstantin.anan...@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > implementation
> > 
> > s.
> > > > 
> > > You can tell this code was written when I thought x86-64 was the
> > > only
> > > viable target :). Yes, you are correct.
> > > 
> > > With regards to using __atomic intrinsics, I'm planning on taking
> > > a
> > > similar approach to the functions duplicated in
> > > rte_ring_generic.h and
> > > rte_ring_c11_mem.h: one version that uses rte_atomic functions
> > > (and
> > > thus stricter memory ordering) and one that uses __atomic
> > > intrinsics
> > > (and thus can benefit from more relaxed memory ordering).
> > What's the advantage of having two different implementations? What
> > is the
> > disadvantage?
> > 
> > The existing ring buffer code originally had only the "legacy"
> > implementation
> > which was kept when the __atomic implementation was added. The
> > reason
> > claimed was that some older compilers for x86 do not support GCC
> > __atomic
> > builtins. But I thought there was consensus that new functionality
> > could have
> > only __atomic implementations.
> > 
> 
> When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left
> disabled for thunderx[1] for performance reasons. Assuming that
> hasn't changed, the advantage to having two versions is to best
> support all of DPDK's platforms. The disadvantage is of course
> duplicated code and the additional maintenance burden.
> 
> That said, if the thunderx maintainers are ok with it, I'm certainly 

The ring code was so fundamental building block for DPDK, there was 
difference in performance and there was already legacy code so
introducing C11_MEM_MODEL was justified IMO. 

For the nonblocking implementation, I am happy to test with
three ARM64 microarchitectures and share the result with C11_MEM_MODEL
vs non C11_MEM_MODLE performance. We may need to consider PPC also
here. So IMO, based on the overall performance result may be can decide
the new code direction.

Reply via email to