On Fri, 2019-01-25 at 17:56 +0000, Eads, Gage wrote:
> 
> > 
> > -----Original Message-----
> > From: Eads, Gage
> > Sent: Friday, January 25, 2019 11:43 AM
> > To: 'Honnappa Nagarahalli' <honnappa.nagaraha...@arm.com>; dev@dpdk.org
> > Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson, Bruce
> > <bruce.richard...@intel.com>; Ananyev, Konstantin
> > <konstantin.anan...@intel.com>; step...@networkplumber.org; nd
> > <n...@arm.com>; tho...@monjalon.net; Ola Liljedahl
> > <ola.liljed...@arm.com>; Gavin Hu (Arm Technology China)
> > <gavin...@arm.com>; Song Zhu (Arm Technology China)
> > <song....@arm.com>; nd <n...@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
> > 
> > Hi Honnappa,
> > 
> > Works for me -- I'm in favor of the best performing implementation, whoever
> > provides it.
> > 
> > To allow an apples-to-apples comparison, I suggest Ola's/ARM's
> > implementation
> > be made to fit into the rte_ring API with an associated mempool handler.
> > That'll
> > allow us to use the existing ring and mempool performance tests as well.
> > Feel
> > free to use code from this patchset for the rte_ring integration, if that
> > helps, of
> > course.
> > 
> But also, if Ola/ARM's algorithm is sufficiently similar to this one, it's
> probably better to tweak this patchset's enqueue and dequeue functions with
> any improvements you can identify rather than creating an entirely separate
> implementation.
There are strong similarities. But my implementation is separate from rte_ring
(whose code is a mess) which also freed me from any interoperatibility with the
rte_ring code and data structure (with two pairs of head+tail which is
unnecessary for the lock-free ring buffer).

My design and implementation is here:
https://github.com/ARM-software/progress64/blob/master/src/p64_lfring.c
I have a DPDK version in flight. Merging the relevant changes into your patch
makes sense. There are some differences we will have to agree on.

> 
> > 
> > I expect to have v4 available within the next week.
> > 
> > Thanks,
> > Gage
> > 
> > > 
> > > -----Original Message-----
> > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > > Sent: Thursday, January 24, 2019 11:21 PM
> > > To: Eads, Gage <gage.e...@intel.com>; dev@dpdk.org
> > > Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson,
> > > Bruce <bruce.richard...@intel.com>; Ananyev, Konstantin
> > > <konstantin.anan...@intel.com>; step...@networkplumber.org; nd
> > > <n...@arm.com>; tho...@monjalon.net; Ola Liljedahl
> > > <ola.liljed...@arm.com>; Gavin Hu (Arm Technology China)
> > > <gavin...@arm.com>; Song Zhu (Arm Technology China)
> > > <song....@arm.com>; nd <n...@arm.com>
> > > Subject: RE: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
> > > 
> > > Hi Gage,
> > >   Thank you for this patch. Arm (Ola Liljedahl) had worked on a non-
> > > blocking ring algorithm. We were planning to add it to DPDK at some
> > > point this year. I am wondering if you would be open to take a look at
> > > the algorithm and collaborate?
> > > 
> > > I am yet to fully understand both the algorithms. But, Ola has
> > > reviewed your patch and can provide a quick overview of the differences
> > > here.
> > > 
> > > If you agree, we can send a RFC patch. You can review that and do
> > > performance benchmarking on your platforms. I can also benchmark your
> > > patch (may be once you fix the issue identified in
> > > __rte_ring_do_nb_enqueue_mp  function?) on Arm platforms. May be we can
> > end up with a better combined algorithm.
> > > 
> > > 
> > > Hi Thomas/Bruce,
> > >   Please let me know if this is ok and if there is a better way to do
> > > this.
> > > 
> > > Thank you,
> > > Honnappa
> > > 
> > > > 
> > > > -----Original Message-----
> > > > From: dev <dev-boun...@dpdk.org> On Behalf Of Gage Eads
> > > > Sent: Friday, January 18, 2019 9:23 AM
> > > > To: dev@dpdk.org
> > > > Cc: olivier.m...@6wind.com; arybche...@solarflare.com;
> > > > bruce.richard...@intel.com; konstantin.anan...@intel.com;
> > > > step...@networkplumber.org
> > > > Subject: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
> > > > 
> > > > For some users, the rte ring's "non-preemptive" constraint is not
> > > > acceptable; for example, if the application uses a mixture of pinned
> > > > high-priority threads and multiplexed low-priority threads that
> > > > share a
> > > mempool.
> > > > 
> > > > 
> > > > This patchset introduces a non-blocking ring, on top of which a
> > > > mempool can run.
> > > > Crucially, the non-blocking algorithm relies on a 128-bit
> > > > compare-and-swap, so it is currently limited to x86_64 machines.
> > > > This is also an experimental API, so RING_F_NB users must build with
> > > > the
> > > ALLOW_EXPERIMENTAL_API flag.
> > > > 
> > > > 
> > > > The ring uses more compare-and-swap atomic operations than the
> > > > regular rte
> > > > ring:
> > > > With no contention, an enqueue of n pointers uses (1 + 2n) CAS
> > > > operations and a dequeue of n pointers uses 2. This algorithm has
> > > > worse average-case performance than the regular rte ring
> > > > (particularly a highly-contended ring with large bulk accesses),
> > > > however:
> > > > - For applications with preemptible pthreads, the regular rte ring's
> > > > worst-
> > case
> > > 
> > > > 
> > > >   performance (i.e. one thread being preempted in the update_tail()
> > > > critical
> > > >   section) is much worse than the non-blocking ring's.
> > > > - Software caching can mitigate the average case performance for ring-
> > based
> > > 
> > > > 
> > > >   algorithms. For example, a non-blocking ring based mempool (a
> > > > likely use case
> > > >   for this ring) with per-thread caching.
> > > > 
> > > > The non-blocking ring is enabled via a new flag, RING_F_NB. For
> > > > ease-of-use, existing ring enqueue/dequeue functions work with both
> > > > "regular" and non- blocking rings.
> > > > 
> > > > This patchset also adds non-blocking versions of ring_autotest and
> > > > ring_perf_autotest, and a non-blocking ring based mempool.
> > > > 
> > > > This patchset makes one API change; a deprecation notice will be
> > > > posted in a separate commit.
> > > > 
> > > > This patchset depends on the non-blocking stack patchset[1].
> > > > 
> > > > [1] http://mails.dpdk.org/archives/dev/2019-January/123653.html
> > > > 
> > > > v3:
> > > >  - Avoid the ABI break by putting 64-bit head and tail values in the
> > > > same
> > > >    cacheline as struct rte_ring's prod and cons members.
> > > >  - Don't attempt to compile rte_atomic128_cmpset without
> > > >    ALLOW_EXPERIMENTAL_API, as this would break a large number of
> > libraries.
> > > 
> > > > 
> > > >  - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case
> > > > someone tries
> > > >    to use RING_F_NB without the ALLOW_EXPERIMENTAL_API flag.
> > > >  - Update the ring mempool to use experimental APIs
> > > >  - Clarify that RINB_F_NB is only limited to x86_64 currently;
> > > > ARMv8.1-A builds
> > > >    can eventually support it with the CASP instruction.
> > > > 
> > > > v2:
> > > >  - Merge separate docs commit into patch #5
> > > >  - Convert uintptr_t to size_t
> > > >  - Add a compile-time check for the size of size_t
> > > >  - Fix a space-after-typecast issue
> > > >  - Fix an unnecessary-parentheses checkpatch warning
> > > >  - Bump librte_ring's library version
> > > > 
> > > > Gage Eads (5):
> > > >   ring: add 64-bit headtail structure
> > > >   ring: add a non-blocking implementation
> > > >   test_ring: add non-blocking ring autotest
> > > >   test_ring_perf: add non-blocking ring perf test
> > > >   mempool/ring: add non-blocking ring handlers
> > > > 
> > > >  doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
> > > >  drivers/mempool/ring/Makefile                   |   1 +
> > > >  drivers/mempool/ring/meson.build                |   2 +
> > > >  drivers/mempool/ring/rte_mempool_ring.c         |  58 ++-
> > > >  lib/librte_eventdev/rte_event_ring.h            |   2 +-
> > > >  lib/librte_ring/Makefile                        |   3 +-
> > > >  lib/librte_ring/rte_ring.c                      |  72 ++-
> > > >  lib/librte_ring/rte_ring.h                      | 574
> > > > ++++++++++++++++++++++--
> > > >  lib/librte_ring/rte_ring_generic_64.h           | 152 +++++++
> > > >  lib/librte_ring/rte_ring_version.map            |   7 +
> > > >  test/test/test_ring.c                           |  57 ++-
> > > >  test/test/test_ring_perf.c                      |  19 +-
> > > >  12 files changed, 874 insertions(+), 75 deletions(-)  create mode
> > > > 100644 lib/librte_ring/rte_ring_generic_64.h
> > > > 
> > > > --
> > > > 2.13.6
-- 
Ola Liljedahl, Networking System Architect, Arm
Phone +46706866373, Skype ola.liljedahl

Reply via email to