Hi Gage,
        Thank you for this patch. Arm (Ola Liljedahl) had worked on a 
non-blocking ring algorithm. We were planning to add it to DPDK at some point 
this year. I am wondering if you would be open to take a look at the algorithm 
and collaborate?

I am yet to fully understand both the algorithms. But, Ola has reviewed your 
patch and can provide a quick overview of the differences here.

If you agree, we can send a RFC patch. You can review that and do performance 
benchmarking on your platforms. I can also benchmark your patch (may be once 
you fix the issue identified in __rte_ring_do_nb_enqueue_mp  function?) on Arm 
platforms. May be we can end up with a better combined algorithm.

Hi Thomas/Bruce,
        Please let me know if this is ok and if there is a better way to do 
this.

Thank you,
Honnappa

> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Gage Eads
> Sent: Friday, January 18, 2019 9:23 AM
> To: dev@dpdk.org
> Cc: olivier.m...@6wind.com; arybche...@solarflare.com;
> bruce.richard...@intel.com; konstantin.anan...@intel.com;
> step...@networkplumber.org
> Subject: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
> 
> For some users, the rte ring's "non-preemptive" constraint is not acceptable;
> for example, if the application uses a mixture of pinned high-priority threads
> and multiplexed low-priority threads that share a mempool.
> 
> This patchset introduces a non-blocking ring, on top of which a mempool can
> run.
> Crucially, the non-blocking algorithm relies on a 128-bit compare-and-swap,
> so it is currently limited to x86_64 machines. This is also an experimental 
> API,
> so RING_F_NB users must build with the ALLOW_EXPERIMENTAL_API flag.
> 
> The ring uses more compare-and-swap atomic operations than the regular rte
> ring:
> With no contention, an enqueue of n pointers uses (1 + 2n) CAS operations
> and a dequeue of n pointers uses 2. This algorithm has worse average-case
> performance than the regular rte ring (particularly a highly-contended ring
> with large bulk accesses), however:
> - For applications with preemptible pthreads, the regular rte ring's 
> worst-case
>   performance (i.e. one thread being preempted in the update_tail() critical
>   section) is much worse than the non-blocking ring's.
> - Software caching can mitigate the average case performance for ring-based
>   algorithms. For example, a non-blocking ring based mempool (a likely use
> case
>   for this ring) with per-thread caching.
> 
> The non-blocking ring is enabled via a new flag, RING_F_NB. For ease-of-use,
> existing ring enqueue/dequeue functions work with both "regular" and non-
> blocking rings.
> 
> This patchset also adds non-blocking versions of ring_autotest and
> ring_perf_autotest, and a non-blocking ring based mempool.
> 
> This patchset makes one API change; a deprecation notice will be posted in a
> separate commit.
> 
> This patchset depends on the non-blocking stack patchset[1].
> 
> [1] http://mails.dpdk.org/archives/dev/2019-January/123653.html
> 
> v3:
>  - Avoid the ABI break by putting 64-bit head and tail values in the same
>    cacheline as struct rte_ring's prod and cons members.
>  - Don't attempt to compile rte_atomic128_cmpset without
>    ALLOW_EXPERIMENTAL_API, as this would break a large number of libraries.
>  - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case someone
> tries
>    to use RING_F_NB without the ALLOW_EXPERIMENTAL_API flag.
>  - Update the ring mempool to use experimental APIs
>  - Clarify that RINB_F_NB is only limited to x86_64 currently; ARMv8.1-A
> builds
>    can eventually support it with the CASP instruction.
> 
> v2:
>  - Merge separate docs commit into patch #5
>  - Convert uintptr_t to size_t
>  - Add a compile-time check for the size of size_t
>  - Fix a space-after-typecast issue
>  - Fix an unnecessary-parentheses checkpatch warning
>  - Bump librte_ring's library version
> 
> Gage Eads (5):
>   ring: add 64-bit headtail structure
>   ring: add a non-blocking implementation
>   test_ring: add non-blocking ring autotest
>   test_ring_perf: add non-blocking ring perf test
>   mempool/ring: add non-blocking ring handlers
> 
>  doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
>  drivers/mempool/ring/Makefile                   |   1 +
>  drivers/mempool/ring/meson.build                |   2 +
>  drivers/mempool/ring/rte_mempool_ring.c         |  58 ++-
>  lib/librte_eventdev/rte_event_ring.h            |   2 +-
>  lib/librte_ring/Makefile                        |   3 +-
>  lib/librte_ring/rte_ring.c                      |  72 ++-
>  lib/librte_ring/rte_ring.h                      | 574 
> ++++++++++++++++++++++--
>  lib/librte_ring/rte_ring_generic_64.h           | 152 +++++++
>  lib/librte_ring/rte_ring_version.map            |   7 +
>  test/test/test_ring.c                           |  57 ++-
>  test/test/test_ring_perf.c                      |  19 +-
>  12 files changed, 874 insertions(+), 75 deletions(-)  create mode 100644
> lib/librte_ring/rte_ring_generic_64.h
> 
> --
> 2.13.6

Reply via email to