Hi Gage,

Thank you for this patch. Arm (Ola Liljedahl) has worked on a non-blocking ring algorithm, which we were planning to add to DPDK at some point this year. Would you be open to taking a look at the algorithm and collaborating?
I have yet to fully understand both algorithms, but Ola has reviewed your patch and can provide a quick overview of the differences here. If you agree, we can send an RFC patch that you can review and benchmark on your platforms. I can also benchmark your patch on Arm platforms (perhaps once you fix the issue identified in the __rte_ring_do_nb_enqueue_mp() function?). Maybe we can end up with a better combined algorithm.

Hi Thomas/Bruce,

Please let me know if this is OK and if there is a better way to do this.

Thank you,
Honnappa

> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Gage Eads
> Sent: Friday, January 18, 2019 9:23 AM
> To: dev@dpdk.org
> Cc: olivier.m...@6wind.com; arybche...@solarflare.com;
> bruce.richard...@intel.com; konstantin.anan...@intel.com;
> step...@networkplumber.org
> Subject: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
>
> For some users, the rte ring's "non-preemptive" constraint is not acceptable;
> for example, if the application uses a mixture of pinned high-priority threads
> and multiplexed low-priority threads that share a mempool.
>
> This patchset introduces a non-blocking ring, on top of which a mempool can run.
> Crucially, the non-blocking algorithm relies on a 128-bit compare-and-swap,
> so it is currently limited to x86_64 machines. This is also an experimental API,
> so RING_F_NB users must build with the ALLOW_EXPERIMENTAL_API flag.
>
> The ring uses more compare-and-swap atomic operations than the regular rte ring:
> with no contention, an enqueue of n pointers uses (1 + 2n) CAS operations
> and a dequeue of n pointers uses 2. This algorithm has worse average-case
> performance than the regular rte ring (particularly a highly-contended ring
> with large bulk accesses), however:
> - For applications with preemptible pthreads, the regular rte ring's worst-case
>   performance (i.e. one thread being preempted in the update_tail() critical
>   section) is much worse than the non-blocking ring's.
> - Software caching can mitigate the average-case performance cost for
>   ring-based algorithms, for example a non-blocking ring based mempool
>   (a likely use case for this ring) with per-thread caching.
>
> The non-blocking ring is enabled via a new flag, RING_F_NB. For ease of use,
> existing ring enqueue/dequeue functions work with both "regular" and
> non-blocking rings.
>
> This patchset also adds non-blocking versions of ring_autotest and
> ring_perf_autotest, and a non-blocking ring based mempool.
>
> This patchset makes one API change; a deprecation notice will be posted in a
> separate commit.
>
> This patchset depends on the non-blocking stack patchset[1].
>
> [1] http://mails.dpdk.org/archives/dev/2019-January/123653.html
>
> v3:
> - Avoid the ABI break by putting 64-bit head and tail values in the same
>   cacheline as struct rte_ring's prod and cons members.
> - Don't attempt to compile rte_atomic128_cmpset without
>   ALLOW_EXPERIMENTAL_API, as this would break a large number of libraries.
> - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case someone
>   tries to use RING_F_NB without the ALLOW_EXPERIMENTAL_API flag.
> - Update the ring mempool to use experimental APIs.
> - Clarify that RING_F_NB is currently limited to x86_64; ARMv8.1-A builds
>   can eventually support it with the CASP instruction.
>
> v2:
> - Merge separate docs commit into patch #5
> - Convert uintptr_t to size_t
> - Add a compile-time check for the size of size_t
> - Fix a space-after-typecast issue
> - Fix an unnecessary-parentheses checkpatch warning
> - Bump librte_ring's library version
>
> Gage Eads (5):
>   ring: add 64-bit headtail structure
>   ring: add a non-blocking implementation
>   test_ring: add non-blocking ring autotest
>   test_ring_perf: add non-blocking ring perf test
>   mempool/ring: add non-blocking ring handlers
>
>  doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
>  drivers/mempool/ring/Makefile                   |   1 +
>  drivers/mempool/ring/meson.build                |   2 +
>  drivers/mempool/ring/rte_mempool_ring.c         |  58 ++-
>  lib/librte_eventdev/rte_event_ring.h            |   2 +-
>  lib/librte_ring/Makefile                        |   3 +-
>  lib/librte_ring/rte_ring.c                      |  72 ++-
>  lib/librte_ring/rte_ring.h                      | 574 ++++++++++++++++++++++--
>  lib/librte_ring/rte_ring_generic_64.h           | 152 +++++++
>  lib/librte_ring/rte_ring_version.map            |   7 +
>  test/test/test_ring.c                           |  57 ++-
>  test/test/test_ring_perf.c                      |  19 +-
>  12 files changed, 874 insertions(+), 75 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_generic_64.h
>
> --
> 2.13.6
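For context on the 128-bit compare-and-swap requirement, here is a minimal, hypothetical C11 sketch; it is not code from the patch or from Arm's algorithm. It assumes (the cover letter does not spell this out) that each ring entry pairs an object pointer with a modification counter, so that a stale compare-and-swap fails even if the pointer was changed and then restored, i.e. the ABA problem. To build with plain <stdatomic.h>, it models that pair as a 32-bit value plus a 32-bit counter in one 64-bit word. The names tagged_slot, slot_cas, aba_fools_plain_cas and aba_fools_tagged_cas are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (not code from the patch): a 32-bit value standing in
 * for a ring entry's pointer, plus a 32-bit modification counter, packed
 * into one 64-bit word. With real 64-bit pointers the same pair would need
 * 128 bits, hence a 128-bit CAS. */
typedef _Atomic uint64_t tagged_slot;

static inline uint64_t slot_pack(uint32_t val, uint32_t cnt)
{
    return ((uint64_t)cnt << 32) | val;
}

static inline uint32_t slot_val(uint64_t s) { return (uint32_t)s; }
static inline uint32_t slot_cnt(uint64_t s) { return (uint32_t)(s >> 32); }

/* Write new_val only if the slot still holds exactly 'seen' (value AND
 * counter). Each successful write bumps the counter, so a snapshot taken
 * before an A -> B -> A sequence can no longer match. */
static bool slot_cas(tagged_slot *slot, uint64_t seen, uint32_t new_val)
{
    uint64_t desired = slot_pack(new_val, slot_cnt(seen) + 1);
    return atomic_compare_exchange_strong(slot, &seen, desired);
}

/* Replays an ABA interleaving on one thread using a bare value: a slow
 * thread snapshots 5, a fast thread writes 7 and then 5 again, and the
 * slow thread's CAS still succeeds because it compares the value only. */
static bool aba_fools_plain_cas(void)
{
    _Atomic uint32_t plain = 5;
    uint32_t snapshot = atomic_load(&plain);  /* slow thread reads 5 */

    atomic_store(&plain, 7);                  /* fast thread: 5 -> 7 */
    atomic_store(&plain, 5);                  /* fast thread: 7 -> 5 */

    return atomic_compare_exchange_strong(&plain, &snapshot, 9);
}

/* Same interleaving with the counter-tagged slot: the stale CAS fails. */
static bool aba_fools_tagged_cas(void)
{
    tagged_slot slot = slot_pack(5, 0);
    uint64_t snapshot = atomic_load(&slot);   /* slow thread reads (5, 0) */

    bool ok1 = slot_cas(&slot, atomic_load(&slot), 7);  /* now (7, 1) */
    bool ok2 = slot_cas(&slot, atomic_load(&slot), 5);  /* now (5, 2) */
    assert(ok1 && ok2);  /* single-threaded, so neither update can fail */
    (void)ok1;
    (void)ok2;

    return slot_cas(&slot, snapshot, 9);      /* stale (5, 0) != (5, 2) */
}
```

Both demo helpers replay the interleaving on a single thread: aba_fools_plain_cas() returns true because a value-only CAS cannot see that the slot went 5 -> 7 -> 5, while aba_fools_tagged_cas() returns false because the counter has moved from 0 to 2. With 64-bit pointers, the same pointer/counter pair occupies 128 bits, which would be consistent with the x86_64 (cmpxchg16b) restriction and with the note that ARMv8.1-A builds could use CASP.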