On Fri, 2019-01-25 at 17:56 +0000, Eads, Gage wrote: > > > > > -----Original Message----- > > From: Eads, Gage > > Sent: Friday, January 25, 2019 11:43 AM > > To: 'Honnappa Nagarahalli' <honnappa.nagaraha...@arm.com>; dev@dpdk.org > > Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson, Bruce > > <bruce.richard...@intel.com>; Ananyev, Konstantin > > <konstantin.anan...@intel.com>; step...@networkplumber.org; nd > > <n...@arm.com>; tho...@monjalon.net; Ola Liljedahl > > <ola.liljed...@arm.com>; Gavin Hu (Arm Technology China) > > <gavin...@arm.com>; Song Zhu (Arm Technology China) > > <song....@arm.com>; nd <n...@arm.com> > > Subject: RE: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring > > > > Hi Honnappa, > > > > Works for me -- I'm in favor of the best performing implementation, whoever > > provides it. > > > > To allow an apples-to-apples comparison, I suggest Ola's/ARM's > > implementation > > be made to fit into the rte_ring API with an associated mempool handler. > > That'll > > allow us to use the existing ring and mempool performance tests as well. > > Feel > > free to use code from this patchset for the rte_ring integration, if that > > helps, of > > course. > > > But also, if Ola/ARM's algorithm is sufficiently similar to this one, it's > probably better to tweak this patchset's enqueue and dequeue functions with > any improvements you can identify rather than creating an entirely separate > implementation. There are strong similarities. But my implementation is separate from rte_ring (whose code is a mess) which also freed me from any interoperatibility with the rte_ring code and data structure (with two pairs of head+tail which is unnecessary for the lock-free ring buffer).
My design and implementation is here: https://github.com/ARM-software/progress64/blob/master/src/p64_lfring.c I have a DPDK version in flight. Merging the relevant changes into your patch makes sense. There are some differences we will have to agree on. > > > > > I expect to have v4 available within the next week. > > > > Thanks, > > Gage > > > > > > > > -----Original Message----- > > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com] > > > Sent: Thursday, January 24, 2019 11:21 PM > > > To: Eads, Gage <gage.e...@intel.com>; dev@dpdk.org > > > Cc: olivier.m...@6wind.com; arybche...@solarflare.com; Richardson, > > > Bruce <bruce.richard...@intel.com>; Ananyev, Konstantin > > > <konstantin.anan...@intel.com>; step...@networkplumber.org; nd > > > <n...@arm.com>; tho...@monjalon.net; Ola Liljedahl > > > <ola.liljed...@arm.com>; Gavin Hu (Arm Technology China) > > > <gavin...@arm.com>; Song Zhu (Arm Technology China) > > > <song....@arm.com>; nd <n...@arm.com> > > > Subject: RE: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring > > > > > > Hi Gage, > > > Thank you for this patch. Arm (Ola Liljedahl) had worked on a non- > > > blocking ring algorithm. We were planning to add it to DPDK at some > > > point this year. I am wondering if you would be open to take a look at > > > the algorithm and collaborate? > > > > > > I am yet to fully understand both the algorithms. But, Ola has > > > reviewed your patch and can provide a quick overview of the differences > > > here. > > > > > > If you agree, we can send a RFC patch. You can review that and do > > > performance benchmarking on your platforms. I can also benchmark your > > > patch (may be once you fix the issue identified in > > > __rte_ring_do_nb_enqueue_mp function?) on Arm platforms. May be we can > > end up with a better combined algorithm. > > > > > > > > > Hi Thomas/Bruce, > > > Please let me know if this is ok and if there is a better way to do > > > this. > > > > > > Thank you, > > > Honnappa > > > > > > > > > > > -----Original Message----- > > > > From: dev <dev-boun...@dpdk.org> On Behalf Of Gage Eads > > > > Sent: Friday, January 18, 2019 9:23 AM > > > > To: dev@dpdk.org > > > > Cc: olivier.m...@6wind.com; arybche...@solarflare.com; > > > > bruce.richard...@intel.com; konstantin.anan...@intel.com; > > > > step...@networkplumber.org > > > > Subject: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring > > > > > > > > For some users, the rte ring's "non-preemptive" constraint is not > > > > acceptable; for example, if the application uses a mixture of pinned > > > > high-priority threads and multiplexed low-priority threads that > > > > share a > > > mempool. > > > > > > > > > > > > This patchset introduces a non-blocking ring, on top of which a > > > > mempool can run. > > > > Crucially, the non-blocking algorithm relies on a 128-bit > > > > compare-and-swap, so it is currently limited to x86_64 machines. > > > > This is also an experimental API, so RING_F_NB users must build with > > > > the > > > ALLOW_EXPERIMENTAL_API flag. > > > > > > > > > > > > The ring uses more compare-and-swap atomic operations than the > > > > regular rte > > > > ring: > > > > With no contention, an enqueue of n pointers uses (1 + 2n) CAS > > > > operations and a dequeue of n pointers uses 2. This algorithm has > > > > worse average-case performance than the regular rte ring > > > > (particularly a highly-contended ring with large bulk accesses), > > > > however: > > > > - For applications with preemptible pthreads, the regular rte ring's > > > > worst- > > case > > > > > > > > > > > performance (i.e. one thread being preempted in the update_tail() > > > > critical > > > > section) is much worse than the non-blocking ring's. > > > > - Software caching can mitigate the average case performance for ring- > > based > > > > > > > > > > > algorithms. For example, a non-blocking ring based mempool (a > > > > likely use case > > > > for this ring) with per-thread caching. > > > > > > > > The non-blocking ring is enabled via a new flag, RING_F_NB. For > > > > ease-of-use, existing ring enqueue/dequeue functions work with both > > > > "regular" and non- blocking rings. > > > > > > > > This patchset also adds non-blocking versions of ring_autotest and > > > > ring_perf_autotest, and a non-blocking ring based mempool. > > > > > > > > This patchset makes one API change; a deprecation notice will be > > > > posted in a separate commit. > > > > > > > > This patchset depends on the non-blocking stack patchset[1]. > > > > > > > > [1] http://mails.dpdk.org/archives/dev/2019-January/123653.html > > > > > > > > v3: > > > > - Avoid the ABI break by putting 64-bit head and tail values in the > > > > same > > > > cacheline as struct rte_ring's prod and cons members. > > > > - Don't attempt to compile rte_atomic128_cmpset without > > > > ALLOW_EXPERIMENTAL_API, as this would break a large number of > > libraries. > > > > > > > > > > > - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case > > > > someone tries > > > > to use RING_F_NB without the ALLOW_EXPERIMENTAL_API flag. > > > > - Update the ring mempool to use experimental APIs > > > > - Clarify that RINB_F_NB is only limited to x86_64 currently; > > > > ARMv8.1-A builds > > > > can eventually support it with the CASP instruction. > > > > > > > > v2: > > > > - Merge separate docs commit into patch #5 > > > > - Convert uintptr_t to size_t > > > > - Add a compile-time check for the size of size_t > > > > - Fix a space-after-typecast issue > > > > - Fix an unnecessary-parentheses checkpatch warning > > > > - Bump librte_ring's library version > > > > > > > > Gage Eads (5): > > > > ring: add 64-bit headtail structure > > > > ring: add a non-blocking implementation > > > > test_ring: add non-blocking ring autotest > > > > test_ring_perf: add non-blocking ring perf test > > > > mempool/ring: add non-blocking ring handlers > > > > > > > > doc/guides/prog_guide/env_abstraction_layer.rst | 2 +- > > > > drivers/mempool/ring/Makefile | 1 + > > > > drivers/mempool/ring/meson.build | 2 + > > > > drivers/mempool/ring/rte_mempool_ring.c | 58 ++- > > > > lib/librte_eventdev/rte_event_ring.h | 2 +- > > > > lib/librte_ring/Makefile | 3 +- > > > > lib/librte_ring/rte_ring.c | 72 ++- > > > > lib/librte_ring/rte_ring.h | 574 > > > > ++++++++++++++++++++++-- > > > > lib/librte_ring/rte_ring_generic_64.h | 152 +++++++ > > > > lib/librte_ring/rte_ring_version.map | 7 + > > > > test/test/test_ring.c | 57 ++- > > > > test/test/test_ring_perf.c | 19 +- > > > > 12 files changed, 874 insertions(+), 75 deletions(-) create mode > > > > 100644 lib/librte_ring/rte_ring_generic_64.h > > > > > > > > -- > > > > 2.13.6 -- Ola Liljedahl, Networking System Architect, Arm Phone +46706866373, Skype ola.liljedahl