> -----Original Message-----
> From: Ola Liljedahl [mailto:ola.liljed...@arm.com]
> Sent: Tuesday, January 22, 2019 3:28 AM
> To: Eads, Gage <gage.e...@intel.com>; dev@dpdk.org
> Cc: olivier.m...@6wind.com; step...@networkplumber.org; Richardson, Bruce
> <bruce.richard...@intel.com>; arybche...@solarflare.com; Ananyev,
> Konstantin <konstantin.anan...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
>
> On Fri, 2019-01-18 at 09:23 -0600, Gage Eads wrote:
> > For some users, the rte ring's "non-preemptive" constraint is not
> > acceptable; for example, if the application uses a mixture of pinned
> > high-priority threads and multiplexed low-priority threads that share
> > a mempool.
> >
> > This patchset introduces a non-blocking ring, on top of which a
> > mempool can run.
> > Crucially, the non-blocking algorithm relies on a 128-bit
> > compare-and-swap, so it is currently limited to x86_64 machines. This
> > is also an experimental API, so RING_F_NB users must build with the
> > ALLOW_EXPERIMENTAL_API flag.
> >
> > The ring uses more compare-and-swap atomic operations than the regular
> > rte ring:
> > With no contention, an enqueue of n pointers uses (1 + 2n) CAS
> > operations and a dequeue of n pointers uses 2. This algorithm has
> > worse average-case performance than the regular rte ring (particularly
> > a highly-contended ring with large bulk accesses), however:
> > - For applications with preemptible pthreads, the regular rte ring's
> >   worst-case performance (i.e. one thread being preempted in the
> >   update_tail() critical section) is much worse than the non-blocking
> >   ring's.
> > - Software caching can mitigate the average case performance for
> >   ring-based algorithms. For example, a non-blocking ring based mempool
> >   (a likely use case for this ring) with per-thread caching.
> >
> > The non-blocking ring is enabled via a new flag, RING_F_NB.
> > For ease-of-use, existing ring enqueue/dequeue functions work with both
> > "regular" and non-blocking rings.
> >
> > This patchset also adds non-blocking versions of ring_autotest and
> > ring_perf_autotest, and a non-blocking ring based mempool.
> >
> > This patchset makes one API change; a deprecation notice will be
> > posted in a separate commit.
> >
> > This patchset depends on the non-blocking stack patchset[1].
> >
> > [1] http://mails.dpdk.org/archives/dev/2019-January/123653.html
> >
> > v3:
> > - Avoid the ABI break by putting 64-bit head and tail values in the
> >   same cacheline as struct rte_ring's prod and cons members.
> > - Don't attempt to compile rte_atomic128_cmpset without
> >   ALLOW_EXPERIMENTAL_API, as this would break a large number of
> >   libraries.
> > - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case
> >   someone tries to use RING_F_NB without the ALLOW_EXPERIMENTAL_API
> >   flag.
> > - Update the ring mempool to use experimental APIs
> > - Clarify that RING_F_NB is only limited to x86_64 currently;
> >   ARMv8.1-A builds can eventually support it with the CASP instruction.
> ARMv8.0 should be able to implement a 128-bit atomic compare exchange
> operation using LDXP/STXP.

I see, I wasn't aware these instructions were available.

>
> From an ARM perspective, I want all atomic operations to take memory ordering
> arguments (e.g. acquire, release). Not all usages of e.g.
> atomic compare exchange require sequential consistency (which I think is what
> the x86 cmpxchg instruction provides). DPDK functions should not be modelled
> after x86 behaviour.
>
> Lock-free 128-bit atomics implementations for ARM/AArch64 and x86-64 are
> available here:
> https://github.com/ARM-software/progress64/blob/master/src/lockfree.h
>

Sure, I'll address this in the next patchset.

Thanks,
Gage