> -----Original Message-----
> From: Ola Liljedahl [mailto:ola.liljed...@arm.com]
> Sent: Tuesday, January 22, 2019 3:28 AM
> To: Eads, Gage <gage.e...@intel.com>; dev@dpdk.org
> Cc: olivier.m...@6wind.com; step...@networkplumber.org; Richardson, Bruce
> <bruce.richard...@intel.com>; arybche...@solarflare.com; Ananyev,
> Konstantin <konstantin.anan...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
> 
> On Fri, 2019-01-18 at 09:23 -0600, Gage Eads wrote:
> > For some users, the rte ring's "non-preemptive" constraint is not
> > acceptable; for example, if the application uses a mixture of pinned
> > high- priority threads and multiplexed low-priority threads that share
> > a mempool.
> >
> > This patchset introduces a non-blocking ring, on top of which a
> > mempool can run.
> > Crucially, the non-blocking algorithm relies on a 128-bit compare-
> > and-swap, so it is currently limited to x86_64 machines. This is also
> > an experimental API, so RING_F_NB users must build with the
> > ALLOW_EXPERIMENTAL_API flag.
> >
> > The ring uses more compare-and-swap atomic operations than the regular
> > rte ring:
> > With no contention, an enqueue of n pointers uses (1 + 2n) CAS
> > operations and a dequeue of n pointers uses 2. This algorithm has
> > worse average-case performance than the regular rte ring (particularly
> > a highly-contended ring with large bulk accesses), however:
> > - For applications with preemptible pthreads, the regular rte ring's
> > worst-case
> >   performance (i.e. one thread being preempted in the update_tail()
> > critical
> >   section) is much worse than the non-blocking ring's.
> > - Software caching can mitigate the average case performance for
> > ring-based
> >   algorithms. For example, a non-blocking ring based mempool (a likely
> > use case
> >   for this ring) with per-thread caching.
> >
> > The non-blocking ring is enabled via a new flag, RING_F_NB. For ease-
> > of-use, existing ring enqueue/dequeue functions work with both
> > "regular" and non-blocking rings.
> >
> > This patchset also adds non-blocking versions of ring_autotest and
> > ring_perf_autotest, and a non-blocking ring based mempool.
> >
> > This patchset makes one API change; a deprecation notice will be
> > posted in a separate commit.
> >
> > This patchset depends on the non-blocking stack patchset[1].
> >
> > [1] http://mails.dpdk.org/archives/dev/2019-January/123653.html
> >
> > v3:
> >  - Avoid the ABI break by putting 64-bit head and tail values in the
> > same
> >    cacheline as struct rte_ring's prod and cons members.
> >  - Don't attempt to compile rte_atomic128_cmpset without
> >    ALLOW_EXPERIMENTAL_API, as this would break a large number of
> > libraries.
> >  - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case
> > someone tries
> >    to use RING_F_NB without the ALLOW_EXPERIMENTAL_API flag.
> >  - Update the ring mempool to use experimental APIs
> >  - Clarify that RINB_F_NB is only limited to x86_64 currently;
> > ARMv8.1-A builds
> >    can eventually support it with the CASP instruction.
> ARMv8.0 should be able to implement a 128-bit atomic compare exchange
> operation using LDXP/STXP.

I see, I wasn't aware these instructions were available.

> 
> From an ARM perspective, I want all atomic operations to take memory ordering
> arguments (e.g. acquire, release). Not all usages of e.g.
> atomic compare exchange require sequential consistency (which I think what
> x86 cmpxchg instruction provides). DPDK functions should not be modelled after
> x86 behaviour.
> 
> Lock-free 128-bit atomics implementations for ARM/AArch64 and x86-64 are
> available here:
> https://github.com/ARM-software/progress64/blob/master/src/lockfree.h
> 

Sure, I'll address this in the next patchset.

Thanks,
Gage

Reply via email to