Re: [dpdk-dev] [PATCH v8 1/3] doc: add optimizations using C11 atomic built-ins

Phil Yang Thu, 16 Jul 2020 21:45:14 -0700

Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> writes:

<snip>
> 
> > Subject: [PATCH v8 1/3] doc: add optimizations using C11 atomic built-ins
> >
> > Add information about possible optimizations using C11 atomic built-ins.
> >
> > Signed-off-by: Phil Yang <phil.y...@arm.com>
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> 
> Thanks for the changes, they look good now.
> 
> David wanted to change 'built-ins' to 'builtins', otherwise
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>


Will do in the next version.
Thanks.

> 
> > ---
> >  doc/guides/prog_guide/writing_efficient_code.rst | 59 +++++++++++++++++++++++-
> >  1 file changed, 58 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst
> > index 849f63e..53a1ca1 100644
> > --- a/doc/guides/prog_guide/writing_efficient_code.rst
> > +++ b/doc/guides/prog_guide/writing_efficient_code.rst
> > @@ -167,7 +167,13 @@ but with the added cost of lower throughput.
> >  Locks and Atomic Operations
> >  ---------------------------
> >
> > -Atomic operations imply a lock prefix before the instruction,
> > +This section describes some key considerations when using locks and
> > +atomic operations in the DPDK environment.
> > +
> > +Locks
> > +~~~~~
> > +
> > +On x86, atomic operations imply a lock prefix before the instruction,
> >  causing the processor's LOCK# signal to be asserted during execution of the following instruction.
> >  This has a big impact on performance in a multicore environment.
> >
> > @@ -176,6 +182,57 @@ It can often be replaced by other solutions like per-lcore variables.
> >  Also, some locking techniques are more efficient than others.
> >  For instance, the Read-Copy-Update (RCU) algorithm can frequently replace simple rwlocks.
> >
> > +Atomic Operations: Use C11 Atomic Built-ins
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +DPDK generic rte_atomic operations are implemented by __sync built-ins.
> > +These __sync built-ins result in full barriers on aarch64, which are
> > +unnecessary in many use cases. They can be replaced by __atomic
> > +built-ins that conform to the C11 memory model and provide finer memory order control.
> > +
> > +So replacing the rte_atomic operations with __atomic built-ins might
> > +improve performance for aarch64 machines.
> > +
> > +Some typical optimization cases are listed below:
> > +
> > +Atomicity
> > +^^^^^^^^^
> > +
> > +Some use cases require atomicity alone; the ordering of the memory
> > +operations does not matter. For example, the packet statistics counters
> > +need to be incremented atomically but do not need any particular memory ordering.
> > +So, RELAXED memory ordering is sufficient.
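
To illustrate the statistics-counter case, here is a minimal sketch using the
GCC/Clang __atomic built-ins. The struct and function names (port_stats,
stats_inc_rx, stats_read_rx) are hypothetical and are not part of the patch or
of any DPDK API; this only shows the idea under those assumptions.

```c
#include <stdint.h>

/* Hypothetical statistics block; illustrative only, not a DPDK structure. */
struct port_stats {
	uint64_t rx_packets;
};

/*
 * The increment must be atomic, but it does not need to be ordered
 * against any other memory access, so RELAXED is enough.
 */
static inline void
stats_inc_rx(struct port_stats *s, uint64_t n)
{
	__atomic_fetch_add(&s->rx_packets, n, __ATOMIC_RELAXED);
}

/* Reading the counter likewise needs atomicity only. */
static inline uint64_t
stats_read_rx(struct port_stats *s)
{
	return __atomic_load_n(&s->rx_packets, __ATOMIC_RELAXED);
}
```

With RELAXED ordering the compiler does not have to emit a barrier around the
increment on aarch64, which is the point of this sub-section.
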
> > +
> > +One-way Barrier
> > +^^^^^^^^^^^^^^^
> > +
> > +Some use cases allow for memory reordering in one way while requiring
> > +memory ordering in the other direction.
> > +
> > +For example, the memory operations before the spinlock lock are allowed
> > +to move to the critical section, but the memory operations in the
> > +critical section are not allowed to move above the lock. In this case,
> > +the full memory barrier in the compare-and-swap operation can be replaced with ACQUIRE memory order.
> > +On the other hand, the memory operations after the spinlock unlock are
> > +allowed to move to the critical section, but the memory operations in
> > +the critical section are not allowed to move below the unlock. So the
> > +full barrier in the store operation can use RELEASE memory order.
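
The lock/unlock pairing described above could look roughly like the sketch
below. This is a toy spinlock written for illustration; the names
(toy_spinlock, toy_spinlock_lock, toy_spinlock_trylock, toy_spinlock_unlock)
are assumptions and it is not the rte_spinlock implementation.

```c
#include <stdbool.h>

/* Hypothetical minimal spinlock; illustrative only. */
struct toy_spinlock {
	int locked; /* 0 = free, 1 = held */
};

/* Single attempt: ACQUIRE on success, RELAXED on failure. */
static inline bool
toy_spinlock_trylock(struct toy_spinlock *sl)
{
	int expected = 0;

	return __atomic_compare_exchange_n(&sl->locked, &expected, 1,
			false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline void
toy_spinlock_lock(struct toy_spinlock *sl)
{
	/*
	 * ACQUIRE keeps accesses in the critical section from moving
	 * above the lock; earlier accesses may still sink into it.
	 */
	while (!toy_spinlock_trylock(sl)) {
		/* Wait with relaxed loads until the lock looks free. */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED) != 0)
			;
	}
}

static inline void
toy_spinlock_unlock(struct toy_spinlock *sl)
{
	/*
	 * RELEASE keeps accesses in the critical section from moving
	 * below the unlock; later accesses may still hoist into it.
	 */
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}
```

The failure order of the compare-and-swap is RELAXED in this sketch because a
failed attempt does not enter the critical section.
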
> > +
> > +Reader-Writer Concurrency
> > +^^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +Lock-free reader-writer concurrency is one of the common use cases in DPDK.
> > +
> > +The payload or the data that the writer wants to communicate to the
> > +reader can be written with RELAXED memory order. However, the guard
> > +variable should be written with RELEASE memory order. This ensures that
> > +the store to the guard variable is observable only after the store to the payload is observable.
> > +
> > +Correspondingly, on the reader side, the guard variable should be read
> > +with ACQUIRE memory order. The payload or the data the writer
> > +communicated can be read with RELAXED memory order. This ensures that,
> > +if the store to the guard variable is observable, the store to the payload is also observable.
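
A minimal sketch of the guard-variable pattern, assuming a single writer and a
hypothetical one-slot "mailbox". The names (mailbox, mailbox_publish,
mailbox_try_read) are made up for illustration and do not come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox; illustrative only. */
struct mailbox {
	uint32_t payload; /* data the writer communicates */
	int ready;        /* guard variable: 1 once payload is valid */
};

/* Writer side: payload first (RELAXED), then the guard (RELEASE). */
static inline void
mailbox_publish(struct mailbox *m, uint32_t value)
{
	__atomic_store_n(&m->payload, value, __ATOMIC_RELAXED);
	/* RELEASE: the payload store cannot be observed after this one. */
	__atomic_store_n(&m->ready, 1, __ATOMIC_RELEASE);
}

/* Reader side: guard first (ACQUIRE), then the payload (RELAXED). */
static inline bool
mailbox_try_read(struct mailbox *m, uint32_t *out)
{
	/* ACQUIRE: the payload load cannot be hoisted above this one. */
	if (__atomic_load_n(&m->ready, __ATOMIC_ACQUIRE) == 0)
		return false;

	*out = __atomic_load_n(&m->payload, __ATOMIC_RELAXED);
	return true;
}
```

If mailbox_try_read() sees ready == 1, the RELEASE/ACQUIRE pair guarantees it
also sees the payload written before the guard was set.
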
> > +
> >  Coding Considerations
> >  ---------------------
> >
> > --
> > 2.7.4
