[dpdk-dev] [PATCH v7 1/3] doc: add generic atomic deprecation section

Phil Yang Sun, 12 Jul 2020 23:24:24 -0700

Add deprecating the generic rte_atomic_xx APIs to C11 atomic built-ins
guide and examples.


Signed-off-by: Phil Yang <phil.y...@arm.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
---
 doc/guides/prog_guide/writing_efficient_code.rst | 64 +++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/writing_efficient_code.rst 
b/doc/guides/prog_guide/writing_efficient_code.rst
index 849f63e..16d6188 100644
--- a/doc/guides/prog_guide/writing_efficient_code.rst
+++ b/doc/guides/prog_guide/writing_efficient_code.rst
@@ -167,7 +167,13 @@ but with the added cost of lower throughput.
 Locks and Atomic Operations
 ---------------------------
 
-Atomic operations imply a lock prefix before the instruction,
+This section describes some key considerations when using locks and atomic
+operations in the DPDK environment.
+
+Locks
+~~~~~
+
+On x86, atomic operations imply a lock prefix before the instruction,
 causing the processor's LOCK# signal to be asserted during execution of the 
following instruction.
 This has a big impact on performance in a multicore environment.
 
@@ -176,6 +182,62 @@ It can often be replaced by other solutions like per-lcore 
variables.
 Also, some locking techniques are more efficient than others.
 For instance, the Read-Copy-Update (RCU) algorithm can frequently replace 
simple rwlocks.
 
+Atomic Operations: Use C11 Atomic Built-ins
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+DPDK generic rte_atomic operations are implemented by `__sync built-ins
+<https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_.
+These __sync built-ins result in full barriers on aarch64, which are 
unnecessary
+in many use cases. They can be replaced by `__atomic built-ins
+<https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html>`_
+that conform to the C11 memory model and provide finer memory order control.
+
+So replacing the rte_atomic operations with __atomic built-ins might improve
+performance for aarch64 machines.
+
+Some typical optimization cases are listed below:
+
+Atomicity
+^^^^^^^^^
+
+Some use cases require atomicity alone, the ordering of the memory operations
+does not matter. For example the packets statistics in ``virtio_xmit()``
+function of ``vhost`` example application. It just updates the number of
+transmitted packets, no subsequent logic depends on these counters. So the
+RELAXED memory ordering is sufficient.
+
+One-way Barrier
+^^^^^^^^^^^^^^^
+
+Some use cases allow for memory reordering in one way while requiring memory
+ordering in the other direction.
+
+For example, the memory operations before the ``rte_spinlock_lock()`` can move
+to the critical section, but the memory operations in the critical section
+cannot move above the lock. In this case, the full memory barrier in the
+compare-and-swap operation can be replaced to ACQUIRE. On the other hand, the
+memory operations after the ``rte_spinlock_unlock()`` can move to the critical
+section, but the memory operations in the critical section cannot move below
+the unlock. So the full barrier in the STORE operation can be replaced with
+RELEASE.
+
+Reader-Writer Concurrency
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Lock-free reader-writer concurrency is one of the common use cases in DPDK.
+
+The payload or the data that the writer wants to communicate to the reader,
+can be written with RELAXED memory order. However, the guard variable should
+be written with RELEASE memory order. This ensures that the store to guard
+variable is observable only after the store to payload is observable.
+Refer to ``rte_hash_cuckoo_insert_mw()`` for an example.
+
+Correspondingly, on the reader side, the guard variable should be read
+with ACQUIRE memory order. The payload or the data the writer communicated,
+can be read with RELAXED memory order. This ensures that, if the store to
+guard variable is observable, the store to payload is also observable.
+Refer to rte_hash ``search_one_bucket_lf()`` for an example.
+
 Coding Considerations
 ---------------------
 
-- 
2.7.4

[dpdk-dev] [PATCH v7 1/3] doc: add generic atomic deprecation section

Reply via email to