On 2020-03-17 02:17, Phil Yang wrote: > DPDK provides generic rte_atomic APIs to do several atomic operations. > These APIs are using the deprecated __sync built-ins and enforce full > memory barriers on aarch64. However, full barriers are not necessary > in many use cases. In order to address such use cases, C language offers > C11 atomic APIs. The C11 atomic APIs provide finer memory barrier control > by making use of the memory ordering parameter provided by the user. > Various patches submitted in the past [2] and the patches in this series > indicate significant performance gains on multiple aarch64 CPUs and no > performance loss on x86. > > But the existing rte_atomic API implementations cannot be changed as the > APIs do not take the memory ordering parameter. The only choice available > is replacing the usage of the rte_atomic APIs with C11 atomic APIs. In > order to make this change, the following steps are proposed:
First of all I must say I much support the effort of introducing C11 atomics or something equivalent into DPDK, across the board. What's being proposed however, is not to use C11 atomics, but rather the GCC built-ins designed to allow an efficient C11 atomics implementation. The C11 atomic API is found in <stdatomic.h>. Also, the <rte_atomic.h> API is not using __sync. It doesn't dictate any particular implementation at all. I don't think directly accessing GCC built-ins across the whole DPDK code base sounds like a good idea at all. Beyond just being plain ugly, and setting a bad precedence, using built-ins directly also effectively prevents API extensions. Although C11 is new and shiny, I'm sure there will come a day when we want to extend this API, to make it easier for consumers and avoid code duplication. Some parts of the DPDK code base already today define their own __atomic_* functions. Bad idea to use the "__*" namespace, especially in a way that has a real risk of future collisions. It's also confusing for anyone reading the code, since they are led to believe it's a GCC built-in. Direct calls to GCC built-ins also prevents the use of any other implementation than the GCC built-ins, if some ISA or ISA implementation would benefit from this. This should be avoided of course, so it's just a minor objection. I think the right way to go about this is not to deprecate <rte_atomic.h>. Rather, <rte_atomic.h> should be reshaped into something that closely maps to the GCC built-ins for C11 (which seems more convenient than real C11 atomics). The parts of <rte_atomic.h> that doesn't fit the new model, should be deprecated. To summarize, I'm not in favor of deprecating <rte_atomic.h>. If we should deprecate anything, it's directly accessing compiler built-ins. > [1] deprecate rte_atomic APIs so that future patches do not use rte_atomic > APIs (a script is added to flag the usages). > [2] refactor the code that uses rte_atomic APIs to use c11 atomic APIs. > > This patchset contains: > 1) the checkpatch script changes to flag rte_atomic API usage in patches. > 2) changes to programmer guide describing writing efficient code for aarch64. > 3) changes to various libraries to make use of c11 atomic APIs. > > We are planning to replicate this idea across all the other libraries, > drivers, examples, test applications. In the next phase, we will add > changes to the mbuf, the EAL interrupts and the event timer adapter libraries. > > v3: > add libatomic dependency for 32-bit clang > > v2: > 1. fix Clang '-Wincompatible-pointer-types' WARNING. > 2. fix typos. > > Honnappa Nagarahalli (2): > service: avoid race condition for MT unsafe service > service: identify service running on another core correctly > > Phil Yang (10): > doc: add generic atomic deprecation section > devtools: prevent use of rte atomic APIs in future patches > eal/build: add libatomic dependency for 32-bit clang > build: remove redundant code > vhost: optimize broadcast rarp sync with c11 atomic > ipsec: optimize with c11 atomic for sa outbound sqn update > service: remove rte prefix from static functions > service: remove redundant code > service: optimize with c11 one-way barrier > service: relax barriers with C11 atomic operations > > devtools/checkpatches.sh | 9 ++ > doc/guides/prog_guide/writing_efficient_code.rst | 60 +++++++- > drivers/event/octeontx/meson.build | 5 - > drivers/event/octeontx2/meson.build | 5 - > drivers/event/opdl/meson.build | 5 - > lib/librte_eal/common/rte_service.c | 175 > ++++++++++++----------- > lib/librte_eal/meson.build | 6 + > lib/librte_ipsec/ipsec_sqn.h | 3 +- > lib/librte_ipsec/sa.h | 2 +- > lib/librte_rcu/meson.build | 5 - > lib/librte_vhost/vhost.h | 2 +- > lib/librte_vhost/vhost_user.c | 7 +- > lib/librte_vhost/virtio_net.c | 16 ++- > 13 files changed, 181 insertions(+), 119 deletions(-) >