<snip>
> Subject: RE: [dpdk-dev] Arm roadmap for 20.05
>
> <snip>
> (apologies Morten - I missed your response, consolidating the discussion in
> this thread)
>
> + Intel x86 and IBM POWER maintainers
>
> > > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05
> > > >>>>>
> > > >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote:
> > > >>>>>> Hello,
> > > >>>>>> Following are the work items planned for 20.05:
> > > >>>>>>
> > > >>>>>> 1) Use C11 atomic APIs in timer library
> > > >>>>>> 2) Use C11 atomic APIs in service cores
> > > >>>>>> 3) Use C11 atomics in VirtIO split ring
> > > >>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm
> > > >>>>>> platforms
> > > >>>>>> 5) RCU defer API
> > > >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases
> > > >>>>>>
> > > >>>>>> Thank you,
> > > >>>>>> Honnappa
> > > >>>>> Maybe you should have a look at legacy DPDK atomics as well?
> > > >>>>> Avoiding a full barrier for the add operation, for example.
> > > >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do
> > > >>>> not take memory order as a parameter. So, it is difficult to
> > > >>>> change the implementation for those APIs. For ex: the add
> > > >>>> operation could take a RELEASE or RELAXED order depending on
> > > >>>> the use case.
> > > >>>> So, the proposal is to deprecate the rte_atomic APIs and use
> > > >>>> C11 APIs directly. The proposal is here:
> > > >>>> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185-865b3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2-aa5e88331555&u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F
> > > >>> Even though rte_atomic lacks the flexibility of C11 atomics,
> > > >>> there might still be areas of improvement. Such improvements
> > > >>> will have an instant effect, as opposed to waiting for all the
> > > >>> rte_atomic users to change.
> > > >>>
> > > >>>
> > > >>> The rte_atomic API leaves ordering unspecified, unfortunately.
> > > >>> In the Linux kernel, from which DPDK seems to borrow much of the
> > > >>> atomics and memory order related semantics, an atomic add
> > > >>> doesn't imply any memory barriers. The current
> > > >>> __sync_fetch_and_add()-based implementation implies a full
> > > >>> barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). If you
> > > >>> would use C11 atomics to implement rte_atomic in ARM, you could
> > > >>> use a relaxed memory order on rte_atomic*_add() (assuming you
> > > >>> agree those are the implicit semantics of the legacy API) and
> > > >>> just get an ldadd instruction. An alternative would be to
> > > >>> implement the same thing in assembler, of course.
> > > >>>
> > > >>>
> > > >> Another approach might be to just scrap all of the intrinsics and
> > > >> inline assembler used for all the functions in rte_atomic, on all
> > > >> architectures, and use C11 atomics instead.
> > > > Yes, this is the approach we are taking. But, it does not solve
> > > > the use of rte_atomic APIs in the applications.
> > >
> > > Agreed.
> > >
> > >
> > > Another question. "C11 atomics" here seems to mean using GCC
> > > intrinsics, normally used to implement C11 atomics, not C11 atomics
> > > (i.e. <stdatomic.h>).
> > > What is the reason for directly calling the intrinsics, rather than
> > > using the standard API?
> > I did not know they existed for C. Looking at them, they look like
> > just wrappers around the intrinsics. The advantage seems to be the
> > type check enforced by the compiler, i.e. if a variable is defined
> > with type '_Atomic', the compiler should not allow any non-atomic
> > operations on it. Anything else?
> > I will explore this further.
> I see some issues expressed for Intel ICC compiler [1], but they seem to have
> been fixed in the latest versions [2]. Please check.
>
> [1] https://software.intel.com/en-us/forums/intel-c-compiler/topic/681815
> [2] https://software.intel.com/en-us/articles/c11-support-in-intel-c-compiler
>
I looked into this some more. The built-ins are supported in GCC from 4.7 and in clang from 3.1. The stdatomic.h is supported in GCC from 4.9 and in clang from 3.6.
I see that Intel Compilation CI has 3 configurations that use GCC 4.8.5 and Clang 3.4.2. Any reasoning for using these? Can these be upgraded?
> > >
> > >
> > > With this in mind, wouldn't it be better to extend <rte_atomic.h> with
> > > functions that take a memory ordering parameter? And properly
> > > document the memory ordering for the functions already in this API,
> > > and maybe deprecate some functions in favor of other, more C11-like,
> > > functions?
> > I would prefer to use what the language provides rather than creating
> > DPDK's own, which will be just wrappers on top of what C provides. If
> > we follow the existing model of rte_atomic APIs, we will be creating
> > these for every size of the parameter (rte_atomic8/16/32/64_xxx). This
> > results in more code to maintain.
> >
> > > If not, assuming <stdatomic.h> can't be used, wouldn't it be better
> > > if we added a <rte_stdatomic.h>, which mimics the standard API,
> > > maybe with some DPDK tweaks, plus potentially with DPDK-specific
> > > extensions as well?
> > What kind of extensions are you thinking about?
> >
> > >
> > >
> > > Directly accessing intrinsics will lead to things like
> > > __atomic_add_ifless() (already in DPDK code base), when people need
> > > to extend the API. This very much looks like a GCC built-in function,
> > > but is not.
> > I think the DPDK code should not be using symbols that will
> > potentially collide with language/library symbols.
> > Luckily, in this case, it is internal to a PMD which can be changed.
> > It also contains more symbols which come close to colliding with
> > 'stdatomic.h'.
> >
> > >
> > >
> > > Sorry for hijacking the ARM roadmap thread.
> > No problem. I am glad we are having these important discussions.
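
To make the three options discussed in this thread concrete, here is a minimal sketch of the same fetch-and-add written with the legacy __sync built-in, with the __atomic built-in, and with <stdatomic.h>. The function names (add_sync, add_builtin_relaxed, add_c11_relaxed, demo_counters) are made up for illustration and are not DPDK APIs. It assumes a compiler that ships <stdatomic.h>. Which instructions get emitted depends on the compiler and target; the expectation stated earlier in the thread is that the relaxed variants can become a plain ldadd on Arm v8.1-a.

```c
#include <assert.h>
#include <stdint.h>
#include <stdatomic.h>

/* Legacy style: the __sync built-in takes no memory order argument and
 * is documented as a full barrier. */
static inline int32_t add_sync(volatile int32_t *v, int32_t inc)
{
	return __sync_fetch_and_add(v, inc);
}

/* GCC/clang __atomic built-in: works on a plain integer object and takes
 * the memory order explicitly. */
static inline int32_t add_builtin_relaxed(int32_t *v, int32_t inc)
{
	return __atomic_fetch_add(v, inc, __ATOMIC_RELAXED);
}

/* Standard C11 API: the object must be declared _Atomic, which is where
 * the extra type checking comes from. */
static inline int32_t add_c11_relaxed(_Atomic int32_t *v, int32_t inc)
{
	return atomic_fetch_add_explicit(v, inc, memory_order_relaxed);
}

/* Hypothetical single-threaded check: each variant returns the old value
 * and leaves the incremented value behind. Returns the sum of the three
 * final counter values. */
static int32_t demo_counters(void)
{
	volatile int32_t a = 0;
	int32_t b = 0;
	_Atomic int32_t c = 0;

	int32_t old_a = add_sync(&a, 1);
	int32_t old_b = add_builtin_relaxed(&b, 2);
	int32_t old_c = add_c11_relaxed(&c, 3);

	assert(old_a == 0 && old_b == 0 && old_c == 0);
	return a + b + atomic_load_explicit(&c, memory_order_relaxed);
}
```

All three return the previous value, so functionally they are interchangeable in single-threaded code; the difference is only in the ordering guarantees requested and in whether the variable has to carry the _Atomic qualifier.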