> > > > > For SA outbound packets, rte_atomic64_add_return is used to
> > > > > generate SQN atomically. This introduced an unnecessary full
> > > > > barrier by calling the '__sync' builtin implemented rte_atomic_XX
> > > > > API on aarch64. This patch optimized it with c11 atomic and
> > > > > eliminated the expensive barrier for aarch64.
> > > > >
> > > > > Signed-off-by: Phil Yang <phil.y...@arm.com>
> > > > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > > > > Reviewed-by: Gavin Hu <gavin...@arm.com>
> > > > > ---
> > > > >  lib/librte_ipsec/ipsec_sqn.h | 3 ++-
> > > > >  lib/librte_ipsec/sa.h        | 2 +-
> > > > >  2 files changed, 3 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_ipsec/ipsec_sqn.h b/lib/librte_ipsec/ipsec_sqn.h
> > > > > index 0c2f76a..e884af7 100644
> > > > > --- a/lib/librte_ipsec/ipsec_sqn.h
> > > > > +++ b/lib/librte_ipsec/ipsec_sqn.h
> > > > > @@ -128,7 +128,8 @@ esn_outb_update_sqn(struct rte_ipsec_sa *sa, uint32_t *num)
> > > > >
> > > > >          n = *num;
> > > > >          if (SQN_ATOMIC(sa))
> > > > > -                sqn = (uint64_t)rte_atomic64_add_return(&sa->sqn.outb.atom, n);
> > > > > +                sqn = __atomic_add_fetch(&sa->sqn.outb.atom, n,
> > > > > +                        __ATOMIC_RELAXED);
> > > >
> > > > One generic thing to note:
> > > > clang for i686 in some cases will generate a proper function call
> > > > for 64-bit __atomic builtins (gcc seems to always generate cmpxchg8b
> > > > for such cases).
> > > > Does anyone consider it as a potential problem?
> > > > It is probably not a big deal, but I would like to know the broader opinion.
> > >
> > > I had looked at this some time back for GCC. The function call is
> > > generated only if the underlying platform does not support the atomic
> > > instructions for the operand size. Otherwise, gcc generates the
> > > instructions directly.
> > > I would think the behavior would be the same for clang.
> >
> > From what I see, not really.
> > As an example:
> >
> > $ cat tatm11.c
> > #include <stdint.h>
> >
> > struct x {
> >         uint64_t v __attribute__((aligned(8)));
> > };
> >
> > uint64_t
> > ffxadd1(struct x *x, uint32_t n, uint32_t m)
> > {
> >         return __atomic_add_fetch(&x->v, n, __ATOMIC_RELAXED);
> > }
> >
> > uint64_t
> > ffxadd11(uint64_t *v, uint32_t n, uint32_t m)
> > {
> >         return __atomic_add_fetch(v, n, __ATOMIC_RELAXED);
> > }
> >
> > gcc for i686 will generate code with cmpxchg8b for both cases.
> > clang will generate cmpxchg8b for ffxadd1() - when data is explicitly 8B
> > aligned, but will emit a function call for ffxadd11().
>
> Does it require libatomic to be linked in this case?

Yes, it does.
In fact, it is the same story even with current dpdk.org master.
To build i686-native-linuxapp-clang successfully, I have to explicitly add
EXTRA_LDFLAGS="-latomic".
To be more specific:

$ for i in i686-native-linuxapp-clang/lib/*.a; do
    x=`nm $i | grep __atomic_`;
    if [[ -n "${x}" ]]; then echo $i; echo $x; fi;
  done
i686-native-linuxapp-clang/lib/librte_distributor.a
U __atomic_load_8 U __atomic_store_8
i686-native-linuxapp-clang/lib/librte_pmd_opdl_event.a
U __atomic_load_8 U __atomic_store_8
i686-native-linuxapp-clang/lib/librte_rcu.a
U __atomic_compare_exchange_8 U __atomic_load_8

As there have been no complaints so far, it makes me think that probably
no one is using clang for IA-32 builds.

> Clang documentation calls out the unaligned case where it would generate
> the function call [1].

Seems so, and it treats uint64_t as 4B aligned for IA.

> On aarch64, the atomic instructions need the address to be aligned.

For that particular case (cmpxchg8b) there is no such restriction for IA-32.
Again, as I said before, gcc manages to emit code without function calls
for exactly the same source.

> > [1] https://clang.llvm.org/docs/Toolchain.html#atomics-library
> >
> > > > >          else {
> > > > >                  sqn = sa->sqn.outb.raw + n;
> > > > >                  sa->sqn.outb.raw = sqn;
> > > > > diff --git a/lib/librte_ipsec/sa.h b/lib/librte_ipsec/sa.h
> > > > > index d22451b..cab9a2e 100644
> > > > > --- a/lib/librte_ipsec/sa.h
> > > > > +++ b/lib/librte_ipsec/sa.h
> > > > > @@ -120,7 +120,7 @@ struct rte_ipsec_sa {
> > > > >           */
> > > > >          union {
> > > > >                  union {
> > > > > -                        rte_atomic64_t atom;
> > > > > +                        uint64_t atom;
> > > > >                          uint64_t raw;
> > > > >                  } outb;
> > > >
> > > > If we don't need rte_atomic64 here anymore, then I think we can
> > > > collapse the union to just:
> > > > uint64_t outb;
> > > >
> > > > >                  struct {
> > > > > --
> > > > > 2.7.4
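
For reference, a minimal sketch of the explicitly aligned variant discussed
above (the ffxadd1() pattern), assuming the counter can be wrapped in a type
that requests 8-byte alignment. The names sqn_cnt and sqn_add_relaxed are
hypothetical and are not part of librte_ipsec; this only illustrates the
alignment point, it is not a proposed change to the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical wrapper type (not a DPDK type): requests 8-byte alignment
 * for the 64-bit counter, mirroring 'struct x' from the tatm11.c example.
 */
struct sqn_cnt {
	uint64_t v __attribute__((aligned(8)));
};

/* Compile-time check that the alignment request took effect. */
static_assert(_Alignof(struct sqn_cnt) == 8, "sqn_cnt must be 8B aligned");

/*
 * Relaxed atomic add-and-fetch: adds n to the counter and returns the
 * updated value, with no ordering guarantees beyond atomicity.
 */
static inline uint64_t
sqn_add_relaxed(struct sqn_cnt *c, uint32_t n)
{
	return __atomic_add_fetch(&c->v, n, __ATOMIC_RELAXED);
}
```

Going by the observations in this thread, clang for i686 would be expected to
inline cmpxchg8b for this form rather than emit a libatomic call; that has not
been checked here across compiler versions.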