> > > +#define RTE_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != > > > +__ATOMIC_RELEASE) #define RTE_HAS_RLS(mo) ((mo) == > > > __ATOMIC_RELEASE || \ > > > + (mo) == __ATOMIC_ACQ_REL || \ > > > + (mo) == __ATOMIC_SEQ_CST) > > > + > > > +#define RTE_MO_LOAD(mo) (RTE_HAS_ACQ((mo)) \ > > > + ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED) #define > > > RTE_MO_STORE(mo) > > > +(RTE_HAS_RLS((mo)) \ > > > + ? __ATOMIC_RELEASE : __ATOMIC_RELAXED) > > > + > > > > The one starts with RTE_ are public symbols, If it is generic enough, > > Move to common layer so that every architecturse can use. > > If you think, otherwise make it internal > > Let's keep it internal. I will remove the 'RTE_' tag.
Probably change to __HAS_ACQ to avoid collision(just in case) > > > > > > > > > +#ifdef __ARM_FEATURE_ATOMICS > > > > This define is added in gcc 9.1 and I believe for clang it is not supported > > yet. > > So old gcc and clang this will be undefined. > > I think, With meson + native build, we can find the presence of > > ATOMIC support by running a.out. Not sure about make and cross build case. > > I don't want block this feature because of this, IMO, We can add this > > code with existing __ARM_FEATURE_ATOMICS scheme and later find a > > method to enhance it. But please check how to fix it. > > OK. After thinking on this a bit, I think, in order to support old gcc(< gcc 9.1) and clang, We can introduce a config option, where, by default it is disabled and enable In specific config(where we know, lse is supported) and meson config. i.e #if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS) > > > > > > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string) > > > \ > > > +static inline rte_int128_t > > > \ > > > +cas_op_name(rte_int128_t *dst, rte_int128_t old, > > > \ > > > + rte_int128_t updated) > > > \ > > > +{ > > > \ > > > + /* caspX instructions register pair must start from even-numbered > > > + * register at operand 1. > > > + * So, specify registers for local variables here. > > > + */ > > > \ > > > + register uint64_t x0 __asm("x0") = (uint64_t)old.val[0]; > > > \ > > > > Since direct x0 register used in the code and > > cas_op_name() and rte_atomic128_cmp_exchange() is inline function, > > Based on parent function load, we may corrupt x0 register aka > > Since x0/x1 and x2/x3 are used a lot and often contain live values. > Maybe to change them to some relatively less frequently used registers like > x14/x15 and x16/x17 might help for this case? > According to the PCS (Procedure Call Standard), x14-x17 are also temporary > registers. X14-x17 are temporary registers but since cas_op_name() and rte_atomic128_cmp_exchange() are inline functions, Based on the parent function register usage, it _may_ corrupt. > > > Break arm64 ABI. Not sure clobber list will help here or not? > > In my understanding, for the register variable, if it contains a live value > in the > specified register, the compiler will move the live value into a free > register. > Since x0~x3 are present in the input/output operands and x0/x1's value needs > to > be restored to the variable 'old' as a return value. > So I didn't add them into the clobber list. OK > > > Making it as no_inline will help but not sure about the performance impact. > > May be you can check with compiler team. > > > > We burned our hands with this scheme, see > > 5b40ec6b966260e0ff66a8a2c689664f75d6a0e6 ("mempool/octeontx2: fix > > possible arm64 ABI break") > > > > Probably we can choose a scheme for rc2 and adjust as when we have > > complete clarity. > > > > > > > > +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64) > > > > There is nothing specific to x86 and arm64 here, Can we remove this #ifdef ? > > Without this constraint, it will break 32-bit x86 builds. > http://mails.dpdk.org/archives/test-report/2019-June/086586.html OK . #ifdef RTE_ARCH_64 would help then. > > > > > > +/** > > > + * 128-bit integer structure. > > > + */ > > > +RTE_STD_C11 > > > +typedef struct { > > > + RTE_STD_C11 > > > + union { > > > + uint64_t val[2]; > > > + __extension__ __int128 int128; Instead of guarding RTE_ARCH_64 on this complete structure, How about it only under #ifdef RTE_ARCH_64 __extension__ __int128 int128; #endif So that it rte_int128_t will be available for 32bit as well. > > > + }; > > > +} __rte_aligned(16) rte_int128_t; > > > +#endif > > > + > > > #ifdef __DOXYGEN__