> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jer...@marvell.com>
> Sent: Friday, July 19, 2019 2:25 PM
> To: Phil Yang (Arm Technology China) <phil.y...@arm.com>; dev@dpdk.org
> Cc: tho...@monjalon.net; hemant.agra...@nxp.com; Honnappa
> Nagarahalli <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm Technology
> China) <gavin...@arm.com>; nd <n...@arm.com>; gage.e...@intel.com
> Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
> exchange
>
> > -----Original Message-----
> > From: Phil Yang <phil.y...@arm.com>
> > Sent: Friday, June 28, 2019 1:42 PM
> > To: dev@dpdk.org
> > Cc: tho...@monjalon.net; Jerin Jacob Kollanukkaran
> > <jer...@marvell.com>;
> > hemant.agra...@nxp.com; honnappa.nagaraha...@arm.com;
> > gavin...@arm.com; n...@arm.com; gage.e...@intel.com
> > Subject: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
> > exchange
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Add 128-bit atomic compare exchange on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.y...@arm.com>
> > Tested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > ---
> > +#define RTE_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE)
> > +#define RTE_HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || \
> > +			 (mo) == __ATOMIC_ACQ_REL || \
> > +			 (mo) == __ATOMIC_SEQ_CST)
> > +
> > +#define RTE_MO_LOAD(mo)  (RTE_HAS_ACQ((mo)) \
> > +		? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED)
> > +#define RTE_MO_STORE(mo) (RTE_HAS_RLS((mo)) \
> > +		? __ATOMIC_RELEASE : __ATOMIC_RELAXED)
> > +
>
> Symbols that start with RTE_ are public symbols. If these are generic
> enough, move them to the common layer so that every architecture can
> use them. Otherwise, make them internal.
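
For reference, a minimal sketch of what the helpers quoted above compute: they split one combined memory order into the ordering needed for the load half and the ordering needed for the store half of a compare-exchange. The EX_ prefix, the wrapper functions and the self-check are hypothetical names used only for illustration and are not part of the patch; the __ATOMIC_* constants are the GCC/clang built-ins.

```c
#include <assert.h>

/* Hypothetical EX_ names for illustration; the logic mirrors the
 * macros quoted from the patch. */
#define EX_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE)
#define EX_HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || \
			(mo) == __ATOMIC_ACQ_REL || \
			(mo) == __ATOMIC_SEQ_CST)

/* Ordering to apply to the load half of a compare-exchange. */
static inline int ex_mo_load(int mo)
{
	return EX_HAS_ACQ(mo) ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED;
}

/* Ordering to apply to the store half of a compare-exchange. */
static inline int ex_mo_store(int mo)
{
	return EX_HAS_RLS(mo) ? __ATOMIC_RELEASE : __ATOMIC_RELAXED;
}

/* Spot checks: ACQ_REL needs both halves ordered, RELAXED needs neither. */
static inline void ex_mo_self_check(void)
{
	assert(ex_mo_load(__ATOMIC_ACQ_REL) == __ATOMIC_ACQUIRE);
	assert(ex_mo_store(__ATOMIC_ACQ_REL) == __ATOMIC_RELEASE);
	assert(ex_mo_load(__ATOMIC_RELAXED) == __ATOMIC_RELAXED);
	assert(ex_mo_store(__ATOMIC_RELAXED) == __ATOMIC_RELAXED);
}
```

With this mapping, a release-only request leaves the load half relaxed, and an acquire-only request leaves the store half relaxed.
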

Let's keep it internal. I will remove the 'RTE_' tag.

> >
> > +#ifdef __ARM_FEATURE_ATOMICS
>
> This define was added in gcc 9.1, and I believe clang does not support
> it yet. So with old gcc and with clang it will be undefined.
> I think that, with meson + native build, we can detect the presence of
> ATOMIC support by running a.out. I am not sure about the make and
> cross-build cases.
> I don't want to block this feature because of this. IMO, we can add this
> code with the existing __ARM_FEATURE_ATOMICS scheme and later find a
> method to enhance it. But please check how to fix it.

OK.

> > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string) \
> > +static inline rte_int128_t \
> > +cas_op_name(rte_int128_t *dst, rte_int128_t old, \
> > +		rte_int128_t updated) \
> > +{ \
> > +	/* caspX instructions register pair must start from even-numbered
> > +	 * register at operand 1.
> > +	 * So, specify registers for local variables here.
> > +	 */ \
> > +	register uint64_t x0 __asm("x0") = (uint64_t)old.val[0]; \
>
> Since the x0 register is used directly in the code, and cas_op_name()
> and rte_atomic128_cmp_exchange() are inline functions, depending on the
> parent function's load we may corrupt the x0 register, aka

x0/x1 and x2/x3 are used a lot and often contain live values. Changing
them to less frequently used registers such as x14/x15 and x16/x17 might
help in this case. According to the PCS (Procedure Call Standard),
x14-x17 are also temporary registers.

> break the arm64 ABI. Not sure whether a clobber list will help here or not?

In my understanding, if the register specified for a register variable
holds a live value, the compiler moves the live value into a free
register. x0~x3 appear in the input/output operands, and the values of
x0/x1 need to be written back to the variable 'old' as the return value,
so I didn't add them to the clobber list.

> Making it no_inline will help, but I am not sure about the performance
> impact.
> Maybe you can check with the compiler team.
>
> We burned our hands with this scheme, see
> 5b40ec6b966260e0ff66a8a2c689664f75d6a0e6 ("mempool/octeontx2: fix
> possible arm64 ABI break")
>
> We can probably choose a scheme for rc2 and adjust as and when we have
> complete clarity.
>
> > +	register uint64_t x1 __asm("x1") = (uint64_t)old.val[1]; \
> > +	register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0]; \
> > +	register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1]; \
> > +	asm volatile( \
> > +		op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]" \
> > +		: [old0] "+r" (x0), \
> > +		  [old1] "+r" (x1) \
> > +		: [upd0] "r" (x2), \
> > +		  [upd1] "r" (x3), \
> > +		  [dst] "r" (dst) \
> > +		: "memory"); \
>
> Shouldn't we add x0, x1, x2, x3 to the clobber list?

Same as above.

> >
> >  static inline int __rte_experimental
> >  rte_atomic128_cmp_exchange(rte_int128_t *dst,
> >  				rte_int128_t *exp,
> > diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h
> > b/lib/librte_eal/common/include/generic/rte_atomic.h
> > index 9958543..2355e50 100644
> > --- a/lib/librte_eal/common/include/generic/rte_atomic.h
> > +++ b/lib/librte_eal/common/include/generic/rte_atomic.h
> > @@ -1081,6 +1081,20 @@ static inline void
> > rte_atomic64_clear(rte_atomic64_t *v)
> >
> >  /*------------------------ 128 bit atomic operations
> > -------------------------*/
> >
> > +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
>
> There is nothing specific to x86 and arm64 here. Can we remove this
> #ifdef?

Without this constraint, it will break 32-bit x86 builds.
http://mails.dpdk.org/archives/test-report/2019-June/086586.html

> > +/**
> > + * 128-bit integer structure.
> > + */
> > +RTE_STD_C11
> > +typedef struct {
> > +	RTE_STD_C11
> > +	union {
> > +		uint64_t val[2];
> > +		__extension__ __int128 int128;
> > +	};
> > +} __rte_aligned(16) rte_int128_t;
> > +#endif
> > +
> >  #ifdef __DOXYGEN__
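
To illustrate why the guard matters, here is a minimal standalone sketch, not the patch itself. GCC and clang only provide __int128 on targets with a wide enough integer mode, and they define __SIZEOF_INT128__ when the type is available, so an unguarded definition fails to compile on 32-bit x86. The ex_ names are hypothetical, and the plain aligned attribute stands in for __rte_aligned(16), which I assume expands to the same thing.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SIZEOF_INT128__)
/* Hypothetical stand-in for rte_int128_t: two 64-bit halves overlaid
 * on one 128-bit integer, aligned to 16 bytes. */
typedef struct {
	union {
		uint64_t val[2];
		__extension__ __int128 int128;
	};
} __attribute__((aligned(16))) ex_int128_t;

/* Build a value from its two 64-bit halves. */
static inline ex_int128_t ex_int128_make(uint64_t v0, uint64_t v1)
{
	ex_int128_t v;

	v.val[0] = v0;
	v.val[1] = v1;
	return v;
}

/* Check the size and alignment that a 128-bit CAS relies on. */
static inline void ex_int128_layout_check(void)
{
	assert(sizeof(ex_int128_t) == 16);
	assert(_Alignof(ex_int128_t) == 16);
}
#endif /* __SIZEOF_INT128__ */
```

Keying the guard on the compiler macro instead of the RTE_ARCH_* defines is only one possible choice; the patch keeps the architecture check.
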