> -----Original Message-----
> From: Phil Yang <phil.y...@arm.com>
> Sent: Friday, June 28, 2019 1:42 PM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; Jerin Jacob Kollanukkaran <jer...@marvell.com>;
> hemant.agra...@nxp.com; honnappa.nagaraha...@arm.com;
> gavin...@arm.com; n...@arm.com; gage.e...@intel.com
> Subject: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
> exchange
> 
> Add 128-bit atomic compare exchange on aarch64.
> 
> Signed-off-by: Phil Yang <phil.y...@arm.com>
> Tested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> ---
> +#define RTE_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE)
> +#define RTE_HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || \
> +                      (mo) == __ATOMIC_ACQ_REL || \
> +                      (mo) == __ATOMIC_SEQ_CST)
> +
> +#define RTE_MO_LOAD(mo)  (RTE_HAS_ACQ((mo)) \
> +             ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED)
> +#define RTE_MO_STORE(mo) (RTE_HAS_RLS((mo)) \
> +             ? __ATOMIC_RELEASE : __ATOMIC_RELAXED)
> +

Symbols that start with RTE_ are public symbols. If these macros are generic
enough, move them to the common layer so that every architecture can use them.
Otherwise, make them internal.
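
For the internal option, a minimal sketch could look like the one below. The
__HAS_ACQ/__HAS_RLS/__MO_LOAD/__MO_STORE names are only illustrative (an
assumption, not taken from the patch); the logic is copied from the quoted
macros, only the RTE_ prefix is dropped.

```c
/* Hypothetical internal names: no RTE_ prefix, so not part of the public API. */
#define __HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE)
#define __HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || \
		       (mo) == __ATOMIC_ACQ_REL || \
		       (mo) == __ATOMIC_SEQ_CST)

/* Split one memory order into its load half and its store half. */
#define __MO_LOAD(mo)  (__HAS_ACQ((mo)) \
		? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED)
#define __MO_STORE(mo) (__HAS_RLS((mo)) \
		? __ATOMIC_RELEASE : __ATOMIC_RELAXED)
```

With these, __ATOMIC_SEQ_CST maps to an acquire load plus a release store, and
__ATOMIC_RELAXED maps to relaxed for both halves. The macros could also be
#undef'ed at the end of the header so they do not leak to applications.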



> +#ifdef __ARM_FEATURE_ATOMICS

This define was added in GCC 9.1, and I believe clang does not support it yet.
So it will be undefined with older GCC versions and with clang.
I think that with meson + native build we can detect the presence of
atomics support by running a.out. I am not sure about the make and cross-build
cases. I don't want to block this feature because of this. IMO, we can add this
code with the existing __ARM_FEATURE_ATOMICS scheme and later find a method
to enhance it. But please check how to fix it.
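
As a rough idea of what such an a.out probe could do on a native build, here
is a sketch. It assumes Linux on arm64, where the kernel reports LSE atomics
through the HWCAP_ATOMICS bit of AT_HWCAP; the function names are hypothetical
and on any other platform the probe simply reports "not available".

```c
#include <stdbool.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_ATOMICS */
#endif

/* Hypothetical probe: true when the CPU we run on reports LSE atomics. */
static bool
probe_lse_atomics(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_ATOMICS)
	return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
	/* Not arm64 Linux: nothing to query here, report "not available". */
	return false;
#endif
}

/* Compile-time view, i.e. what the #ifdef in the patch sees. */
static bool
compiler_defines_lse(void)
{
#ifdef __ARM_FEATURE_ATOMICS
	return true;
#else
	return false;
#endif
}
```

A build-time test program would return probe_lse_atomics() ? 0 : 1 from main()
and the build system would set a define based on the exit status. This only
covers the native case; it says nothing about the target of a cross build.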

> +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)                          \
> +static inline rte_int128_t                                                  \
> +cas_op_name(rte_int128_t *dst, rte_int128_t old,                            \
> +             rte_int128_t updated)                                          \
> +{                                                                           \
> +     /* caspX instructions register pair must start from even-numbered
> +      * register at operand 1.
> +      * So, specify registers for local variables here.
> +      */                                                                    \
> +     register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];               \

Since the x0 register is used directly in the code, and cas_op_name() and
rte_atomic128_cmp_exchange() are inline functions, we may corrupt the x0
register depending on the register usage of the parent function, i.e.
break the arm64 ABI. I am not sure whether a clobber list will help here.
Making it noinline will help, but I am not sure about the performance impact.
Maybe you can check with the compiler team.

We burned our hands with this scheme, see
5b40ec6b966260e0ff66a8a2c689664f75d6a0e6 ("mempool/octeontx2: fix possible
arm64 ABI break").

Probably we can choose a scheme for rc2 and adjust it when we have complete
clarity.
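
To make the noinline option concrete, a rough sketch follows. It is not the
patch code: the type and function names are hypothetical, only the relaxed
casp variant is shown, and the #else branch is a plain NON-atomic stand-in
that exists only so the sketch builds on machines without LSE. The performance
impact of the extra call is not measured here.

```c
#include <stdint.h>

/* Hypothetical stand-in for rte_int128_t. */
typedef struct {
	uint64_t val[2];
} __attribute__((aligned(16))) cas128_sketch_t;

/*
 * Keeping the function out of line means x0-x3 are bound inside a real
 * function body, where they are caller-saved argument registers anyway,
 * instead of inside whatever parent the compiler inlined us into.
 * Returns the value found in *dst; the swap happened if it equals old.
 */
static __attribute__((noinline)) cas128_sketch_t
casp_sketch(cas128_sketch_t *dst, cas128_sketch_t old,
	    cas128_sketch_t updated)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS)
	/* casp needs each register pair to start at an even register. */
	register uint64_t x0 __asm("x0") = old.val[0];
	register uint64_t x1 __asm("x1") = old.val[1];
	register uint64_t x2 __asm("x2") = updated.val[0];
	register uint64_t x3 __asm("x3") = updated.val[1];

	__asm__ volatile(
		"casp %[old0], %[old1], %[upd0], %[upd1], [%[dst]]"
		: [old0] "+r" (x0), [old1] "+r" (x1)
		: [upd0] "r" (x2), [upd1] "r" (x3), [dst] "r" (dst)
		: "memory");
	old.val[0] = x0;
	old.val[1] = x1;
	return old;
#else
	/* NOT atomic: illustration-only fallback for non-LSE builds. */
	cas128_sketch_t cur = *dst;

	if (cur.val[0] == old.val[0] && cur.val[1] == old.val[1])
		*dst = updated;
	return cur;
#endif
}
```

The caller compares the returned pair with the expected value to decide
whether the exchange succeeded.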

> +     register uint64_t x1 __asm("x1") = (uint64_t)old.val[1];               \
> +     register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0];           \
> +     register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1];           \
> +     asm volatile(                                                          \
> +                     op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]" \
> +                     : [old0] "+r" (x0),                                    \
> +                       [old1] "+r" (x1)                                     \
> +                     : [upd0] "r" (x2),                                     \
> +                       [upd1] "r" (x3),                                     \
> +                       [dst] "r" (dst)                                      \
> +                     : "memory");                                           \

Shouldn't we add x0, x1, x2 and x3 to the clobber list?


>  static inline int __rte_experimental
>  rte_atomic128_cmp_exchange(rte_int128_t *dst,
>                          rte_int128_t *exp,
> diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h
> index 9958543..2355e50 100644
> --- a/lib/librte_eal/common/include/generic/rte_atomic.h
> +++ b/lib/librte_eal/common/include/generic/rte_atomic.h
> @@ -1081,6 +1081,20 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
> 
>  /*------------------------ 128 bit atomic operations -------------------------*/
> 
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)

There is nothing specific to x86 or arm64 here. Can we remove this #ifdef?

> +/**
> + * 128-bit integer structure.
> + */
> +RTE_STD_C11
> +typedef struct {
> +     RTE_STD_C11
> +     union {
> +             uint64_t val[2];
> +             __extension__ __int128 int128;
> +     };
> +} __rte_aligned(16) rte_int128_t;
> +#endif
> +
>  #ifdef __DOXYGEN__
