> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jer...@marvell.com>
> Sent: Friday, July 19, 2019 2:25 PM
> To: Phil Yang (Arm Technology China) <phil.y...@arm.com>; dev@dpdk.org
> Cc: tho...@monjalon.net; hemant.agra...@nxp.com; Honnappa
> Nagarahalli <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm Technology
> China) <gavin...@arm.com>; nd <n...@arm.com>; gage.e...@intel.com
> Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
> exchange
> 
> > -----Original Message-----
> > From: Phil Yang <phil.y...@arm.com>
> > Sent: Friday, June 28, 2019 1:42 PM
> > To: dev@dpdk.org
> > Cc: tho...@monjalon.net; Jerin Jacob Kollanukkaran
> <jer...@marvell.com>;
> > hemant.agra...@nxp.com; honnappa.nagaraha...@arm.com;
> > gavin...@arm.com; n...@arm.com; gage.e...@intel.com
> > Subject: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
> > exchange
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Add 128-bit atomic compare exchange on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.y...@arm.com>
> > Tested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > ---
> > +#define RTE_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) !=
> > +__ATOMIC_RELEASE) #define RTE_HAS_RLS(mo) ((mo) ==
> > __ATOMIC_RELEASE || \
> > +                    (mo) == __ATOMIC_ACQ_REL || \
> > +                    (mo) == __ATOMIC_SEQ_CST)
> > +
> > +#define RTE_MO_LOAD(mo)  (RTE_HAS_ACQ((mo)) \
> > +           ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED) #define
> > RTE_MO_STORE(mo)
> > +(RTE_HAS_RLS((mo)) \
> > +           ? __ATOMIC_RELEASE : __ATOMIC_RELAXED)
> > +
> 
> The one starts with RTE_ are public symbols, If it is generic enough,
> Move to common layer so that every architecturse can use.
> If you think, otherwise make it internal

Let's keep it internal. I will remove the 'RTE_' tag. 

> 
> 
> 
> > +#ifdef __ARM_FEATURE_ATOMICS
> 
> This define is added in gcc 9.1 and I believe for clang it is not supported 
> yet.
> So old gcc and clang this will be undefined.
> I think, With meson + native build, we  can find the presence of
> ATOMIC support by running a.out. Not sure about make and cross build case.
> I don't want block this feature because of this, IMO, We can add this code
> with  existing __ARM_FEATURE_ATOMICS scheme and later find a method
> to enhance it. But please check how to fix it.

OK.

> 
> > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)                         
> >  \
> > +static inline rte_int128_t                                                 
> >  \
> > +cas_op_name(rte_int128_t *dst, rte_int128_t old,                           
> >  \
> > +           rte_int128_t updated)                                           
> >     \
> > +{                                                                          
> >  \
> > +   /* caspX instructions register pair must start from even-numbered
> > +    * register at operand 1.
> > +    * So, specify registers for local variables here.
> > +    */                                                                     
> > \
> > +   register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];                
> > \
> 
> Since direct x0 register used in the code and
> cas_op_name() and rte_atomic128_cmp_exchange() is inline function,
> Based on parent function load, we may corrupt x0 register aka

Since x0/x1 and x2/x3 are used a lot and often contain live values.
Maybe to change them to some relatively less frequently used registers like 
x14/x15 and x16/x17 might help for this case?
According to the PCS (Procedure Call Standard), x14-x17 are also temporary 
registers.

> Break arm64 ABI. Not sure clobber list will help here or not?

In my understanding, for the register variable, if it contains a live value in 
the specified register, the compiler will move the live value into a free 
register. 
Since x0~x3 are present in the input/output operands and x0/x1's value needs to 
be restored to the variable 'old' as a return value. 
So I didn't add them into the clobber list.

> Making it as no_inline will help but not sure about the performance impact.
> May be you can check with compiler team.
> 
> We burned our hands with this scheme, see
> 5b40ec6b966260e0ff66a8a2c689664f75d6a0e6 ("mempool/octeontx2: fix
> possible arm64 ABI break")
> 
> Probably we can choose a scheme for rc2 and adjust as when we have
> complete clarity.
> 
> > +   register uint64_t x1 __asm("x1") = (uint64_t)old.val[1];                
> > \
> > +   register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0];            
> > \
> > +   register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1];            
> > \
> > +   asm volatile(                                                           
> > \
> > +                   op_string " %[old0], %[old1], %[upd0], %[upd1],
> > [%[dst]]"       \
> > +                   : [old0] "+r" (x0),                                     
> >         \
> > +                     [old1] "+r" (x1)                                      
> >         \
> > +                   : [upd0] "r" (x2),                                      
> >         \
> > +                     [upd1] "r" (x3),                                      
> >         \
> > +                     [dst] "r" (dst)                                       
> >         \
> > +                   : "memory");                                            
> >         \
> 
> Should n't we add x0,x1, x2, x3 in clobber list?

Same as above.

> 
> 
> >  static inline int __rte_experimental
> >  rte_atomic128_cmp_exchange(rte_int128_t *dst,
> >                        rte_int128_t *exp,
> > diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h
> > b/lib/librte_eal/common/include/generic/rte_atomic.h
> > index 9958543..2355e50 100644
> > --- a/lib/librte_eal/common/include/generic/rte_atomic.h
> > +++ b/lib/librte_eal/common/include/generic/rte_atomic.h
> > @@ -1081,6 +1081,20 @@ static inline void
> > rte_atomic64_clear(rte_atomic64_t *v)
> >
> >  /*------------------------ 128 bit atomic operations 
> > -------------------------*/
> >
> > +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> 
> There is nothing specific to x86 and arm64 here, Can we remove this #ifdef ?

Without this constraint, it will break 32-bit x86 builds.
http://mails.dpdk.org/archives/test-report/2019-June/086586.html 

> 
> > +/**
> > + * 128-bit integer structure.
> > + */
> > +RTE_STD_C11
> > +typedef struct {
> > +   RTE_STD_C11
> > +   union {
> > +           uint64_t val[2];
> > +           __extension__ __int128 int128;
> > +   };
> > +} __rte_aligned(16) rte_int128_t;
> > +#endif
> > +
> >  #ifdef __DOXYGEN__

Reply via email to