> -----Original Message----- > From: Phil Yang (Arm Technology China) <phil.y...@arm.com> > Sent: Tuesday, July 9, 2019 2:58 PM > To: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com>; Honnappa > Nagarahalli <honnappa.nagaraha...@arm.com>; Jerin Jacob Kollanukkaran > <jer...@marvell.com>; dev@dpdk.org > Cc: tho...@monjalon.net; hemant.agra...@nxp.com; Gavin Hu (Arm > Technology China) <gavin...@arm.com>; nd <n...@arm.com>; > gage.e...@intel.com; nd <n...@arm.com>; nd <n...@arm.com> > Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare > exchange > > > -----Original Message----- > > From: Pavan Nikhilesh Bhagavatula <pbhagavat...@marvell.com> > > Sent: Friday, July 5, 2019 12:37 PM > > To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; > > jer...@marvell.com; Phil Yang (Arm Technology China) > > <phil.y...@arm.com>; dev@dpdk.org > > Cc: tho...@monjalon.net; hemant.agra...@nxp.com; Gavin Hu (Arm > > Technology China) <gavin...@arm.com>; nd <n...@arm.com>; > > gage.e...@intel.com; nd <n...@arm.com> > > Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic > > compare exchange > > > > > > <snip> > > > >> > > +#ifdef __ARM_FEATURE_ATOMICS > > >> > > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string) > > >> \ > > >> > > +static inline rte_int128_t > > >> > > \ > > >> > > +cas_op_name(rte_int128_t *dst, rte_int128_t old, > > >> > > \ > > >> > > + rte_int128_t updated) > > >> > > \ > > >> > > +{ > > >> > > \ > > >> > > + /* caspX instructions register pair must start from even- > > >numbered > > >> > > + * register at operand 1. > > >> > > + * So, specify registers for local variables here. > > >> > > + */ > > >> > > \ > > >> > > + register uint64_t x0 __asm("x0") = (uint64_t)old.val[0]; > > \ > > >> > > > >> > I understand CASP limitation on register has to be even and odd. > > >> > Is there anyway to remove explicit x0 register allocation and > > >> > choose compiler to decide the register. Some reason with > > >> > optimize(03) gcc makes correctly but not clang. > > >> > > > >> > Hardcoding to specific register makes compiler to not optimize > > >> > the stuff, especially if it is inline function. > > >> > > >> It look like the limitation fixed recently in gcc. > > >> https://patches.linaro.org/patch/147991/ > > >> > > >> Not sure about old gcc and clang. ARM compiler experts may know > > >the exact > > >> status > > >> > > >We could use syntax as follows, an example is in [1] static inline > > >rte_int128_t __rte_casp(rte_int128_t *dst, rte_int128_t old, > > >rte_int128_t updated, int mo) { > > > __asm__ volatile("caspl %0, %H0, %1, %H1, [%2]" > > > : "+r" (old) > > > : "r" (updated), "r" (dst) > > > : "memory"); > > > return old; > > >} > > > > We have used this format for mempool/octeontx2 but clang wasn't too > > happy. > > > > dpdk/drivers/mempool/octeontx2/otx2_mempool_ops.c:151:15: error: > > value size does not match register size specified by the constraint > > and modifier [-Werror,-Wasm-operand-widths] > > [t0] "=&r" (t0), [t1] "=&r" (t1), [t2] "=&r" (t2), > > ^ > > dpdk/drivers/mempool/octeontx2/otx2_mempool_ops.c:82:9: note: use > > constraint modifier "w" > > "casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n" > > > > Had to change it to hand coded asm > > > > http://patches.dpdk.org/patch/56110/ > > Hi Jerin, > > The update from the compiler team is 'the LSE CASP fix has not been > backported to older GCC branches'. > So, currently, this seems the only approach works for all versions of GCC and > Clang. > I think we can add another optimization patch for this once all the compilers > were ready.
We are on same page. > > Thanks, > Phil > > > > > > > >[1] https://godbolt.org/z/EUJnuG