On Tue, 2019-01-22 at 09:27 +0000, Ola Liljedahl wrote:
> On Fri, 2019-01-18 at 09:23 -0600, Gage Eads wrote:
> > v3:
> >  - Avoid the ABI break by putting 64-bit head and tail values in the
> >    same cacheline as struct rte_ring's prod and cons members.
> >  - Don't attempt to compile rte_atomic128_cmpset without
> >    ALLOW_EXPERIMENTAL_API, as this would break a large number of
> >    libraries.
> >  - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case
> >    someone tries to use RING_F_NB without the ALLOW_EXPERIMENTAL_API
> >    flag.
> >  - Update the ring mempool to use experimental APIs
> >  - Clarify that RING_F_NB is only limited to x86_64 currently;
> >    ARMv8.1-A builds can eventually support it with the CASP
> >    instruction.
> ARMv8.0 should be able to implement a 128-bit atomic compare exchange
> operation using LDXP/STXP.

Just wondering what the performance difference would be between CASP and
LDXP/STXP on a machine that supports LSE?

I think we cannot detect the presence of LSE support at compile time,
right? A run-time check will be costly, like:

    if (hwcaps & HWCAP_ATOMICS) {
        casp
    } else {
        ldxp
        stxp
    }

> From an ARM perspective, I want all atomic operations to take memory
> ordering arguments (e.g. acquire, release). Not all usages of e.g.

+1

> atomic compare exchange require sequential consistency (which I think
> is what the x86 cmpxchg instruction provides). DPDK functions should
> not be modelled after x86 behaviour.
>
> Lock-free 128-bit atomics implementations for ARM/AArch64 and x86-64
> are available here:
> https://github.com/ARM-software/progress64/blob/master/src/lockfree.h
>
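
To make the memory-ordering point quoted above concrete, here is a minimal
sketch of a compare-exchange wrapper that takes the ordering as arguments.
It assumes C11 <stdatomic.h> and works on a 64-bit value only, to keep it
portable; cas64_explicit() is a hypothetical name for illustration and is
not an existing DPDK API, nor the 128-bit operation proposed in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): compare-and-set where the caller
 * chooses the memory ordering for the success and the failure case,
 * instead of always paying for sequential consistency.
 *
 * On failure, *expected is updated with the value currently in *dst.
 */
static inline bool
cas64_explicit(_Atomic uint64_t *dst, uint64_t *expected, uint64_t desired,
               memory_order success, memory_order failure)
{
	return atomic_compare_exchange_strong_explicit(dst, expected, desired,
						       success, failure);
}
```

A caller publishing a new tail value could pass memory_order_release for
success and memory_order_relaxed for failure, while a caller that also needs
to observe earlier writes could pass memory_order_acq_rel and
memory_order_acquire. On AArch64 the compiler is then free to pick weaker
instruction variants than a sequentially consistent operation would need.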