> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jer...@marvell.com>
> Sent: Thursday, December 27, 2018 2:58 PM
> To: Gavin Hu (Arm Technology China) <gavin...@arm.com>; dev@dpdk.org
> Cc: david.march...@redhat.com; chao...@linux.vnet.ibm.com; nd <n...@arm.com>;
> bruce.richard...@intel.com; tho...@monjalon.net; Joyce Kong (Arm Technology
> China) <joyce.k...@arm.com>; hemant.agra...@nxp.com;
> step...@networkplumber.org; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Subject: Re: [EXT] [PATCH v3 6/6] spinlock: ticket based to improve fairness
>
> On Thu, 2018-12-27 at 12:13 +0800, Gavin Hu wrote:
> > ----------------------------------------------------------------------
> > From: Joyce Kong <joyce.k...@arm.com>
> >
> > The old implementation is unfair, some threads may take locks
> > aggressively
>
> I think, one issue here is x86 and ppc follows traditional spinlock and
> arm64 will be following ticket lock for spinlock implementation.
> This would change application behaviour on arm64 compared to x86 and
> ppc.
>
> How about having a separate API for ticket lock? That would give,
> # application choice to use the locking strategy
> # application behaviour will be same across all arch.

Ok, will do in v4 to have a new named rte_ticket_spinlock API.
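To make the proposal concrete, here is a minimal sketch of what such a
separate ticket lock API could look like, implemented with generic C11
builtins. The type/function names, the 16-bit counter layout and the use of
rte_pause() are illustrative assumptions only, not the final v4 code; the
release store in unlock() pairs with the acquire load in lock():

#include <stdint.h>
#include <rte_pause.h>

/* Illustrative only: a ticket lock type kept separate from rte_spinlock_t. */
typedef struct {
	uint16_t current; /* ticket currently allowed to hold the lock */
	uint16_t next;    /* next ticket number to hand out */
} rte_ticket_spinlock_t;

static inline void
rte_ticket_spinlock_lock(rte_ticket_spinlock_t *tl)
{
	/* Grab a ticket, then spin until it is our turn (FIFO order). */
	uint16_t me = __atomic_fetch_add(&tl->next, 1, __ATOMIC_RELAXED);

	while (__atomic_load_n(&tl->current, __ATOMIC_ACQUIRE) != me)
		rte_pause();
}

static inline void
rte_ticket_spinlock_unlock(rte_ticket_spinlock_t *tl)
{
	/* Hand the lock to the next ticket; the release store pairs with
	 * the acquire load in lock().
	 */
	uint16_t cur = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);

	__atomic_store_n(&tl->current, (uint16_t)(cur + 1), __ATOMIC_RELEASE);
}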
> Initial ticket lock implementation can be generic with C11 memory
> primitive, latter arch can optimize it, if required.

Yes, later we might optimize it with new instructions.

> > while leaving the other threads starving for long time. As shown in
> > the following test, within same period of time, there are threads
> > taking locks much more times than the others.
> >
> >
> >  #ifdef RTE_FORCE_INTRINSICS
> >  static inline void
> > -rte_spinlock_unlock (rte_spinlock_t *sl)
> > +rte_spinlock_unlock(rte_spinlock_t *sl)
> >  {
> > -	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
> > +	uint16_t i = __atomic_load_n(&sl->s.current, __ATOMIC_RELAXED);
> > +	i++;
> > +	__atomic_store_n(&sl->s.current, i, __ATOMIC_RELAXED);
>
> Shouldn't we use __ATOMIC_RELEASE here to pair with lock() ?
>
> >  }
> >  #endif
> >
> > @@ -98,16 +100,19 @@ rte_spinlock_unlock (rte_spinlock_t *sl)
> >   *   1 if the lock is successfully taken; 0 otherwise.
> >   */
> >  static inline int
> > -rte_spinlock_trylock (rte_spinlock_t *sl);
> > +rte_spinlock_trylock(rte_spinlock_t *sl);
> >
> >  #ifdef RTE_FORCE_INTRINSICS
> >  static inline int
> > -rte_spinlock_trylock (rte_spinlock_t *sl)
> > +rte_spinlock_trylock(rte_spinlock_t *sl)
> >  {
> > -	int exp = 0;
> > -	return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
> > -				0, /* disallow spurious failure */
> > -				__ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
> > +	uint16_t me = __atomic_fetch_add(&sl->s.next, 1, __ATOMIC_RELAXED);
> > +	while (__atomic_load_n(&sl->s.current, __ATOMIC_RELAXED) != me) {
> > +		__atomic_sub_fetch(&sl->s.next, 1, __ATOMIC_RELAXED);
> > +		return 0;
> > +	}
> > +
>
> Shouldn't we need CAS here?
> Similar implementation here:
> https://git.linaro.org/lng/odp.git/tree/platform/linux-generic/include/odp/api/plat/ticketlock_inlines.h

That is correct, CAS is required. Assume T2 takes precedence in requesting a
ticket, e.g. T2.next = 2, but it has not taken the lock yet, e.g. T2.current = 1;
then T1 calls trylock, so T1.next = 3; then T2 takes the lock, T2.next =
T2.current = 2; then T1 falls back and decrements, so T1.next = 2. Both end up
with next = 2, which is not correct: next should be 3, and T1 will not get the
lock any more. A rough CAS-based sketch is appended at the end of this mail.

> > +	return 1;
> >  }
> >  #endif
> >
> >
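For reference, a CAS-based trylock along the lines of the ODP ticketlock
linked above could look roughly like this. It is a sketch only, assuming the
sl->s.current/sl->s.next uint16_t fields from the patch; a ticket is taken
only when the lock is free right now, so s.next never needs to be rolled back:

static inline int
rte_spinlock_trylock(rte_spinlock_t *sl)
{
	uint16_t me = __atomic_load_n(&sl->s.current, __ATOMIC_RELAXED);
	uint16_t exp = me;

	/* The lock is free only when s.next == s.current. Atomically move
	 * s.next from 'me' to 'me + 1'; on success we own ticket 'me' and
	 * hold the lock, on failure someone else holds or just took it.
	 */
	if (__atomic_compare_exchange_n(&sl->s.next, &exp, (uint16_t)(me + 1),
			0, /* strong: disallow spurious failure */
			__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
		return 1;

	return 0;
}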