On Thu, Apr 21, 2016 at 11:35:07PM +0800, Pan Xinhui wrote:
> On 2016-04-20 22:24, Peter Zijlstra wrote:
> > On Wed, Apr 20, 2016 at 09:24:00PM +0800, Pan Xinhui wrote:
> >
> >> +#define __XCHG_GEN(cmp, type, sfx, skip, v)				\
> >> +static __always_inline unsigned long					\
> >> +__cmpxchg_u32##sfx(v unsigned int *p, unsigned long old,		\
> >> +					unsigned long new);		\
> >> +static __always_inline u32						\
> >> +__##cmp##xchg_##type##sfx(v void *ptr, u32 old, u32 new)		\
> >> +{									\
> >> +	int size = sizeof (type);					\
> >> +	int off = (unsigned long)ptr % sizeof(u32);			\
> >> +	volatile u32 *p = ptr - off;					\
> >> +	int bitoff = BITOFF_CAL(size, off);				\
> >> +	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;	\
> >> +	u32 oldv, newv, tmp;						\
> >> +	u32 ret;							\
> >> +	oldv = READ_ONCE(*p);						\
> >> +	do {								\
> >> +		ret = (oldv & bitmask) >> bitoff;			\
> >> +		if (skip && ret != old)					\
> >> +			break;						\
> >> +		newv = (oldv & ~bitmask) | (new << bitoff);		\
> >> +		tmp = oldv;						\
> >> +		oldv = __cmpxchg_u32##sfx((v u32*)p, oldv, newv);	\
> >> +	} while (tmp != oldv);						\
> >> +	return ret;							\
> >> +}
> >
> > So for an LL/SC based arch, using cmpxchg() like that is sub-optimal.
> >
> > Why did you choose to write it entirely in C?
> >
> Yes, you are right. More loads/stores will be done in C code.
> However, such xchg_u8/u16 is only used by qspinlock now, and I did not
> see any performance regression.
> So I just wrote it in C, for simplicity. :)
>
> Of course I have done xchg tests.
> We ran code like xchg((u8 *)&v, j++); in several threads,
> and the result is:
> [ 768.374264] use time[1550072]ns in xchg_u8_asm
How was xchg_u8_asm() implemented, using lbarx or using a 32-bit ll/sc
loop with shifting and masking in it?

Regards,
Boqun

> [ 768.377102] use time[2826802]ns in xchg_u8_c
>
> I think this is because there is one more load in C.
> If possible, we can move such code into asm-generic/.
>
> thanks
> xinhui
>
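For reference, here is a minimal, untested sketch of the second option
above: a 32-bit lwarx/stwcx. loop that exchanges a single byte by
shifting and masking within the containing aligned word. The name
xchg_u8_llsc is made up for illustration, BITOFF_CAL is the helper from
the patch quoted above, and all memory barriers are omitted (i.e. this
would be the relaxed variant):

static inline unsigned long xchg_u8_llsc(volatile u8 *p, unsigned long val)
{
	unsigned long off = (unsigned long)p % sizeof(u32);
	volatile u32 *w = (volatile u32 *)((unsigned long)p - off);
	unsigned int bitoff = BITOFF_CAL(sizeof(u8), off);
	u32 mask = 0xffU << bitoff;
	u32 newv = ((u32)val & 0xff) << bitoff;	/* pre-shift the new byte */
	u32 prev, tmp;

	asm volatile(
"1:	lwarx	%0,0,%3\n"	/* load-reserve the aligned 32-bit word */
"	andc	%1,%0,%5\n"	/* clear the byte being exchanged */
"	or	%1,%1,%4\n"	/* insert the new, pre-shifted byte */
"	stwcx.	%1,0,%3\n"	/* store-conditional, retry if we lost */
"	bne-	1b"
	: "=&r" (prev), "=&r" (tmp), "+m" (*w)
	: "r" (w), "r" (newv), "r" (mask)
	: "cc");

	return (prev & mask) >> bitoff;
}

An lbarx/stbcx. pair would do the exchange directly on the byte with no
shifting or masking at all, but those instructions only exist from
Power ISA 2.06 onwards, so the word-sized loop is the portable fallback.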