On Fri, Apr 22, 2016 at 09:59:22AM +0800, Pan Xinhui wrote:
> On 2016-04-21 23:52, Boqun Feng wrote:
> > On Thu, Apr 21, 2016 at 11:35:07PM +0800, Pan Xinhui wrote:
> >> On 2016-04-20 22:24, Peter Zijlstra wrote:
> >>> On Wed, Apr 20, 2016 at 09:24:00PM +0800, Pan Xinhui wrote:
> >>>
> >>>> +#define __XCHG_GEN(cmp, type, sfx, skip, v)				\
> >>>> +static __always_inline unsigned long					\
> >>>> +__cmpxchg_u32##sfx(v unsigned int *p, unsigned long old,		\
> >>>> +						unsigned long new);	\
> >>>> +static __always_inline u32						\
> >>>> +__##cmp##xchg_##type##sfx(v void *ptr, u32 old, u32 new)		\
> >>>> +{									\
> >>>> +	int size = sizeof (type);					\
> >>>> +	int off = (unsigned long)ptr % sizeof(u32);			\
> >>>> +	volatile u32 *p = ptr - off;					\
> >>>> +	int bitoff = BITOFF_CAL(size, off);				\
> >>>> +	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;	\
> >>>> +	u32 oldv, newv, tmp;						\
> >>>> +	u32 ret;							\
> >>>> +	oldv = READ_ONCE(*p);						\
> >>>> +	do {								\
> >>>> +		ret = (oldv & bitmask) >> bitoff;			\
> >>>> +		if (skip && ret != old)					\
> >>>> +			break;						\
> >>>> +		newv = (oldv & ~bitmask) | (new << bitoff);		\
> >>>> +		tmp = oldv;						\
> >>>> +		oldv = __cmpxchg_u32##sfx((v u32*)p, oldv, newv);	\
> >>>> +	} while (tmp != oldv);						\
> >>>> +	return ret;							\
> >>>> +}
> >>>
> >>> So for an LL/SC based arch using cmpxchg() like that is sub-optimal.
> >>>
> >>> Why did you choose to write it entirely in C?
> >>>
> >> yes, you are right. more load/store will be done in C code.
> >> However such xchg_u8/u16 is just used by qspinlock now. and I did not see
> >> any performance regression.
> >> So just wrote in C, for simple. :)
> >>
> >> Of course I have done xchg tests.
> >> we run code just like xchg((u8*)&v, j++); in several threads.
> >> and the result is,
> >> [ 768.374264] use time[1550072]ns in xchg_u8_asm
> >
> > How was xchg_u8_asm() implemented, using lbarx or using a 32bit ll/sc
> > loop with shifting and masking in it?
> >
> yes, using 32bit ll/sc loops.
>
> looks like:
> __asm__ __volatile__(
> "1:	lwarx	%0,0,%3\n"
> "	and	%1,%0,%5\n"
> "	or	%1,%1,%4\n"
> 	PPC405_ERR77(0,%2)
> "	stwcx.	%1,0,%3\n"
> "	bne-	1b"
> 	: "=&r" (_oldv), "=&r" (tmp), "+m" (*(volatile unsigned int *)_p)
> 	: "r" (_p), "r" (_newv), "r" (_oldv_mask)
> 	: "cc", "memory");
>
Good, so this works for all ppc ISAs too. Given the performance benefit
(maybe caused by the reason Peter mentioned), I think we should use this
as the implementation of u8/u16 {cmp}xchg for now.

For Power7 and later, we can always switch to the lbarx/lharx version if
an observable performance benefit can be achieved. But the choice is left
to you. After all, as you said, qspinlock is the only user ;-)

Regards,
Boqun

> > Regards,
> > Boqun
> >
> >> [ 768.377102] use time[2826802]ns in xchg_u8_c
> >>
> >> I think this is because there is one more load in C.
> >> If possible, we can move such code in asm-generic/.
> >>
> >> thanks
> >> xinhui
> >>
>
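For reference, here is a rough sketch of what such an lbarx-based byte xchg
could look like. This is purely illustrative and untested, modelled on the
existing __xchg_u32_relaxed() pattern in arch/powerpc/include/asm/cmpxchg.h;
it relies on lbarx/stbcx., which are only available from ISA 2.06 (Power7)
onwards, and the function name here is hypothetical.

/*
 * Illustrative sketch only (not the patch under discussion, untested):
 * a relaxed byte xchg built directly on lbarx/stbcx., following the
 * pattern of __xchg_u32_relaxed().  Requires ISA 2.06 (Power7) or later.
 */
static __always_inline unsigned long
__xchg_u8_relaxed(u8 *p, unsigned long val)
{
	unsigned long prev;

	__asm__ __volatile__(
"1:	lbarx	%0,0,%2		# load-reserve the byte at *p\n"
"	stbcx.	%3,0,%2		# try to store the new byte\n"
"	bne-	1b"		/* lost the reservation, retry */
	: "=&r" (prev), "+m" (*p)
	: "r" (p), "r" (val)
	: "cc");

	return prev;
}

Compared with the 32-bit lwarx/stwcx. loop quoted above, this avoids the
mask-and-or on the containing word, but it limits the code to Power7 and
later, which is why keeping the 32-bit version as the default for now seems
reasonable.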