On Mon, 3 Oct 2016 17:03:03 +1100 Anton Blanchard <an...@ozlabs.org> wrote:
> From: Anton Blanchard <an...@samba.org>
>
> I see quite a lot of static branch mispredictions on a simple
> web serving workload. The issue is in __atomic_add_unless(), called
> from _atomic_dec_and_lock(). There is no obvious common case, so it
> is better to let the hardware predict the branch.

Seems reasonable.

How problematic is an unmatched lwarx for performance? It seems that it
will serialize a subsequent lwarx until it is next to complete (in the
case of atomic_dec_and_lock, the spin_lock will immediately do another
lwarx). Maybe that's not a big problem.

Putting a regular load and test here before the lwarx would be
disappointing, because many users have success as the common case.

> Signed-off-by: Anton Blanchard <an...@samba.org>
> ---
>  arch/powerpc/include/asm/atomic.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
> index f08d567..2b90335 100644
> --- a/arch/powerpc/include/asm/atomic.h
> +++ b/arch/powerpc/include/asm/atomic.h
> @@ -233,7 +233,7 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int a, int u)
>  	PPC_ATOMIC_ENTRY_BARRIER
>  "1:	lwarx	%0,0,%1		# __atomic_add_unless\n\
>  	cmpw	0,%0,%3 \n\
> -	beq-	2f \n\
> +	beq	2f \n\
>  	add	%0,%2,%0 \n"
>  	PPC405_ERR77(0,%2)
>  "	stwcx.	%0,0,%1 \n\
> @@ -539,7 +539,7 @@ static __inline__ int atomic64_add_unless(atomic64_t *v, long a, long u)
>  	PPC_ATOMIC_ENTRY_BARRIER
>  "1:	ldarx	%0,0,%1		# __atomic_add_unless\n\
>  	cmpd	0,%0,%3 \n\
> -	beq-	2f \n\
> +	beq	2f \n\
>  	add	%0,%2,%0 \n"
>  "	stdcx.	%0,0,%1 \n\
>  	bne-	1b \n"
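
For readers following along: the semantics under discussion, and the "regular
load and test before the lwarx" idea, can be sketched in portable C11. This is
my own illustration, not the kernel code; a compare-and-swap loop stands in for
the lwarx/stwcx. reservation loop, and the function name is made up.

```c
#include <stdatomic.h>

/* Hypothetical portable sketch of the add-unless operation: atomically add
 * 'a' to '*v' unless the current value equals 'u'; return the old value.
 * The powerpc implementation in the patch does this with lwarx/stwcx. */
static int atomic_add_unless_sketch(atomic_int *v, int a, int u)
{
	/* This plain load-and-test before entering the CAS loop is the
	 * "regular load and test here before the lwarx" idea: it avoids
	 * taking a reservation at all when *v == u, at the cost of an
	 * extra load on the success path -- which, as noted above, is
	 * the common case for many users. */
	int old = atomic_load_explicit(v, memory_order_relaxed);

	while (old != u) {
		/* On failure, 'old' is reloaded with the current value,
		 * mirroring the bne- 1b retry in the asm. */
		if (atomic_compare_exchange_weak_explicit(
				v, &old, old + a,
				memory_order_seq_cst,
				memory_order_relaxed))
			break;
	}
	return old;	/* return != u means the add was performed */
}
```

A return value different from u indicates the add happened, matching the
`__atomic_add_unless` convention of returning the value observed before the
operation.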