Hi, I was debugging some performance issues with an application that uses the gcc builtin lock functions on powerpc. A simple test case:

long lock_try(long *value)
{
	return __sync_lock_test_and_set(value, 1);
}

void unlock(long *value)
{
	__sync_lock_release(value);
}

00000010 <lock_try>:
  10:	7c 00 04 ac 	sync
  14:	7c 69 1b 78 	mr      r9,r3
  18:	38 00 00 01 	li      r0,1
  1c:	7c 60 48 28 	lwarx   r3,0,r9
  20:	7c 00 49 2d 	stwcx.  r0,0,r9
  24:	40 a2 ff f8 	bne-    1c <lock_try+0xc>
  28:	4c 00 01 2c 	isync
  2c:	4e 80 00 20 	blr

00000000 <unlock>:
   0:	7c 20 04 ac 	lwsync
   4:	38 00 00 00 	li      r0,0
   8:	90 03 00 00 	stw     r0,0(r3)
   c:	4e 80 00 20 	blr

unlock looks good, but lock_try has both a release barrier (the sync) and
an acquire barrier (the isync). Even worse, the release barrier is a
heavyweight sync, which is very slow.

Looking at the gcc documentation, sync_lock_test_and_set only needs an
acquire barrier:

> sync_lock_test_and_set
...
> This pattern must issue any memory barrier instructions such that the
> pattern as a whole acts as an acquire barrier, that is all memory
> operations after the pattern do not occur until the lock is acquired.

In light of this, remove the release barrier from
rs6000_split_lock_test_and_set:

00000010 <lock_try>:
  10:	7c 69 1b 78 	mr      r9,r3
  14:	38 00 00 01 	li      r0,1
  18:	7c 60 48 28 	lwarx   r3,0,r9
  1c:	7c 00 49 2d 	stwcx.  r0,0,r9
  20:	40 a2 ff f8 	bne-    18 <lock_try+0x8>
  24:	4c 00 01 2c 	isync
  28:	4e 80 00 20 	blr

Anton

--

Index: gcc/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc.orig/gcc/config/rs6000/rs6000.c	2008-09-03 02:30:14.000000000 -0400
+++ gcc/gcc/config/rs6000/rs6000.c	2008-09-03 02:33:35.000000000 -0400
@@ -14000,8 +14000,6 @@
   enum machine_mode mode = GET_MODE (mem);
   rtx label, x, cond = gen_rtx_REG (CCmode, CR0_REGNO);
 
-  emit_insn (gen_memory_barrier ());
-
   label = gen_rtx_LABEL_REF (VOIDmode, gen_label_rtx ());
   emit_label (XEXP (label, 0));
 
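
For context, here is a minimal sketch of how the two builtins from the test case are usually paired as a spinlock, which is why the lock side needs only acquire ordering and the unlock side only release ordering. The names demo_lock_t, spin_trylock, spin_lock, spin_unlock and spinlock_demo are hypothetical and are not part of the patch or of gcc; the demo function is single-threaded and only walks through the state transitions.

```c
#include <assert.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = held. */
typedef long demo_lock_t;

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int spin_trylock(demo_lock_t *lock)
{
    /* Atomic exchange documented as an acquire barrier: memory
       operations after it do not occur until the lock is taken. */
    return __sync_lock_test_and_set(lock, 1) == 0;
}

static void spin_lock(demo_lock_t *lock)
{
    while (!spin_trylock(lock))
        ;   /* busy-wait until the holder releases the lock */
}

static void spin_unlock(demo_lock_t *lock)
{
    /* Writes 0 with release semantics: earlier memory operations
       are complete before the lock appears free. */
    __sync_lock_release(lock);
}

/* Single-threaded walk through the lock states; returns 1 on success. */
static int spinlock_demo(void)
{
    demo_lock_t lock = 0;
    long counter = 0;
    int second_try, third_try;

    spin_lock(&lock);                  /* free -> held */
    second_try = spin_trylock(&lock);  /* already held, so this fails */
    counter++;                         /* the protected update */
    spin_unlock(&lock);                /* held -> free */

    third_try = spin_trylock(&lock);   /* free again, so this succeeds */
    spin_unlock(&lock);

    assert(second_try == 0);
    assert(third_try == 1);
    return counter == 1 && lock == 0;
}
```

Nothing before spin_lock needs to be ordered by the lock acquisition itself, so a leading release barrier in the test-and-set sequence buys nothing for this usage pattern.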