http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087
Bug #: 54087
Summary: __atomic_fetch_add does not use xadd instruction
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: drepper....@gmail.com
Target: x86_64-redhat-linux

Compiling this code

  int a;

  int f1(int p)
  {
    return __atomic_sub_fetch(&a, p, __ATOMIC_SEQ_CST) == 0;
  }

  int f2(int p)
  {
    return __atomic_fetch_sub(&a, p, __ATOMIC_SEQ_CST) - p == 0;
  }

you'll see that neither function uses the xadd instruction with the lock
prefix.  Instead an expensive emulation using cmpxchg is used:

  0000000000000000 <f1>:
     0: 8b 05 00 00 00 00     mov    0x0(%rip),%eax        # 6 <f1+0x6>
                        2: R_X86_64_PC32  a-0x4
     6: 89 c2                 mov    %eax,%edx
     8: 29 fa                 sub    %edi,%edx
     a: f0 0f b1 15 00 00 00  lock cmpxchg %edx,0x0(%rip)  # 12 <f1+0x12>
    11: 00
                        e: R_X86_64_PC32  a-0x4
    12: 75 f2                 jne    6 <f1+0x6>
    14: 31 c0                 xor    %eax,%eax
    16: 85 d2                 test   %edx,%edx
    18: 0f 94 c0              sete   %al
    1b: c3                    retq

Not only is this implementation larger, it has possibly (if unlikely)
unbounded cost, and even if the cmpxchg succeeds right away it is
costlier.  The last point is especially true if the cache line for the
variable in question is not in the core's cache.  In that case the
initial load causes an I->S transition for the cache line and the
cmpxchg an additional, possibly also very expensive, S->E transition.
Using xadd would cause a single I->E transition.

The config/i386/sync.md file in the current tree contains a pattern for
atomic_fetch_add which does use xadd, but it seems not to be used, even
when an immediate value is used instead of the function parameter.

;; For operand 2 nonmemory_operand predicate is used instead of
;; register_operand to allow combiner to better optimize atomic
;; additions of constants.
(define_insn "atomic_fetch_add<mode>"
  [(set (match_operand:SWI 0 "register_operand" "=<r>")
        (unspec_volatile:SWI
          [(match_operand:SWI 1 "memory_operand" "+m")
           (match_operand:SI 3 "const_int_operand")]     ;; model
          UNSPECV_XCHG))
   (set (match_dup 1)
        (plus:SWI (match_dup 1)
                  (match_operand:SWI 2 "nonmemory_operand" "0")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_XADD"
  "lock{%;} %K3xadd{<imodesuffix>}\t{%0, %1|%1, %0}")

X86_ARCH_XADD should be defined for every architecture but i386.
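For reference, a minimal sketch (not part of the report) of the codegen the
reporter expects: lock xadd performs the whole atomic read-modify-write in one
instruction and leaves the old memory value in the source register. The inline
asm helper below is illustrative only; it mirrors f2 from the report by
negating the addend, and falls back to the builtin on non-x86 targets.

  #include <assert.h>
  #include <stdio.h>

  static int a;

  /* Illustrative fetch-and-add via a single lock xadd, the instruction
     the atomic_fetch_add<mode> pattern emits.  In AT&T syntax
     "xadd %reg, mem" stores reg+mem into mem and the old mem into reg.  */
  static int fetch_add_xadd(int *p, int v)
  {
  #if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("lock xadd %0, %1"
                         : "+r"(v), "+m"(*p)
                         :
                         : "memory", "cc");
    return v;  /* old value of *p */
  #else
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
  #endif
  }

  /* Mirrors f2 from the report: subtract p and test whether the
     result is zero, using only the returned old value.  */
  static int f2(int p)
  {
    return fetch_add_xadd(&a, -p) - p == 0;
  }

  int main(void)
  {
    a = 5;
    assert(f2(5) == 1);  /* 5 - 5 == 0, so the new value is zero */
    assert(a == 0);
    a = 7;
    assert(f2(3) == 0);  /* 7 - 3 != 0 */
    assert(a == 4);
    puts("ok");
    return 0;
  }

Compiling such a helper with -O2 and inspecting the output (objdump -d) is an
easy way to compare the single-instruction form against the cmpxchg loop shown
above.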