http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54087

             Bug #: 54087
           Summary: __atomic_fetch_add does not use xadd instruction
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: drepper....@gmail.com
            Target: x86_64-redhat-linux


Compiling this code

int a;

int f1(int p)
{
  return __atomic_sub_fetch(&a, p, __ATOMIC_SEQ_CST) == 0;
}

int f2(int p)
{
  return __atomic_fetch_sub(&a, p, __ATOMIC_SEQ_CST) - p == 0;
}

you'll see that neither function uses the xadd instruction with the lock
prefix.  Instead, an expensive emulation using cmpxchg is emitted:

0000000000000000 <f1>:
   0:    8b 05 00 00 00 00        mov    0x0(%rip),%eax        # 6 <f1+0x6>
            2: R_X86_64_PC32    a-0x4
   6:    89 c2                    mov    %eax,%edx
   8:    29 fa                    sub    %edi,%edx
   a:    f0 0f b1 15 00 00 00     lock cmpxchg %edx,0x0(%rip)        # 12 <f1+0x12>
  11:    00 
            e: R_X86_64_PC32    a-0x4
  12:    75 f2                    jne    6 <f1+0x6>
  14:    31 c0                    xor    %eax,%eax
  16:    85 d2                    test   %edx,%edx
  18:    0f 94 c0                 sete   %al
  1b:    c3                       retq   

This implementation is not only larger, it has potentially (if unlikely)
unbounded cost, and even when the cmpxchg succeeds right away it is costlier.
The last point is especially true if the cache line for the variable in
question is not in the core's cache.  In that case the initial load causes an
I->S transition for the cache line, and the cmpxchg an additional, possibly
also very expensive, S->E transition.  Using xadd would cause a single I->E
transition.

The config/i386/sync.md file in the current tree contains a pattern for
atomic_fetch_add which does use xadd, but it seems never to be matched, even
when an immediate value is used instead of the function parameter.

;; For operand 2 nonmemory_operand predicate is used instead of
;; register_operand to allow combiner to better optimize atomic
;; additions of constants.
(define_insn "atomic_fetch_add<mode>"
  [(set (match_operand:SWI 0 "register_operand" "=<r>")
        (unspec_volatile:SWI
          [(match_operand:SWI 1 "memory_operand" "+m")
           (match_operand:SI 3 "const_int_operand")]            ;; model
          UNSPECV_XCHG))
   (set (match_dup 1)
        (plus:SWI (match_dup 1)
                  (match_operand:SWI 2 "nonmemory_operand" "0")))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_XADD"
  "lock{%;} %K3xadd{<imodesuffix>}\t{%0, %1|%1, %0}")


X86_ARCH_XADD should be defined for every architecture but i386.
