On 06/02/2011 02:52 PM, Aldy Hernandez wrote:
Wouldn't it be better to pass the model (as an extra CONST_INT
operand) to the expanders?  Targets where atomic instructions always act
as full barriers could just ignore that argument, others could decide what
to do based on the value.
*shrug* I don't care.  Whatever everyone agrees on.
Let's do that.  Many of the targets will be expanding these to
a somewhat longer sequence at some stage, and they'll all be
95% identical.  The extra operand ought to make for less
boiler-plate code.

OK, here's Aldy's patch modified to make the memory model a parameter to 
the RTL pattern.  I haven't worked with RTL for a while, so hopefully 
it's close to right :-)
If we can settle on the implementation, I'll proceed with the rest of 
the required atomics and then make the changes required to libstdc++-v3 
all at once.
Fortran seems to decide to copy only some of builtin-types.def into its 
own private types.def... dare I ask why?
Other changes...

Rather than duplicating the code, expand_sync_mem_exchange() now calls 
expand_sync_lock_test_and_set() when there is no sync_mem_exchange pattern. 
This means that when libstdc++ is converted to use the new 
__sync_mem_exchange directly, all existing architectures will still work 
as they do today, even if they don't provide the new patterns.  Retaining 
the current behaviour should ease the transition.
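
As a sanity check of the intended behaviour, here's a minimal usage sketch 
in C (hypothetical caller code, assuming the builtin and the __SYNC_MEM_* 
predefines from this patch); it compiles and behaves the same whether or 
not the target provides the new pattern, since the fallback path goes 
through __sync_lock_test_and_set:

/* Hypothetical example, not part of the patch.  */
static int lock;

int
try_lock (void)
{
  /* Atomically store 1 in lock and return the previous value.  Targets
     without a sync_mem_exchange pattern expand this through
     __sync_lock_test_and_set, as they do today.  */
  return __sync_mem_exchange (&lock, 1, __SYNC_MEM_SEQ_CST);
}
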
I'm also now inserting a __sync_synchronize() before expanding the 
lock_test_and_set() iff the memory model is 'acq_rel' or 'seq_cst'.  My 
understanding is that lock_test_and_set() is defined to be an acquire 
barrier only, so the results may not be correct without the extra 
synchronization (with only an acquire barrier, the processor is free to 
reorder earlier stores to after the instruction).  I assume the compiler's 
lock_test_and_set builtin is considered to have the same characteristics 
as the Intel instruction.  For the i386 port, I turned sync_mem_exchange 
into a define_expand that issues the barrier if need be and then follows 
it with the current lock_test_and_set insn.
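
Conceptually, for the stronger models the i386 expansion amounts to 
something like this (just a sketch of the intent, not part of the patch; 
the real code is the define_expand in sync.md below):

/* Rough C equivalent of sync_mem_exchange<mode> on i386 when the model
   is __SYNC_MEM_ACQ_REL or __SYNC_MEM_SEQ_CST.  Hypothetical example.  */
int
exchange_seq_cst (int *ptr, int val)
{
  __sync_synchronize ();                       /* barrier covering the release side */
  return __sync_lock_test_and_set (ptr, val);  /* xchg, acquire semantics only */
}

For the weaker models the barrier is simply omitted.
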
Finally, I moved the definition of the memmodel values from tree.h to 
machmode.h.  This allows RTL pattern code to check memory order values 
during the expansion of patterns/insns.
So my rusty hands screwed around with the patch quite a bit... have a 
look.  Thoughts?
Bootstraps with no regressions on x86_64-unknown-linux.  Do we apply this 
to mainline, or to cxx-mem-model and then bring it all over later when 
it's all done and "perfected"?
Andrew


        * doc/extend.texi (__sync_mem_exchange): Document.
        * cppbuiltin.c (define__GNUC__): Define __SYNC_MEM*.
        * machmode.h (enum memmodel): New.
        * c-family/c-common.c (BUILT_IN_MEM_EXCHANGE_N): Add case.
        * optabs.c (expand_sync_mem_exchange): New.
        * optabs.h (enum direct_optab_index): Add DOI_sync_mem_exchange entry.
        (sync_mem_exchange_optab): Define.
        * genopinit.c: Add entry for sync_mem_exchange.
        * builtins.c (get_memmodel): New.
        (expand_builtin_mem_exchange): New.
        (expand_builtin_synchronize): Remove static.
        (expand_builtin): Add cases for BUILT_IN_MEM_EXCHANGE_*.
        * sync-builtins.def: Add entries for BUILT_IN_MEM_EXCHANGE_*.
        * testsuite/gcc.dg/x86-sync-1.c: New test.
        * builtin-types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New.
        * expr.h (expand_sync_mem_exchange): Declare.
        (expand_builtin_synchronize): Declare.
        * fortran/types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New.
        * Makefile.in (cppbuiltin.o): Add missing dependency on $(TREE_H).
        * config/i386/i386.md (UNSPECV_MEM_XCHG): New.
        * config/i386/sync.md (sync_mem_exchange<mode>): New pattern.


Index: doc/extend.texi
===================================================================
*** doc/extend.texi     (revision 174933)
--- doc/extend.texi     (working copy)
*************** This builtin is not a full barrier, but 
*** 6728,6733 ****
--- 6728,6751 ----
  This means that all previous memory stores are globally visible, and all
  previous memory loads have been satisfied, but following memory reads
  are not prevented from being speculated to before the barrier.
+ 
+ @item @var{type} __sync_mem_exchange (@var{type} *ptr, @var{type} value, int memmodel, ...)
+ @findex __sync_mem_exchange
+ This builtin implements an atomic exchange operation within the
+ constraints of a memory model.  It writes @var{value} into
+ @code{*@var{ptr}}, and returns the previous contents of
+ @code{*@var{ptr}}.
+ 
+ The valid memory model variants for this builtin are
+ __SYNC_MEM_RELAXED, __SYNC_MEM_SEQ_CST, __SYNC_MEM_ACQUIRE,
+ __SYNC_MEM_RELEASE, and __SYNC_MEM_ACQ_REL.  The target pattern is responsible
+ for issuing the appropriate synchronization instructions, and should default
+ to the most restrictive memory model, the sequentially consistent one.  If
+ the target does not implement the pattern, the compiler falls back to the
+ __sync_lock_test_and_set builtin.  In that case, if the memory model is
+ stronger than __SYNC_MEM_ACQUIRE, a memory barrier is emitted before the
+ instruction.
+ 
  @end table
  
  @node Object Size Checking
Index: cppbuiltin.c
===================================================================
*** cppbuiltin.c        (revision 174933)
--- cppbuiltin.c        (working copy)
*************** define__GNUC__ (cpp_reader *pfile)
*** 66,71 ****
--- 66,77 ----
    cpp_define_formatted (pfile, "__GNUC_MINOR__=%d", minor);
    cpp_define_formatted (pfile, "__GNUC_PATCHLEVEL__=%d", patchlevel);
    cpp_define_formatted (pfile, "__VERSION__=\"%s\"", version_string);
+   cpp_define_formatted (pfile, "__SYNC_MEM_RELAXED=%d", MEMMODEL_RELAXED);
+   cpp_define_formatted (pfile, "__SYNC_MEM_SEQ_CST=%d", MEMMODEL_SEQ_CST);
+   cpp_define_formatted (pfile, "__SYNC_MEM_ACQUIRE=%d", MEMMODEL_ACQUIRE);
+   cpp_define_formatted (pfile, "__SYNC_MEM_RELEASE=%d", MEMMODEL_RELEASE);
+   cpp_define_formatted (pfile, "__SYNC_MEM_ACQ_REL=%d", MEMMODEL_ACQ_REL);
+   cpp_define_formatted (pfile, "__SYNC_MEM_CONSUME=%d", MEMMODEL_CONSUME);
  }
  
  
Index: machmode.h
===================================================================
*** machmode.h  (revision 174933)
--- machmode.h  (working copy)
*************** extern enum machine_mode ptr_mode;
*** 275,278 ****
--- 275,291 ----
  /* Target-dependent machine mode initialization - in insn-modes.c.  */
  extern void init_adjust_machine_modes (void);
  
+ /* Memory model types for the __sync_mem* builtins. 
+    This must match the order in libstdc++-v3/include/bits/atomic_base.h.  */
+ enum memmodel
+ {
+   MEMMODEL_RELAXED = 0,
+   MEMMODEL_CONSUME = 1,
+   MEMMODEL_ACQUIRE = 2,
+   MEMMODEL_RELEASE = 3,
+   MEMMODEL_ACQ_REL = 4,
+   MEMMODEL_SEQ_CST = 5,
+   MEMMODEL_LAST = 6
+ };
+ 
  #endif /* not HAVE_MACHINE_MODES */
Index: c-family/c-common.c
===================================================================
*** c-family/c-common.c (revision 174933)
--- c-family/c-common.c (working copy)
*************** resolve_overloaded_builtin (location_t l
*** 9059,9064 ****
--- 9059,9065 ----
      case BUILT_IN_VAL_COMPARE_AND_SWAP_N:
      case BUILT_IN_LOCK_TEST_AND_SET_N:
      case BUILT_IN_LOCK_RELEASE_N:
+     case BUILT_IN_MEM_EXCHANGE_N:
        {
        int n = sync_resolve_size (function, params);
        tree new_function, first_param, result;
Index: optabs.c
===================================================================
*** optabs.c    (revision 174933)
--- optabs.c    (working copy)
*************** expand_sync_lock_test_and_set (rtx mem, 
*** 7037,7042 ****
--- 7037,7082 ----
  
    return NULL_RTX;
  }
+ 
+ /* This function expands the atomic exchange operation:
+    atomically store VAL in MEM and return the previous value in MEM.
+ 
+    MEMMODEL is the memory model variant to use.
+    TARGET is an optional place to stick the return value.  */
+ 
+ rtx
+ expand_sync_mem_exchange (enum memmodel model, rtx mem, rtx val, rtx target)
+ {
+   enum machine_mode mode = GET_MODE (mem);
+   enum insn_code icode;
+ 
+   /* If the target supports the exchange directly, great.  */
+   icode = direct_optab_handler (sync_mem_exchange_optab, mode);
+   if (icode != CODE_FOR_nothing)
+     {
+       struct expand_operand ops[4];
+ 
+       create_output_operand (&ops[0], target, mode);
+       create_fixed_operand (&ops[1], mem);
+       /* VAL may have been promoted to a wider mode.  Shrink it if so.  */
+       create_convert_operand_to (&ops[2], val, mode, true);
+       create_integer_operand (&ops[3], model);
+       if (maybe_expand_insn (icode, 4, ops))
+       return ops[0].value;
+     }
+ 
+   /* Legacy sync_lock_test_and_set works the same, but is only defined as an 
+      acquire barrier.  If the pattern exists, and the memory model is stronger
+      than acquire, add a release barrier before the instruction.
+      The barrier is not needed if sync_lock_test_and_set doesn't exist since
+      it will expand into a compare-and-swap loop.  */
+   icode = direct_optab_handler (sync_lock_test_and_set_optab, mode);
+   if ((icode != CODE_FOR_nothing) && (model == MEMMODEL_SEQ_CST || 
+                                    model == MEMMODEL_ACQ_REL))
+     expand_builtin_synchronize ();
+ 
+   return expand_sync_lock_test_and_set (mem, val, target);
+ }
  
  /* Return true if OPERAND is suitable for operand number OPNO of
     instruction ICODE.  */
Index: optabs.h
===================================================================
*** optabs.h    (revision 174933)
--- optabs.h    (working copy)
*************** enum direct_optab_index
*** 675,680 ****
--- 675,683 ----
    /* Atomic clear with release semantics.  */
    DOI_sync_lock_release,
  
+   /* Atomic operations with C++0x memory model parameters. */
+   DOI_sync_mem_exchange,
+ 
    DOI_MAX
  };
  
*************** typedef struct direct_optab_d *direct_op
*** 722,727 ****
--- 725,733 ----
    (&direct_optab_table[(int) DOI_sync_lock_test_and_set])
  #define sync_lock_release_optab \
    (&direct_optab_table[(int) DOI_sync_lock_release])
+ 
+ #define sync_mem_exchange_optab \
+   (&direct_optab_table[(int) DOI_sync_mem_exchange])
  
  /* Target-dependent globals.  */
  struct target_optabs {
Index: genopinit.c
===================================================================
*** genopinit.c (revision 174933)
--- genopinit.c (working copy)
*************** static const char * const optabs[] =
*** 240,245 ****
--- 240,246 ----
    "set_direct_optab_handler (sync_compare_and_swap_optab, $A, 
CODE_FOR_$(sync_compare_and_swap$I$a$))",
    "set_direct_optab_handler (sync_lock_test_and_set_optab, $A, 
CODE_FOR_$(sync_lock_test_and_set$I$a$))",
    "set_direct_optab_handler (sync_lock_release_optab, $A, 
CODE_FOR_$(sync_lock_release$I$a$))",
+   "set_direct_optab_handler (sync_mem_exchange_optab, $A, 
CODE_FOR_$(sync_mem_exchange$I$a$))",
    "set_optab_handler (vec_set_optab, $A, CODE_FOR_$(vec_set$a$))",
    "set_optab_handler (vec_extract_optab, $A, CODE_FOR_$(vec_extract$a$))",
    "set_optab_handler (vec_extract_even_optab, $A, 
CODE_FOR_$(vec_extract_even$a$))",
Index: builtins.c
===================================================================
*** builtins.c  (revision 174933)
--- builtins.c  (working copy)
*************** expand_builtin_lock_test_and_set (enum m
*** 5192,5200 ****
    return expand_sync_lock_test_and_set (mem, val, target);
  }
  
  /* Expand the __sync_synchronize intrinsic.  */
  
! static void
  expand_builtin_synchronize (void)
  {
    gimple x;
--- 5192,5260 ----
    return expand_sync_lock_test_and_set (mem, val, target);
  }
  
+ /* Given an integer representing an ``enum memmodel'', verify its
+    correctness and return the memory model enum.  */
+ 
+ static enum memmodel
+ get_memmodel (tree exp)
+ {
+   rtx op;
+ 
+   if (TREE_CODE (exp) != INTEGER_CST)
+     {
+       error ("third argument to builtin is an invalid memory model");
+       return MEMMODEL_SEQ_CST;
+     }
+   op = expand_normal (exp);
+   if (INTVAL (op) < 0 || INTVAL (op) >= MEMMODEL_LAST)
+     {
+       error ("third argument to builtin is an invalid memory model");
+       return MEMMODEL_SEQ_CST;
+     }
+   return (enum memmodel) INTVAL (op);
+ }
+ 
+ /* Expand the __sync_mem_exchange intrinsic:
+ 
+       TYPE __sync_mem_exchange (TYPE *to, TYPE from, enum memmodel)
+ 
+    EXP is the CALL_EXPR.
+    TARGET is an optional place for us to store the results.  */
+ 
+ static rtx
+ expand_builtin_mem_exchange (enum machine_mode mode, tree exp, rtx target)
+ {
+   rtx val, mem;
+   enum machine_mode old_mode;
+   enum memmodel model;
+ 
+   model = get_memmodel (CALL_EXPR_ARG (exp, 2));
+   if (model != MEMMODEL_RELAXED
+       && model != MEMMODEL_SEQ_CST
+       && model != MEMMODEL_ACQ_REL
+       && model != MEMMODEL_RELEASE
+       && model != MEMMODEL_ACQUIRE)
+     {
+       error ("invalid memory model for %<__sync_mem_exchange%>");
+       return NULL_RTX;
+     }
+ 
+   /* Expand the operands.  */
+   mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
+   val = expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX, mode, EXPAND_NORMAL);
+   /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
+      of CONST_INTs, where we know the old_mode only from the call argument.  */
+   old_mode = GET_MODE (val);
+   if (old_mode == VOIDmode)
+     old_mode = TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (exp, 1)));
+   val = convert_modes (mode, old_mode, val, 1);
+ 
+   return expand_sync_mem_exchange (model, mem, val, target);
+ }
+ 
  /* Expand the __sync_synchronize intrinsic.  */
  
! void
  expand_builtin_synchronize (void)
  {
    gimple x;
*************** expand_builtin (tree exp, rtx target, rt
*** 6000,6005 ****
--- 6060,6076 ----
        return target;
        break;
  
+     case BUILT_IN_MEM_EXCHANGE_1:
+     case BUILT_IN_MEM_EXCHANGE_2:
+     case BUILT_IN_MEM_EXCHANGE_4:
+     case BUILT_IN_MEM_EXCHANGE_8:
+     case BUILT_IN_MEM_EXCHANGE_16:
+       mode = get_builtin_sync_mode (fcode - BUILT_IN_MEM_EXCHANGE_1);
+       target = expand_builtin_mem_exchange (mode, exp, target);
+       if (target)
+       return target;
+       break;
+ 
      case BUILT_IN_LOCK_TEST_AND_SET_1:
      case BUILT_IN_LOCK_TEST_AND_SET_2:
      case BUILT_IN_LOCK_TEST_AND_SET_4:
Index: sync-builtins.def
===================================================================
*** sync-builtins.def   (revision 174933)
--- sync-builtins.def   (working copy)
*************** DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_
*** 250,252 ****
--- 250,273 ----
  
  DEF_SYNC_BUILTIN (BUILT_IN_SYNCHRONIZE, "__sync_synchronize",
                  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+ 
+ /* __sync* builtins for the C++ memory model.  */
+ 
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_N,
+                 "__sync_mem_exchange",
+                 BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_1,
+                 "__sync_mem_exchange_1",
+                 BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_2,
+                 "__sync_mem_exchange_2",
+                 BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_4,
+                 "__sync_mem_exchange_4",
+                 BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_8,
+                 "__sync_mem_exchange_8",
+                 BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+ DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_16,
+                 "__sync_mem_exchange_16",
+                 BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROW_LEAF_LIST)
Index: testsuite/gcc.dg/x86-sync-1.c
===================================================================
*** testsuite/gcc.dg/x86-sync-1.c       (revision 0)
--- testsuite/gcc.dg/x86-sync-1.c       (revision 0)
***************
*** 0 ****
--- 1,9 ----
+ /* { dg-do compile } */
+ /* { dg-options "-dap" } */
+ 
+ int i;
+ 
+ void foo()
+ {
+   __sync_mem_exchange (&i, 555, __SYNC_MEM_SEQ_CST);
+ }
Index: builtin-types.def
===================================================================
*** builtin-types.def   (revision 174933)
--- builtin-types.def   (working copy)
*************** DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PT
*** 383,388 ****
--- 383,393 ----
                     BT_PTR, BT_UINT)
  DEF_FUNCTION_TYPE_3 (BT_FN_PTR_CONST_PTR_INT_SIZE, BT_PTR,
                     BT_CONST_PTR, BT_INT, BT_SIZE)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT)
  
  DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
                     BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
Index: expr.h
===================================================================
*** expr.h      (revision 174933)
--- expr.h      (working copy)
*************** rtx expand_bool_compare_and_swap (rtx, r
*** 217,222 ****
--- 217,223 ----
  rtx expand_sync_operation (rtx, rtx, enum rtx_code);
  rtx expand_sync_fetch_operation (rtx, rtx, enum rtx_code, bool, rtx);
  rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
+ rtx expand_sync_mem_exchange (enum memmodel, rtx, rtx, rtx);
  
  /* Functions from expmed.c:  */
  
*************** extern void expand_builtin_setjmp_receiv
*** 248,253 ****
--- 249,255 ----
  extern rtx expand_builtin_saveregs (void);
  extern void expand_builtin_trap (void);
  extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, enum machine_mode);
+ extern void expand_builtin_synchronize (void);
  
  /* Functions from expr.c:  */
  
Index: fortran/types.def
===================================================================
*** fortran/types.def   (revision 174933)
--- fortran/types.def   (working copy)
*************** DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_
*** 120,125 ****
--- 120,132 ----
  DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PTR_UINT, BT_VOID, BT_PTR_FN_VOID_PTR,
                       BT_PTR, BT_UINT)
  
+ DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT)
+ DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT)
+ 
+ 
  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_OMPFN_PTR_UINT_UINT,
                       BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT)
  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_PTR_WORD_WORD_PTR,
Index: Makefile.in
===================================================================
*** Makefile.in (revision 174933)
--- Makefile.in (working copy)
*************** PREPROCESSOR_DEFINES = \
*** 4048,4054 ****
    @TARGET_SYSTEM_ROOT_DEFINE@
  
  cppbuiltin.o: cppbuiltin.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
!       cppbuiltin.h Makefile
        $(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
          $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
          -c $(srcdir)/cppbuiltin.c $(OUTPUT_OPTION)
--- 4048,4054 ----
    @TARGET_SYSTEM_ROOT_DEFINE@
  
  cppbuiltin.o: cppbuiltin.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
!       $(TREE_H) cppbuiltin.h Makefile
        $(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
          $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
          -c $(srcdir)/cppbuiltin.c $(OUTPUT_OPTION)
Index: config/i386/i386.md
===================================================================
*** config/i386/i386.md (revision 174933)
--- config/i386/i386.md (working copy)
***************
*** 252,257 ****
--- 252,258 ----
    UNSPECV_MWAIT
    UNSPECV_CMPXCHG
    UNSPECV_XCHG
+   UNSPECV_MEM_XCHG
    UNSPECV_LOCK
    UNSPECV_PROLOGUE_USE
    UNSPECV_CLD
Index: config/i386/sync.md
===================================================================
*** config/i386/sync.md (revision 174933)
--- config/i386/sync.md (working copy)
***************
*** 232,237 ****
--- 232,257 ----
    return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
  })
  
+ (define_expand "sync_mem_exchange<mode>"
+   [(set (match_operand:SWI 0 "register_operand" "=<r>")
+       (unspec_volatile:SWI
+          [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_MEM_XCHG))
+    (set (match_dup 1)
+       (match_operand:SWI 2 "register_operand" "0"))
+    (match_operand:SI 3 "const_int_operand" "n")]
+   ""
+ {
+   /* lock_test_and_set is only an acquire barrier. If a stronger barrier is
+      required, issue a release barrier before the insn.  */
+   if (INTVAL (operands[3]) == MEMMODEL_ACQ_REL ||
+       INTVAL (operands[3]) == MEMMODEL_SEQ_CST)
+     emit_insn (gen_memory_barrier ());
+   emit_insn (gen_sync_lock_test_and_set<mode> (operands[0], 
+                                              operands[1],
+                                              operands[2]));
+   DONE;
+ })
+ 
  ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
  (define_insn "sync_lock_test_and_set<mode>"
    [(set (match_operand:SWI 0 "register_operand" "=<r>")

Reply via email to