While debugging a transaction lock elision issue, we noticed that the
compiler was moving some loads/stores outside of the transaction body,
because the HTM instructions were not marked as memory barriers, which
is bad.  Looking deeper, I also noticed that neither Intel and S390
have their HTM instructions marked as memory barriers either, although
Andi did submit a patch last year:

    https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02999.html

Richi and r~ both said the memory barrier should be part of the patterns:

    https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02235.html

The following patch uses that suggestion by adding memory barriers to
all of our HTM instructions, which fixes the issue.  I have also added
a __TM_FENCE__ macro users can test to see whether the compiler treats
the HTM instructions as memory barriers or not, in case they want to
explicitly add memory barriers to their code when using older compilers.

On a glibc thread discussing this issue, Torvald also asked that I add
documention describing the memory consistency semantics the HTM instructions
should have, so I added a blurb about that.  Torvald, is the text below
what you were looking for?

This has passed bootstrap/regtesting on powerpc64le-linux.  Is this ok
for mainline?

Since this is a correctness issue, I'd like to eventually backport this to
the release branches.  Is that ok once I've verified bootstrap/regtesting
on them?

Once this is committed, I can take a stab at fixing Intel and S390 similarly,
unless someone beats me to it (hint hint :).  I'd need help testing it though,
since I don't have access to Intel or S390 hardware that support HTM.

Peter

        * config/rs6000/htm.md (UNSPEC_HTM_FENCE): New.
        (tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
        trechkpt, treclaim, tsr, ttest): Rename define_insns from this...
        (*tabort, *tabort<wd>c, *tabort<wd>ci, *tbegin, *tcheck, *tend,
        *trechkpt, *treclaim, *tsr, *ttest): ...to this.  Add memory barrier.
        (tabort, tabort<wd>c, tabort<wd>ci, tbegin, tcheck, tend,
        trechkpt, treclaim, tsr, ttest): New define_expands.
        * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
        __TM_FENCE__ for htm.
        * doc/extend.texi: Update documentation for htm builtins.

Index: gcc/config/rs6000/htm.md
===================================================================
--- gcc/config/rs6000/htm.md    (revision 227308)
+++ gcc/config/rs6000/htm.md    (working copy)
@@ -45,96 +45,231 @@
    UNSPECV_HTM_MTSPR
   ])
 
+;;
+;; UNSPEC usage
+;;
 
-(define_insn "tabort"
+(define_c_enum "unspec"
+  [UNSPEC_HTM_FENCE
+  ])
+
+(define_expand "tabort"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(match_operand:SI 0 "base_reg_operand" "b")]
+                              UNSPECV_HTM_TABORT))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tabort"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
        (unspec_volatile:CC [(match_operand:SI 0 "base_reg_operand" "b")]
-                           UNSPECV_HTM_TABORT))]
+                           UNSPECV_HTM_TABORT))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabort. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tabort<wd>c"
+(define_expand "tabort<wd>c"
+  [(parallel
+     [(set (match_operand:CC 3 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
+                               (match_operand:GPR 1 "gpc_reg_operand" "r")
+                               (match_operand:GPR 2 "gpc_reg_operand" "r")]
+                              UNSPECV_HTM_TABORTXC))
+      (set (match_dup 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[4] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[4]) = 1;
+})
+
+(define_insn "*tabort<wd>c"
   [(set (match_operand:CC 3 "cc_reg_operand" "=x")
        (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
                             (match_operand:GPR 1 "gpc_reg_operand" "r")
                             (match_operand:GPR 2 "gpc_reg_operand" "r")]
-                           UNSPECV_HTM_TABORTXC))]
+                           UNSPECV_HTM_TABORTXC))
+   (set (match_operand:BLK 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabort<wd>c. %0,%1,%2"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tabort<wd>ci"
+(define_expand "tabort<wd>ci"
+  [(parallel
+     [(set (match_operand:CC 3 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
+                               (match_operand:GPR 1 "gpc_reg_operand" "r")
+                               (match_operand 2 "s5bit_cint_operand" "n")]
+                              UNSPECV_HTM_TABORTXCI))
+      (set (match_dup 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[4] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[4]) = 1;
+})
+
+(define_insn "*tabort<wd>ci"
   [(set (match_operand:CC 3 "cc_reg_operand" "=x")
        (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
                             (match_operand:GPR 1 "gpc_reg_operand" "r")
                             (match_operand 2 "s5bit_cint_operand" "n")]
-                           UNSPECV_HTM_TABORTXCI))]
+                           UNSPECV_HTM_TABORTXCI))
+   (set (match_operand:BLK 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabort<wd>ci. %0,%1,%2"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tbegin"
+(define_expand "tbegin"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
+                              UNSPECV_HTM_TBEGIN))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tbegin"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
        (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
-                           UNSPECV_HTM_TBEGIN))]
+                           UNSPECV_HTM_TBEGIN))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tbegin. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tcheck"
+(define_expand "tcheck"
+  [(parallel
+     [(set (match_operand:CC 0 "cc_reg_operand" "=y")
+          (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TCHECK))
+      (set (match_dup 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
+})
+
+(define_insn "*tcheck"
   [(set (match_operand:CC 0 "cc_reg_operand" "=y")
-       (unspec_volatile:CC [(const_int 0)]
-                           UNSPECV_HTM_TCHECK))]
+       (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TCHECK))
+   (set (match_operand:BLK 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tcheck %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tend"
+(define_expand "tend"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
+                              UNSPECV_HTM_TEND))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tend"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
        (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
-                           UNSPECV_HTM_TEND))]
+                           UNSPECV_HTM_TEND))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tend. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "trechkpt"
+(define_expand "trechkpt"
+  [(parallel
+     [(set (match_operand:CC 0 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TRECHKPT))
+      (set (match_dup 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
+})
+
+(define_insn "*trechkpt"
   [(set (match_operand:CC 0 "cc_reg_operand" "=x")
-       (unspec_volatile:CC [(const_int 0)]
-                           UNSPECV_HTM_TRECHKPT))]
+       (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TRECHKPT))
+   (set (match_operand:BLK 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "trechkpt."
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "treclaim"
+(define_expand "treclaim"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(match_operand:SI 0 "gpc_reg_operand" "r")]
+                              UNSPECV_HTM_TRECLAIM))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*treclaim"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
        (unspec_volatile:CC [(match_operand:SI 0 "gpc_reg_operand" "r")]
-                           UNSPECV_HTM_TRECLAIM))]
+                           UNSPECV_HTM_TRECLAIM))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "treclaim. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "tsr"
+(define_expand "tsr"
+  [(parallel
+     [(set (match_operand:CC 1 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
+                              UNSPECV_HTM_TSR))
+      (set (match_dup 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[2] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[2]) = 1;
+})
+
+(define_insn "*tsr"
   [(set (match_operand:CC 1 "cc_reg_operand" "=x")
        (unspec_volatile:CC [(match_operand 0 "const_0_to_1_operand" "n")]
-                           UNSPECV_HTM_TSR))]
+                           UNSPECV_HTM_TSR))
+   (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tsr. %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
 
-(define_insn "ttest"
+(define_expand "ttest"
+  [(parallel
+     [(set (match_operand:CC 0 "cc_reg_operand" "=x")
+          (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TTEST))
+      (set (match_dup 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))])]
+  "TARGET_HTM"
+{
+  operands[1] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
+  MEM_VOLATILE_P (operands[1]) = 1;
+})
+
+(define_insn "*ttest"
   [(set (match_operand:CC 0 "cc_reg_operand" "=x")
-       (unspec_volatile:CC [(const_int 0)]
-                           UNSPECV_HTM_TTEST))]
+       (unspec_volatile:CC [(const_int 0)] UNSPECV_HTM_TTEST))
+   (set (match_operand:BLK 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabortwci. 0,1,0"
   [(set_attr "type" "htm")
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c        (revision 227308)
+++ gcc/config/rs6000/rs6000-c.c        (working copy)
@@ -372,7 +372,11 @@
   if ((flags & OPTION_MASK_VSX) != 0)
     rs6000_define_or_undefine_macro (define_p, "__VSX__");
   if ((flags & OPTION_MASK_HTM) != 0)
-    rs6000_define_or_undefine_macro (define_p, "__HTM__");
+    {
+      rs6000_define_or_undefine_macro (define_p, "__HTM__");
+      /* Tell the user that our HTM insn patterns act as memory barriers.  */
+      rs6000_define_or_undefine_macro (define_p, "__TM_FENCE__");
+    }
   if ((flags & OPTION_MASK_P8_VECTOR) != 0)
     rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__");
   if ((flags & OPTION_MASK_QUAD_MEMORY) != 0)
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi (revision 227308)
+++ gcc/doc/extend.texi (working copy)
@@ -16030,6 +16030,23 @@
 unsigned int __builtin_tsuspend (void)
 @end smallexample
 
+Note that the semantics of the above HTM builtins are required to mimic the
+locking semantics used for critical sections.  Builtins that are used to
+create a new transaction or restart a suspended transaction (specifically any
+builtin that changes the state from non-transactional to transactional) must
+have lock acquisition like semantics while those builtins that end or suspend
+a transaction (ie, changes the state from transactional to non-transactional)
+must have lock release like semantics.  The HTM instructions associated with
+with the builtins inherently provide the correct acquisition and release
+semantics required.  However, the compiler must be prohibited from moving
+loads and stores across the HTM instructions.  This has been accomplished
+by adding memory barriers to the associated HTM instructions.  Earlier versions
+of the compiler did not treat the HTM instructions as memory barriers.
+A @code{__TM_FENCE__} macro has been added, which can be used to determine
+whether the current compiler treats HTM instructions as memory barriers or not.
+This allows the user to explicitly add memory barriers to their code when
+using an older version of the compiler.
+
 The following set of built-in functions are available to gain access
 to the HTM specific special purpose registers.
 


Reply via email to