The problem here is that -flto cannot equate the instrumentation
functions being generated with a user supplied version of the library
functions. This would happen if the user tried to link a transactional
program with libitm with -flto (as in -fwhole-program, etc).
This is an easy problem to fix, just by changing DEF_TM_BUILTIN to set
BOTH_P so the __builtin_ITM_* variants and _ITM_* are considered the same.
In doing so, I noticed that there are a myriad of _ITM_* builtins that
have incorrect prototypes as defined in the builtins*.def files. So
using libitm/libitm.h fails miserably because of mismatches. The
attached patch fixes all problems, and the PR.
However...
Looking at the generated code, I've noticed that LTO isn't inlining the
TM builtins as one would expect. This is a two-fold problem:
First, LTO streaming happens before the final TM lowering (tmmark pass),
so there is nothing to inline. The instrumented function calls
(_ITM_RU4, et al), aren't generated until after LTO.
Second, it seems that by design, LTO prefers builtins to user-provided
versions of them. In particular, lto_symtab_prevailing_decl()
stipulates that builtins are their own prevailing decl. So even if we
lowered TM before LTO streaming, user provided builtins wouldn't be
preferred (and thus inlined) as we would expect into application code.
Though I don't think fixing LTO of instrumented TM code is 4.7 material,
I thought I'd write this up for the record.
With regards to the first problem, the reason we lower TM so late (in
contrast to openmp) is because we were hoping to get good optimization
of the memory operations before fudging them all up. So at some point
we should decide what's the biggest gain, deeper optimization of the
memory operations or the ability to inline the instrumentation. Or
perhaps, whether we can simply move the LTO streaming point under
certain conditions and get both.
With regards to the second problem, I suppose I should tackle #1 first,
but I would be curious if the LTO experts could weigh in here.
The attached patch fixes the ICE in the PR, though it won't do what the
user ultimately wants to do, given the limitations described. Perhaps
we could create another PR and tag it with an enhancement request.
OK pending tests?
Aldy
p.s. Oh, and btw, does this all make sense?
PR lto/51698
* builtin-types.def: (BT_CONST_DOUBLE_PTR): New.
(BT_FN_VOID_PTR_CONST_PTR_SIZE): New.
(BT_FN_VOID_PTR_INT_SIZE): New.
(BT_FN_DOUBLE_VPTR): Remove.
(BT_FN_DOUBLE_CONST_DOUBLE_PTR): New.
* gtm-builtins.def (_ITM_abortTransaction): Set return type to
void.
(_ITM_changeTransactionMode): Same.
(_ITM_memmoveRtWt): Change return type to void.
(_ITM_memcpyRtWt): Same.
(_ITM_memsetW): Same.
(_ITM_RaRD): Change types to double.
(_ITM_RD): Same.
(_ITM_RaWD): Same.
(_ITM_RfWD): Same.
* builtins.def (DEF_TM_BUILTIN): Set BOTH_P to true.
Index: testsuite/gcc.dg/lto/trans-mem-4_0.c
===================================================================
--- testsuite/gcc.dg/lto/trans-mem-4_0.c (revision 0)
+++ testsuite/gcc.dg/lto/trans-mem-4_0.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-lto-options {{-flto -fgnu-tm}} } */
+/* { dg-lto-do link } */
+
+extern void foo() __attribute__((transaction_safe));
+
+int main()
+{
+ __transaction_atomic {
+ foo();
+ }
+}
Index: testsuite/gcc.dg/lto/trans-mem-4_1.c
===================================================================
--- testsuite/gcc.dg/lto/trans-mem-4_1.c (revision 0)
+++ testsuite/gcc.dg/lto/trans-mem-4_1.c (revision 0)
@@ -0,0 +1,12 @@
+__attribute__((transaction_safe))
+void foo()
+{
+}
+
+unsigned int _ITM_beginTransaction(unsigned int prop, ...)
+{
+}
+
+void __builtin__ITM_commitTransaction (void)
+{
+}
Index: builtin-types.def
===================================================================
--- builtin-types.def (revision 183474)
+++ builtin-types.def (working copy)
@@ -103,6 +103,10 @@ DEF_PRIMITIVE_TYPE (BT_PTRMODE, (*lang_h
DEF_PRIMITIVE_TYPE (BT_INT_PTR, integer_ptr_type_node)
DEF_PRIMITIVE_TYPE (BT_FLOAT_PTR, float_ptr_type_node)
DEF_PRIMITIVE_TYPE (BT_DOUBLE_PTR, double_ptr_type_node)
+DEF_PRIMITIVE_TYPE (BT_CONST_DOUBLE_PTR,
+ build_pointer_type
+ (build_qualified_type (double_type_node,
+ TYPE_QUAL_CONST)))
DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE_PTR, long_double_ptr_type_node)
DEF_PRIMITIVE_TYPE (BT_PID, pid_type_node)
DEF_PRIMITIVE_TYPE (BT_SIZE, size_type_node)
@@ -342,10 +346,14 @@ DEF_FUNCTION_TYPE_3 (BT_FN_INT_CONST_STR
BT_INT, BT_CONST_STRING, BT_CONST_STRING, BT_SIZE)
DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_CONST_PTR_SIZE,
BT_PTR, BT_PTR, BT_CONST_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_CONST_PTR_SIZE,
+ BT_VOID, BT_PTR, BT_CONST_PTR, BT_SIZE)
DEF_FUNCTION_TYPE_3 (BT_FN_INT_CONST_PTR_CONST_PTR_SIZE,
BT_INT, BT_CONST_PTR, BT_CONST_PTR, BT_SIZE)
DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_INT_SIZE,
BT_PTR, BT_PTR, BT_INT, BT_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_SIZE,
+ BT_VOID, BT_PTR, BT_INT, BT_SIZE)
DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_INT,
BT_VOID, BT_PTR, BT_INT, BT_INT)
DEF_FUNCTION_TYPE_3 (BT_FN_VOID_CONST_PTR_PTR_SIZE,
@@ -539,7 +547,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_I2_VPTR, BT_I
DEF_FUNCTION_TYPE_1 (BT_FN_I4_VPTR, BT_I4, BT_VOLATILE_PTR)
DEF_FUNCTION_TYPE_1 (BT_FN_I8_VPTR, BT_I8, BT_VOLATILE_PTR)
DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_VPTR, BT_FLOAT, BT_VOLATILE_PTR)
-DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_VPTR, BT_DOUBLE, BT_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_CONST_DOUBLE_PTR, BT_DOUBLE, BT_DOUBLE_PTR)
DEF_FUNCTION_TYPE_1 (BT_FN_LDOUBLE_VPTR, BT_LONGDOUBLE, BT_VOLATILE_PTR)
DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VPTR_I1, BT_VOID, BT_VOLATILE_PTR, BT_I1)
Index: builtins.def
===================================================================
--- builtins.def (revision 183474)
+++ builtins.def (working copy)
@@ -147,7 +147,7 @@ along with GCC; see the file COPYING3.
#undef DEF_TM_BUILTIN
#define DEF_TM_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \
- false, true, true, ATTRS, false, flag_tm)
+ true, true, true, ATTRS, false, flag_tm)
/* Define an attribute list for math functions that are normally
"impure" because some of them may write into global memory for
Index: gtm-builtins.def
===================================================================
--- gtm-builtins.def (revision 183474)
+++ gtm-builtins.def (working copy)
@@ -6,16 +6,16 @@ DEF_TM_BUILTIN (BUILT_IN_TM_COMMIT, "_IT
DEF_TM_BUILTIN (BUILT_IN_TM_COMMIT_EH, "_ITM_commitTransactionEH",
BT_FN_VOID_PTR, ATTR_TM_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_ABORT, "_ITM_abortTransaction",
- BT_FN_INT, ATTR_TM_NORETURN_NOTHROW_LIST)
+ BT_FN_VOID_INT, ATTR_TM_NORETURN_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_IRREVOCABLE, "_ITM_changeTransactionMode",
- BT_FN_INT_INT, ATTR_TM_NOTHROW_LIST)
+ BT_FN_VOID_INT, ATTR_TM_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_MEMCPY, "_ITM_memcpyRtWt",
- BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
+ BT_FN_VOID_PTR_CONST_PTR_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_MEMMOVE, "_ITM_memmoveRtWt",
- BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
+ BT_FN_VOID_PTR_CONST_PTR_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_MEMSET, "_ITM_memsetW",
- BT_FN_PTR_PTR_INT_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
+ BT_FN_VOID_PTR_INT_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_GETTMCLONE_IRR, "_ITM_getTMCloneOrIrrevocable",
BT_FN_PTR_PTR, ATTR_TM_CONST_NOTHROW_LIST)
@@ -172,13 +172,13 @@ DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_RFW_FLO
BT_FN_FLOAT_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_DOUBLE, "_ITM_RD",
- BT_FN_DOUBLE_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
+ BT_FN_DOUBLE_CONST_DOUBLE_PTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_RAR_DOUBLE, "_ITM_RaRD",
- BT_FN_FLOAT_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
+ BT_FN_DOUBLE_CONST_DOUBLE_PTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_RAW_DOUBLE, "_ITM_RaWD",
- BT_FN_FLOAT_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
+ BT_FN_DOUBLE_CONST_DOUBLE_PTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_RFW_DOUBLE, "_ITM_RfWD",
- BT_FN_FLOAT_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
+ BT_FN_DOUBLE_CONST_DOUBLE_PTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
/* These stubs should get defined in the backend if applicable. */
DEF_BUILTIN_STUB (BUILT_IN_TM_LOAD_M64, "__builtin__ITM_RM64")