On 01/06/14 13:40, Richard Henderson wrote:
On 12/19/2013 11:06 AM, Richard Biener wrote:
Aldy Hernandez <al...@redhat.com> wrote:
I'd still like to catch the common cases, like I do with this patch.

Perhaps we move this code to the .tmmark pass and handle the
uninstrumented case.  rth?

tmmark is way way later than you'd want.  I believe that ipa_tm is the right
place.  That's where we generate clones.  The clones know a-priori that they're
called within a transaction and thus all internal transations may be
eliminated.  And thus any inlining that would happen after ipa_tm would
properly inline the clone, and thus no more need be done.

I have a patch (attached for reference) removing the nested transactions while we are creating the clones (as suggested), but the uninstrumented code path complicates things. I'm afraid I don't have any good news.

Consider this:

inline void f() {
  __transaction_atomic {
      a = 12345;
  }
}

void g() {
  __transaction_atomic {
    f();
  }
}

The problem is that when we add the uninstrumented code path later in tmipa, we end up with the following for g():

g ()
{
  <bb 2>:
  __transaction_atomic  // SUBCODE=[ GTMA_HAVE_LOAD GTMA_HAVE_STORE ]
  goto <bb 3>;

  <bb 5>:
  f ();            /* <---- uninstrumented path */
  __builtin__ITM_commitTransaction ();
  goto <bb 4>;

  <bb 3>:
  f (); [tm-clone]    /* <---- instrumented path */
  __builtin__ITM_commitTransaction ();

  <bb 4>:
  return;

}

Since we only removed the transaction in the clone of f(), plain regular f() will still have the additional transaction, so inlining will still yield a g() with a nested transaction in the uninstrumented path.

So we're back to square one, needing a separate pass to remove the nested transactions, and this pass will unfortunately have to deal with the uninstrumented/instrumented paths.

This has taken longer to fix than I expected, so I'm going to put this aside for now and concentrate on some P1-P2's. For the record, since you don't like this pass in the .tmmark pass which is WAY late, can we have a tree pass right after the IPA passes (thus after inlining)?

I'll add some notes to the PR so we can pick this up later.

Aldy
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index fe6dc28..59b589c 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -1,5 +1,7 @@
 /* Passes for transactional memory support.
    Copyright (C) 2008-2014 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <r...@redhat.com> and
+                 Aldy Hernandez <al...@redhat.com>.
 
    This file is part of GCC.
 
@@ -4106,8 +4108,8 @@ maybe_push_queue (struct cgraph_node *node,
    code path.  QUEUE are the basic blocks inside the transaction
    represented in REGION.
 
-   Later in split_code_paths() we will add the conditional to choose
-   between the two alternatives.  */
+   Later in the tmmark pass (expand_transaction) we will add the
+   conditional to choose between the two alternatives.  */
 
 static void
 ipa_uninstrument_transaction (struct tm_region *region,
@@ -4192,29 +4194,11 @@ ipa_tm_scan_calls_transaction (struct tm_ipa_cg_data *d,
       bbs = get_tm_region_blocks (r->entry_block, r->exit_blocks, NULL,
                                  d->transaction_blocks_normal, false);
 
-      // Generate the uninstrumented code path for this transaction.
-      ipa_uninstrument_transaction (r, bbs);
-
       FOR_EACH_VEC_ELT (bbs, i, bb)
        ipa_tm_scan_calls_block (callees_p, bb, false);
 
       bbs.release ();
     }
-
-  // ??? copy_bbs should maintain cgraph edges for the blocks as it is
-  // copying them, rather than forcing us to do this externally.
-  rebuild_cgraph_edges ();
-
-  // ??? In ipa_uninstrument_transaction we don't try to update dominators
-  // because copy_bbs doesn't return a VEC like iterate_fix_dominators expects.
-  // Instead, just release dominators here so update_ssa recomputes them.
-  free_dominance_info (CDI_DOMINATORS);
-
-  // When building the uninstrumented code path, copy_bbs will have invoked
-  // create_new_def_for starting an "ssa update context".  There is only one
-  // instance of this context, so resolve ssa updates before moving on to
-  // the next function.
-  update_ssa (TODO_update_ssa);
 }
 
 /* Scan all calls in NODE as if this is the transactional clone,
@@ -4890,10 +4874,11 @@ ipa_tm_create_version_alias (struct cgraph_node *node, 
void *data)
   return false;
 }
 
-/* Create a copy of the function (possibly declaration only) of OLD_NODE,
-   appropriate for the transactional clone.  */
+/* Create a copy of the function (possibly declaration only) of
+   OLD_NODE, appropriate for the transactional clone.  Returns the
+   cgraph node for the newly created clone.  */
 
-static void
+static struct cgraph_node *
 ipa_tm_create_version (struct cgraph_node *old_node)
 {
   tree new_decl, old_decl, tm_name;
@@ -4947,13 +4932,12 @@ ipa_tm_create_version (struct cgraph_node *old_node)
     ipa_tm_mark_forced_by_abi_node (new_node);
 
   /* Do the same thing, but for any aliases of the original node.  */
-  {
-    struct create_version_alias_info data;
-    data.old_node = old_node;
-    data.new_decl = new_decl;
-    cgraph_for_node_and_aliases (old_node, ipa_tm_create_version_alias,
-                                &data, true);
-  }
+  struct create_version_alias_info data;
+  data.old_node = old_node;
+  data.new_decl = new_decl;
+  cgraph_for_node_and_aliases (old_node, ipa_tm_create_version_alias,
+                              &data, true);
+  return new_node;
 }
 
 /* Construct a call to TM_IRREVOCABLE and insert it at the beginning of BB.  */
@@ -5275,16 +5259,14 @@ ipa_tm_transform_transaction (struct cgraph_node *node)
   pop_cfun ();
 }
 
-/* Transform the calls within the transactional clone of NODE.  */
+/* Transform the calls within the transactional clone of NODE.  D is
+   the IPA auxiliary data for the NODE.  */
 
 static void
-ipa_tm_transform_clone (struct cgraph_node *node)
+ipa_tm_transform_clone (struct cgraph_node *node, struct tm_ipa_cg_data *d)
 {
-  struct tm_ipa_cg_data *d;
   bool need_ssa_rename;
 
-  d = get_cg_data (&node, true);
-
   /* If this function makes no calls and has no irrevocable blocks,
      then there's nothing to do.  */
   /* ??? Remove non-aborting top-level transactions.  */
@@ -5305,6 +5287,100 @@ ipa_tm_transform_clone (struct cgraph_node *node)
   pop_cfun ();
 }
 
+/* Callback for expand_regions.
+
+   Given a transaction in REGION, remove the GIMPLE_TRANSACTION and
+   its corresponding BUILT_IN_TM_COMMIT*.  The tm_region itself is
+   removed by the caller.  */
+
+static void *
+remove_transaction (struct tm_region *region, void *data ATTRIBUTE_UNUSED)
+{
+  // Remove the transaction.
+  if (dump_file)
+    fprintf (dump_file, "Folding unnecessary inner transaction in BB%d\n",
+            gimple_bb (region->transaction_stmt)->index);
+  gimple_stmt_iterator gsi = gsi_for_stmt (region->transaction_stmt);
+  gsi_remove (&gsi, true);
+
+  // Find the corresponding BUILT_IN_TM_COMMIT* and remove it as well.
+  if (region->exit_blocks)
+    {
+      bitmap_iterator bi;
+      unsigned i;
+
+      EXECUTE_IF_SET_IN_BITMAP (region->exit_blocks, 0, i, bi)
+       {
+         gimple_stmt_iterator gsi
+           = gsi_last_bb (BASIC_BLOCK_FOR_FN (cfun, i));
+         gimple stmt = gsi_stmt (gsi);
+         if (gimple_code (stmt) != GIMPLE_CALL)
+           continue;
+         tree fndecl = gimple_call_fndecl (stmt);
+         if (fndecl
+             && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+             && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_TM_COMMIT
+                 || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_TM_COMMIT_EH))
+           {
+             /* We technically don't need to do this because our
+                caller (ipa_tm_execute) is brain dead and calls
+                rebuild_cgraph_edges, but it's proper form.
+                Eventually we'd like not to call
+                rebuild_cgraph_edges() and have everyone clean up
+                their own mess, like we do here.  */
+             struct cgraph_node *cfun_node
+               = cgraph_get_node (current_function_decl);
+             cgraph_remove_edge (cgraph_edge (cfun_node, stmt));
+
+             tree vdef = gimple_vdef (stmt);
+             tree vuse = gimple_vuse (stmt);
+             if (vdef && vuse)
+               replace_uses_by (vdef, vuse);
+             gsi_remove (&gsi, true);
+           }
+       }
+    }
+  return NULL;
+}
+
+/* A helper function for flatten_nested_transactions.
+
+   Starting at TOP, go through all the transactional regions and
+   remove any nested transactions that are redundant, removing both
+   the start/end of a transaction.  */
+
+static void
+flatten_nested_transactions_1 (struct tm_region *top)
+{
+  for (struct tm_region *region = top; region; region = region->next)
+    {
+      if (region->inner)
+       flatten_nested_transactions_1 (region->inner);
+
+      /* Transactions which have no explicit aborts can be removed.  */
+      unsigned int sub = gimple_transaction_subcode (region->transaction_stmt);
+      if (!(sub & GTMA_HAVE_ABORT))
+       expand_regions (region, remove_transaction, NULL, false);
+    }
+}
+
+/* Remove any transactions in the given CLONE that are redundant.
+   This is done by removing the GIMPLE_TRANSACTION +
+   BUILT_IN_TM_COMMIT* pairs.  */
+
+static void
+flatten_nested_transactions (struct cgraph_node *clone)
+{
+  /* Initialize all_tm_regions and friends for this clone.  */
+  push_cfun (DECL_STRUCT_FUNCTION (clone->decl));
+  calculate_dominance_info (CDI_DOMINATORS);
+  tm_region_init (NULL);
+
+  flatten_nested_transactions_1 (all_tm_regions);
+
+  pop_cfun ();
+}
+
 /* Main entry point for the transactional memory IPA pass.  */
 
 static unsigned int
@@ -5352,14 +5428,12 @@ ipa_tm_execute (void)
 
        push_cfun (DECL_STRUCT_FUNCTION (node->decl));
        calculate_dominance_info (CDI_DOMINATORS);
-
        tm_region_init (NULL);
        if (all_tm_regions)
          {
            d = get_cg_data (&node, true);
 
-           /* Scan for calls that are in each transaction, and
-              generate the uninstrumented code path.  */
+           /* Scan for calls that are in each transaction.  */
            ipa_tm_scan_calls_transaction (d, &tm_callees);
 
            /* Put it in the worklist so we can scan the function
@@ -5537,9 +5611,49 @@ ipa_tm_execute (void)
        doit = true;
 
       if (doit)
-       ipa_tm_create_version (node);
+       {
+         struct cgraph_node *new_node = ipa_tm_create_version (node);
+         flatten_nested_transactions (new_node);
+       }
     }
 
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (node->lowered
+       && cgraph_function_body_availability (node) >= AVAIL_OVERWRITABLE)
+      {
+       push_cfun (DECL_STRUCT_FUNCTION (node->decl));
+       calculate_dominance_info (CDI_DOMINATORS);
+       tm_region_init (NULL);
+
+       for (struct tm_region *r = all_tm_regions; r; r = r->next)
+         {
+           struct tm_ipa_cg_data *d = get_cg_data (&node, true);
+           vec<basic_block> bbs
+             = get_tm_region_blocks (r->entry_block, r->exit_blocks, NULL,
+                                     d->transaction_blocks_normal, false);
+           ipa_uninstrument_transaction (r, bbs);
+         }
+
+       // ??? copy_bbs should maintain cgraph edges for the blocks as
+       // it is copying them, rather than forcing us to do this
+       // externally.
+       rebuild_cgraph_edges ();
+
+       // ??? In ipa_uninstrument_transaction we don't try to update
+       // dominators because copy_bbs doesn't return a VEC like
+       // iterate_fix_dominators expects.  Instead, just release
+       // dominators here so update_ssa recomputes them.
+       free_dominance_info (CDI_DOMINATORS);
+
+       // When building the uninstrumented code path, copy_bbs will
+       // have invoked create_new_def_for starting an "ssa update
+       // context".  There is only one instance of this context, so
+       // resolve ssa updates before moving on to the next function.
+       update_ssa (TODO_update_ssa);
+
+       pop_cfun ();
+      }
+
   /* Redirect calls to the new clones, and insert irrevocable marks.  */
   for (i = 0; i < tm_callees.length (); ++i)
     {
@@ -5548,7 +5662,7 @@ ipa_tm_execute (void)
        {
          d = get_cg_data (&node, true);
          if (d->clone)
-           ipa_tm_transform_clone (node);
+           ipa_tm_transform_clone (node, d);
        }
     }
   FOR_EACH_DEFINED_FUNCTION (node)

Reply via email to