On Wed, Jun 28, 2017 at 8:29 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, Jun 27, 2017 at 6:46 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Jun 27, 2017 at 3:59 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On June 27, 2017 4:27:17 PM GMT+02:00, "Bin.Cheng" <amker.ch...@gmail.com> 
>>> wrote:
>>>>On Tue, Jun 27, 2017 at 1:58 PM, Richard Biener
>>>><richard.guent...@gmail.com> wrote:
>>>>> On Fri, Jun 23, 2017 at 12:10 PM, Bin.Cheng <amker.ch...@gmail.com>
>>>>wrote:
>>>>>> On Mon, Jun 12, 2017 at 6:02 PM, Bin Cheng <bin.ch...@arm.com>
>>>>wrote:
>>>>>>> Hi,
>>>>>>> I was asked by upstream to split the loop distribution patch into
>>>>small ones.
>>>>>>> It is hard because data structure and algorithm are closely coupled
>>>>together.
>>>>>>> Anyway, this is the patch series with smaller patches.  Basically I
>>>>tried to
>>>>>>> separate data structure and bug-fix changes apart with one as the
>>>>main patch.
>>>>>>> Note I only made necessary code refactoring in order to separate
>>>>patch, apart
>>>>>>> from that, there is no change against the last version.
>>>>>>>
>>>>>>> This is the first patch introducing new internal function
>>>>IFN_LOOP_DIST_ALIAS.
>>>>>>> GCC will distribute loops under condition of this function call.
>>>>>>>
>>>>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>>>> Hi,
>>>>>> I need to update this patch fixing an issue in
>>>>>> vect_loop_dist_alias_call.  The previous patch fails to find some
>>>>>> IFN_LOOP_DIST_ALIAS calls.
>>>>>>
>>>>>> Bootstrap and test in series.  Is it OK?
>>>>>
>>>>> So I wonder if we really need to track ldist_alias_id or if we can do
>>>>sth
>>>>Yes, it is needed because otherwise we probably falsely trying to
>>>>search for IFN_LOOP_DIST_ALIAS for a normal (not from distribution)
>>>>loop.
>>>>
>>>>> more "general", like tracking a copy_of or origin and then directly
>>>>> go to nearest_common_dominator (loop->header, copy_of->header)
>>>>> to find the controlling condition?
>>>>I tend to not record any pointer in loop structure, it can easily go
>>>>dangling for a across passes data structure.
>>>
>>> I didn't mean to record a pointer, just rename your field and make it more 
>>> general.  The common dominator thing shod still work, no?
>> I might not be following.  If we record the original loop->num in the
>> renamed field, nearest_common_dominator can't work because we don't
>> have basic blocks to start the call?  The original loop could be
>> eliminated at several points, for example, instantly after
>> distribution, or folded in vectorizer for other loops distributed from
>> the original loop.
>> BTW, setting the copy_of/origin field in loop_version is not enough
>> for this use case, all generated loops (actually, except the versioned
>> loop) from distribution need to be set.
>
> Of course it would need to be set for all distributed loops.
>
> I'm not sure "loop vanishes" is the case to optimize for though.  If the loop
> is still available then origin->header should work as BB.  If not then can't
> we conclude, for the purpose of IFN_LOOP_DIST_ALIAS, that the whole
> region must be dead?  We still have to identify it of course, but it means
> we can fold stray IFN_LOOP_DIST_ALIAS calls left over after vectorization
> to whatever we like?
>
>>>
>>> As far as memory usage
>>>>is concerned.  I actually don't need a whole integer to record the
>>>>loop num.  I can simply restrict number of distributions in one
>>>>function to at most 256, and record such id in a char field in struct
>>>>loop?  Does this sounds better?
>>>
>>> As said, tracking loop origin sounds useful anyway so I'd rather add and 
>>> use that somehow.
>> To be honest, I don't know.  the current field works like a unique
>> index of distribution operation.  The original loop could be destroyed
>> at different points thus no longer exists, this makes the recorded
>> copy_of/origin less meaningful?
>
> I think we talked about prologue and epilogue loops to be easier identifiable
> as so (and as to what "main" loop).  So lets say we have one "origin" field
> and accompaning flags "origin_is_loop_dist_alias_version",
> "origin_is_main_loop_of_prologue", etc.?  I can't think of the case where
> origin would be two things at the same time (you can always walk up
> the origin tree).
Hi,
Here is the updated patch working in this way.  There is still one
problem with this method.  Considering one distributed loop is
if-converted later, the orig loop for if-converted distributed loop is
different.  Though we can update orig_loop_num, it's inaccurate and
one consequence is we need to walk up dominance tree till entry_block.
Note if orig_loop_num is not shared, we can stop once basic block goes
beyond outer loop.
I didn't introduce flags in this patch, which can be done along with
solving the above problem.

>
> Vanishing origins could also be cleared btw, or we can make the new origin
> the origin of the vanishing origin...
Yes, I did this in distribution patch, so that dominance walking
routine can be simplified.

Bootstrap and test.  Is it OK?

Thanks,
bin

2017-06-27  Bin Cheng  <bin.ch...@arm.com>

* cfgloop.h (struct loop): Add comment.  New field orig_loop_num.
* cfgloopmanip.c (lv_adjust_loop_entry_edge): Comment change.
* internal-fn.c (expand_LOOP_DIST_ALIAS): New function.
* internal-fn.def (LOOP_DIST_ALIAS): New.
* tree-vectorizer.c (fold_loop_vectorized_call): Rename to ...
(fold_loop_internal_call): ... this.
(vect_loop_dist_alias_call): New function.
(set_uid_loop_bbs): Call fold_loop_internal_call.
(vectorize_loops): Fold IFN_LOOP_VECTORIZED and IFN_LOOP_DIST_ALIAS
internal calls.
From 8c3631f383a53653a40bd30c1133ca03f9cc6973 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binch...@e108451-lin.cambridge.arm.com>
Date: Wed, 7 Jun 2017 13:04:03 +0100
Subject: [PATCH 01/13] ifn_loop_dist_alias-20170621.txt

---
 gcc/cfgloop.h         | 13 +++++++-
 gcc/cfgloopmanip.c    |  3 +-
 gcc/internal-fn.c     |  8 +++++
 gcc/internal-fn.def   |  1 +
 gcc/tree-vectorizer.c | 89 +++++++++++++++++++++++++++++++++++++++++++++------
 5 files changed, 103 insertions(+), 11 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index a8bec1d..e7ffa23 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -114,7 +114,8 @@ struct GTY ((chain_next ("%h.next"))) control_iv {
 
 /* Structure to hold information for each natural loop.  */
 struct GTY ((chain_next ("%h.next"))) loop {
-  /* Index into loops array.  */
+  /* Index into loops array.  Note indices will never be reused after loop
+     is destroyed.  */
   int num;
 
   /* Number of loop insns.  */
@@ -225,6 +226,16 @@ struct GTY ((chain_next ("%h.next"))) loop {
      builtins.  */
   tree simduid;
 
+  /* In loop optimization, it's common to generate loops from the original
+     loop.  This field records the index of the original loop which can be
+     used to track the original loop from newly generated loops.  This can
+     be done by calling function get_loop (cfun, orig_loop_num).  Note the
+     original loop could be destroyed for various reasons thus no longer
+     exists, as a result, function call to get_loop returns NULL pointer.
+     In this case, this field should not be used and needs to be cleared
+     whenever possible.  */
+  int orig_loop_num;
+
   /* Upper bound on number of iterations of a loop.  */
   struct nb_iter_bound *bounds;
 
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index d764ab9..adb2f65 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1653,7 +1653,8 @@ force_single_succ_latches (void)
 
   THEN_PROB is the probability of then branch of the condition.
   ELSE_PROB is the probability of else branch. Note that they may be both
-  REG_BR_PROB_BASE when condition is IFN_LOOP_VECTORIZED.  */
+  REG_BR_PROB_BASE when condition is IFN_LOOP_VECTORIZED or
+  IFN_LOOP_DIST_ALIAS.  */
 
 static basic_block
 lv_adjust_loop_entry_edge (basic_block first_head, basic_block second_head,
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 75fe027..96e40cb 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2250,6 +2250,14 @@ expand_LOOP_VECTORIZED (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get folded in tree-vectorizer.c.  */
+
+static void
+expand_LOOP_DIST_ALIAS (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Expand MASK_LOAD call STMT using optab OPTAB.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e162d81..79c19fb 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -158,6 +158,7 @@ DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_ORDERED_START, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_ORDERED_END, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (LOOP_DIST_ALIAS, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ANNOTATE,  ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (UBSAN_NULL, ECF_LEAF | ECF_NOTHROW, ".R.")
 DEF_INTERNAL_FN (UBSAN_BOUNDS, ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 0d62c82..33a1f58 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -448,11 +448,11 @@ vect_loop_vectorized_call (struct loop *loop)
   return NULL;
 }
 
-/* Fold LOOP_VECTORIZED internal call G to VALUE and
-   update any immediate uses of it's LHS.  */
+/* Fold loop internal call G like IFN_LOOP_VECTORIZED/IFN_LOOP_DIST_ALIAS
+   to VALUE and update any immediate uses of it's LHS.  */
 
 static void
-fold_loop_vectorized_call (gimple *g, tree value)
+fold_loop_internal_call (gimple *g, tree value)
 {
   tree lhs = gimple_call_lhs (g);
   use_operand_p use_p;
@@ -469,6 +469,60 @@ fold_loop_vectorized_call (gimple *g, tree value)
     }
 }
 
+/* If LOOP has been versioned during loop distribution, return the gurading
+   internal call.  */
+
+static gimple *
+vect_loop_dist_alias_call (struct loop *loop)
+{
+  basic_block bb;
+  basic_block entry;
+  struct loop *outer, *orig;
+  gimple_stmt_iterator gsi;
+  gimple *g;
+
+  if (loop->orig_loop_num == 0)
+    return NULL;
+
+  orig = get_loop (cfun, loop->orig_loop_num);
+  if (orig == NULL)
+    {
+      /* The original loop is somehow destroyed.  Clear the information.  */
+      loop->orig_loop_num = 0;
+      return NULL;
+    }
+
+  if (loop != orig)
+    bb = nearest_common_dominator (CDI_DOMINATORS, loop->header, orig->header);
+  else
+    bb = loop_preheader_edge (loop)->src;
+
+  outer = bb->loop_father;
+  entry = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  /* Look upward in dominance tree.  */
+  for (; bb != entry && flow_bb_inside_loop_p (outer, bb);
+       bb = get_immediate_dominator (CDI_DOMINATORS, bb))
+    {
+      g = last_stmt (bb);
+      if (g == NULL || gimple_code (g) != GIMPLE_COND)
+	continue;
+
+      gsi = gsi_for_stmt (g);
+      gsi_prev (&gsi);
+      if (gsi_end_p (gsi))
+	continue;
+
+      g = gsi_stmt (gsi);
+      /* The guarding internal function call must have the same distribution
+	 alias id.  */
+      if (gimple_call_internal_p (g, IFN_LOOP_DIST_ALIAS)
+	  && (tree_to_shwi (gimple_call_arg (g, 0)) == loop->orig_loop_num))
+	return g;
+    }
+  return NULL;
+}
+
 /* Set the uids of all the statements in basic blocks inside loop
    represented by LOOP_VINFO. LOOP_VECTORIZED_CALL is the internal
    call guarding the loop which has been if converted.  */
@@ -494,7 +548,7 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 	{
 	  arg = gimple_call_arg (g, 0);
 	  get_loop (cfun, tree_to_shwi (arg))->dont_vectorize = true;
-	  fold_loop_vectorized_call (g, boolean_false_node);
+	  fold_loop_internal_call (g, boolean_false_node);
 	}
     }
   bbs = get_loop_body (scalar_loop);
@@ -595,7 +649,7 @@ vectorize_loops (void)
     else
       {
 	loop_vec_info loop_vinfo, orig_loop_vinfo;
-	gimple *loop_vectorized_call;
+	gimple *loop_vectorized_call, *loop_dist_alias_call;
        try_vectorize:
 	if (!((flag_tree_loop_vectorize
 	       && optimize_loop_nest_for_speed_p (loop))
@@ -603,6 +657,7 @@ vectorize_loops (void)
 	  continue;
 	orig_loop_vinfo = NULL;
 	loop_vectorized_call = vect_loop_vectorized_call (loop);
+	loop_dist_alias_call = vect_loop_dist_alias_call (loop);
        vectorize_epilogue:
 	vect_location = find_loop_location (loop);
         if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION
@@ -652,8 +707,8 @@ vectorize_loops (void)
 		  {
 		    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
 				     "basic block vectorized\n");
-		    fold_loop_vectorized_call (loop_vectorized_call,
-					       boolean_true_node);
+		    fold_loop_internal_call (loop_vectorized_call,
+					     boolean_true_node);
 		    loop_vectorized_call = NULL;
 		    ret |= TODO_cleanup_cfg;
 		  }
@@ -706,10 +761,17 @@ vectorize_loops (void)
 
 	if (loop_vectorized_call)
 	  {
-	    fold_loop_vectorized_call (loop_vectorized_call, boolean_true_node);
+	    fold_loop_internal_call (loop_vectorized_call, boolean_true_node);
 	    loop_vectorized_call = NULL;
 	    ret |= TODO_cleanup_cfg;
 	  }
+	if (loop_dist_alias_call)
+	  {
+	    tree value = gimple_call_arg (loop_dist_alias_call, 1);
+	    fold_loop_internal_call (loop_dist_alias_call, value);
+	    loop_dist_alias_call = NULL;
+	    ret |= TODO_cleanup_cfg;
+	  }
 
 	if (new_loop)
 	  {
@@ -741,7 +803,16 @@ vectorize_loops (void)
 	    gimple *g = vect_loop_vectorized_call (loop);
 	    if (g)
 	      {
-		fold_loop_vectorized_call (g, boolean_false_node);
+		fold_loop_internal_call (g, boolean_false_node);
+		ret |= TODO_cleanup_cfg;
+		g = NULL;
+	      }
+	    else
+	      g = vect_loop_dist_alias_call (loop);
+
+	    if (g)
+	      {
+		fold_loop_internal_call (g, boolean_false_node);
 		ret |= TODO_cleanup_cfg;
 	      }
 	  }
-- 
1.9.1

Reply via email to