On Wed, Jun 28, 2017 at 8:29 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Tue, Jun 27, 2017 at 6:46 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Jun 27, 2017 at 3:59 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On June 27, 2017 4:27:17 PM GMT+02:00, "Bin.Cheng" <amker.ch...@gmail.com> wrote:
>>>> On Tue, Jun 27, 2017 at 1:58 PM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Fri, Jun 23, 2017 at 12:10 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>>>> On Mon, Jun 12, 2017 at 6:02 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>>> Hi,
>>>>>>> I was asked by upstream to split the loop distribution patch into
>>>>>>> smaller ones.  That is hard because the data structures and the
>>>>>>> algorithm are closely coupled together.  Anyway, this is the patch
>>>>>>> series with smaller patches.  Basically I tried to separate the data
>>>>>>> structure and bug-fix changes from one main patch.  Note I only did
>>>>>>> the code refactoring necessary to separate the patches; apart from
>>>>>>> that, there is no change against the last version.
>>>>>>>
>>>>>>> This is the first patch, introducing the new internal function
>>>>>>> IFN_LOOP_DIST_ALIAS.  GCC will distribute loops under the condition
>>>>>>> of this function call.
>>>>>>>
>>>>>>> Bootstrapped and tested on x86_64 and AArch64.  Is it OK?
>>>>>> Hi,
>>>>>> I need to update this patch to fix an issue in
>>>>>> vect_loop_dist_alias_call: the previous patch fails to find some
>>>>>> IFN_LOOP_DIST_ALIAS calls.
>>>>>>
>>>>>> Bootstrapped and tested in series.  Is it OK?
>>>>>
>>>>> So I wonder if we really need to track ldist_alias_id or if we can do
>>>>> something
>>>> Yes, it is needed, because otherwise we would probably falsely search
>>>> for IFN_LOOP_DIST_ALIAS for a normal loop (one not created by
>>>> distribution).
>>>>
>>>>> more "general", like tracking a copy_of or origin and then directly
>>>>> go to nearest_common_dominator (loop->header, copy_of->header)
>>>>> to find the controlling condition?
>>>> I tend not to record any pointer in the loop structure; it can easily
>>>> go dangling in a data structure that lives across passes.
>>>
>>> I didn't mean to record a pointer, just rename your field and make it more
>>> general.  The common dominator thing should still work, no?
>> I might not be following.  If we record the original loop->num in the
>> renamed field, nearest_common_dominator can't work because we don't
>> have basic blocks to start the call?  The original loop could be
>> eliminated at several points, for example, immediately after
>> distribution, or folded in the vectorizer for other loops distributed
>> from the original loop.
>> BTW, setting the copy_of/origin field in loop_version is not enough
>> for this use case; all loops generated by distribution (actually, all
>> except the versioned loop) need to be set.
>
> Of course it would need to be set for all distributed loops.
>
> I'm not sure "loop vanishes" is the case to optimize for though.  If the loop
> is still available then origin->header should work as BB.  If not then can't
> we conclude, for the purpose of IFN_LOOP_DIST_ALIAS, that the whole
> region must be dead?  We still have to identify it of course, but it means
> we can fold stray IFN_LOOP_DIST_ALIAS calls left over after vectorization
> to whatever we like?
>
>>>> As far as memory usage is concerned, I actually don't need a whole
>>>> integer to record the loop num.  I can simply restrict the number of
>>>> distributions in one function to at most 256 and record such an id in
>>>> a char field in struct loop?  Does this sound better?
>>>
>>> As said, tracking loop origin sounds useful anyway so I'd rather add and
>>> use that somehow.
>> To be honest, I don't know.  The current field works like a unique
>> index of the distribution operation.  The original loop could be
>> destroyed at various points and thus no longer exist; doesn't this make
>> the recorded copy_of/origin less meaningful?
>
> I think we talked about making prologue and epilogue loops easier to
> identify as such (and as to which loop is the "main" loop).  So let's say
> we have one "origin" field and accompanying flags
> "origin_is_loop_dist_alias_version", "origin_is_main_loop_of_prologue",
> etc.?  I can't think of a case where origin would be two things at the
> same time (you can always walk up the origin tree).
Hi,
Here is the updated patch working in this way.  There is still one
problem with this method: if a distributed loop is if-converted later,
the orig loop for the if-converted distributed loop is different.
Though we can update orig_loop_num, it's inaccurate, and one consequence
is that we need to walk up the dominance tree all the way to entry_block.
Note that if orig_loop_num were not shared, we could stop once the basic
block goes beyond the outer loop.  I didn't introduce the flags in this
patch; that can be done along with solving the above problem.
>
> Vanishing origins could also be cleared btw, or we can make the new origin
> the origin of the vanishing origin...
Yes, I did this in the distribution patch, so that the dominance walking
routine can be simplified.

Bootstrapped and tested.  Is it OK?

Thanks,
bin

2017-06-27  Bin Cheng  <bin.ch...@arm.com>

	* cfgloop.h (struct loop): Add comment.  New field orig_loop_num.
	* cfgloopmanip.c (lv_adjust_loop_entry_edge): Comment change.
	* internal-fn.c (expand_LOOP_DIST_ALIAS): New function.
	* internal-fn.def (LOOP_DIST_ALIAS): New.
	* tree-vectorizer.c (fold_loop_vectorized_call): Rename to ...
	(fold_loop_internal_call): ... this.
	(vect_loop_dist_alias_call): New function.
	(set_uid_loop_bbs): Call fold_loop_internal_call.
	(vectorize_loops): Fold IFN_LOOP_VECTORIZED and IFN_LOOP_DIST_ALIAS
	internal calls.
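To make concrete what the new internal function guards, here is an
illustrative example of mine (not taken from the patch or its testcases):
distributing the loop below into a memset-like partition and a compute
partition is only valid if p and q do not alias, so the distributed copies
are versioned under a runtime alias check, which is kept abstract as a call
IFN_LOOP_DIST_ALIAS (orig-loop-num, alias-check-result) until the
vectorizer folds it.

/* Illustrative source loop only; the names are hypothetical.  Loop
   distribution would version this loop and guard the two distributed
   partitions with a runtime check that p and q do not overlap.  */
void
foo (int *p, int *q, int n)
{
  for (int i = 0; i < n; i++)
    {
      p[i] = 0;			/* candidate memset partition  */
      q[i] = q[i] + i;		/* separate compute partition  */
    }
}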
From 8c3631f383a53653a40bd30c1133ca03f9cc6973 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binch...@e108451-lin.cambridge.arm.com>
Date: Wed, 7 Jun 2017 13:04:03 +0100
Subject: [PATCH 01/13] ifn_loop_dist_alias-20170621.txt

---
 gcc/cfgloop.h         | 13 +++++++-
 gcc/cfgloopmanip.c    |  3 +-
 gcc/internal-fn.c     |  8 +++++
 gcc/internal-fn.def   |  1 +
 gcc/tree-vectorizer.c | 89 +++++++++++++++++++++++++++++++++++++++++++++------
 5 files changed, 103 insertions(+), 11 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index a8bec1d..e7ffa23 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -114,7 +114,8 @@ struct GTY ((chain_next ("%h.next"))) control_iv {
 
 /* Structure to hold information for each natural loop.  */
 struct GTY ((chain_next ("%h.next"))) loop {
-  /* Index into loops array.  */
+  /* Index into loops array.  Note indices will never be reused after the
+     loop is destroyed.  */
   int num;
 
   /* Number of loop insns.  */
@@ -225,6 +226,16 @@ struct GTY ((chain_next ("%h.next"))) loop {
      builtins.  */
   tree simduid;
 
+  /* In loop optimization, it's common to generate loops from the original
+     loop.  This field records the index of the original loop which can be
+     used to track the original loop from newly generated loops.  This can
+     be done by calling function get_loop (cfun, orig_loop_num).  Note the
+     original loop could be destroyed for various reasons and thus no
+     longer exist; as a result, the call to get_loop returns a NULL
+     pointer.  In that case this field should not be used and needs to be
+     cleared whenever possible.  */
+  int orig_loop_num;
+
   /* Upper bound on number of iterations of a loop.  */
   struct nb_iter_bound *bounds;
 
diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index d764ab9..adb2f65 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1653,7 +1653,8 @@ force_single_succ_latches (void)
 
    THEN_PROB is the probability of then branch of the condition.
    ELSE_PROB is the probability of else branch.  Note that they may be both
-   REG_BR_PROB_BASE when condition is IFN_LOOP_VECTORIZED.  */
+   REG_BR_PROB_BASE when condition is IFN_LOOP_VECTORIZED or
+   IFN_LOOP_DIST_ALIAS.  */
 
 static basic_block
 lv_adjust_loop_entry_edge (basic_block first_head, basic_block second_head,
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 75fe027..96e40cb 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2250,6 +2250,14 @@ expand_LOOP_VECTORIZED (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get folded in tree-vectorizer.c.  */
+
+static void
+expand_LOOP_DIST_ALIAS (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Expand MASK_LOAD call STMT using optab OPTAB.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index e162d81..79c19fb 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -158,6 +158,7 @@ DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_ORDERED_START, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_ORDERED_END, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (LOOP_DIST_ALIAS, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (ANNOTATE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (UBSAN_NULL, ECF_LEAF | ECF_NOTHROW, ".R.")
 DEF_INTERNAL_FN (UBSAN_BOUNDS, ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 0d62c82..33a1f58 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -448,11 +448,11 @@ vect_loop_vectorized_call (struct loop *loop)
   return NULL;
 }
 
-/* Fold LOOP_VECTORIZED internal call G to VALUE and
-   update any immediate uses of it's LHS.  */
+/* Fold loop internal call G like IFN_LOOP_VECTORIZED/IFN_LOOP_DIST_ALIAS
+   to VALUE and update any immediate uses of its LHS.  */
 
 static void
-fold_loop_vectorized_call (gimple *g, tree value)
+fold_loop_internal_call (gimple *g, tree value)
 {
   tree lhs = gimple_call_lhs (g);
   use_operand_p use_p;
@@ -469,6 +469,60 @@ fold_loop_vectorized_call (gimple *g, tree value)
     }
 }
 
+/* If LOOP has been versioned during loop distribution, return the guarding
+   internal call.  */
+
+static gimple *
+vect_loop_dist_alias_call (struct loop *loop)
+{
+  basic_block bb;
+  basic_block entry;
+  struct loop *outer, *orig;
+  gimple_stmt_iterator gsi;
+  gimple *g;
+
+  if (loop->orig_loop_num == 0)
+    return NULL;
+
+  orig = get_loop (cfun, loop->orig_loop_num);
+  if (orig == NULL)
+    {
+      /* The original loop is somehow destroyed.  Clear the information.  */
+      loop->orig_loop_num = 0;
+      return NULL;
+    }
+
+  if (loop != orig)
+    bb = nearest_common_dominator (CDI_DOMINATORS, loop->header, orig->header);
+  else
+    bb = loop_preheader_edge (loop)->src;
+
+  outer = bb->loop_father;
+  entry = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  /* Look upward in dominance tree.  */
+  for (; bb != entry && flow_bb_inside_loop_p (outer, bb);
+       bb = get_immediate_dominator (CDI_DOMINATORS, bb))
+    {
+      g = last_stmt (bb);
+      if (g == NULL || gimple_code (g) != GIMPLE_COND)
+	continue;
+
+      gsi = gsi_for_stmt (g);
+      gsi_prev (&gsi);
+      if (gsi_end_p (gsi))
+	continue;
+
+      g = gsi_stmt (gsi);
+      /* The guarding internal function call must have the same distribution
+	 alias id.  */
+      if (gimple_call_internal_p (g, IFN_LOOP_DIST_ALIAS)
+	  && (tree_to_shwi (gimple_call_arg (g, 0)) == loop->orig_loop_num))
+	return g;
+    }
+  return NULL;
+}
+
 /* Set the uids of all the statements in basic blocks inside loop
    represented by LOOP_VINFO.  LOOP_VECTORIZED_CALL is the internal
    call guarding the loop which has been if converted.  */
@@ -494,7 +548,7 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 	{
 	  arg = gimple_call_arg (g, 0);
 	  get_loop (cfun, tree_to_shwi (arg))->dont_vectorize = true;
-	  fold_loop_vectorized_call (g, boolean_false_node);
+	  fold_loop_internal_call (g, boolean_false_node);
 	}
     }
   bbs = get_loop_body (scalar_loop);
@@ -595,7 +649,7 @@ vectorize_loops (void)
       else
 	{
 	  loop_vec_info loop_vinfo, orig_loop_vinfo;
-	  gimple *loop_vectorized_call;
+	  gimple *loop_vectorized_call, *loop_dist_alias_call;
        try_vectorize:
 	  if (!((flag_tree_loop_vectorize
 		 && optimize_loop_nest_for_speed_p (loop))
@@ -603,6 +657,7 @@ vectorize_loops (void)
 	    continue;
 	  orig_loop_vinfo = NULL;
 	  loop_vectorized_call = vect_loop_vectorized_call (loop);
+	  loop_dist_alias_call = vect_loop_dist_alias_call (loop);
        vectorize_epilogue:
 	  vect_location = find_loop_location (loop);
 	  if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION
@@ -652,8 +707,8 @@ vectorize_loops (void)
 	    {
 	      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
 			       "basic block vectorized\n");
-	      fold_loop_vectorized_call (loop_vectorized_call,
-					 boolean_true_node);
+	      fold_loop_internal_call (loop_vectorized_call,
+				       boolean_true_node);
 	      loop_vectorized_call = NULL;
 	      ret |= TODO_cleanup_cfg;
 	    }
@@ -706,10 +761,17 @@ vectorize_loops (void)
 
       if (loop_vectorized_call)
 	{
-	  fold_loop_vectorized_call (loop_vectorized_call, boolean_true_node);
+	  fold_loop_internal_call (loop_vectorized_call, boolean_true_node);
 	  loop_vectorized_call = NULL;
 	  ret |= TODO_cleanup_cfg;
 	}
+      if (loop_dist_alias_call)
+	{
+	  tree value = gimple_call_arg (loop_dist_alias_call, 1);
+	  fold_loop_internal_call (loop_dist_alias_call, value);
+	  loop_dist_alias_call = NULL;
+	  ret |= TODO_cleanup_cfg;
+	}
 
       if (new_loop)
 	{
@@ -741,7 +803,16 @@ vectorize_loops (void)
       gimple *g = vect_loop_vectorized_call (loop);
       if (g)
 	{
-	  fold_loop_vectorized_call (g, boolean_false_node);
+	  fold_loop_internal_call (g, boolean_false_node);
+	  ret |= TODO_cleanup_cfg;
+	  g = NULL;
+	}
+      else
+	g = vect_loop_dist_alias_call (loop);
+
+      if (g)
+	{
+	  fold_loop_internal_call (g, boolean_false_node);
 	  ret |= TODO_cleanup_cfg;
 	}
     }
-- 
1.9.1
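For reference, a rough sketch of how the loop-distribution side (a later
patch in this series) might create the guard when versioning a loop.  This
is my illustration, not part of this patch: the helper name and insertion
point are hypothetical; only the two-argument call shape, and the use of
the original loop's num (which the distributed copies would record in
orig_loop_num), follow this patch.

/* Sketch only: build IFN_LOOP_DIST_ALIAS (LOOP->num, ALIAS_COND) before
   GSI and return its boolean result, to be used as the versioning
   condition; vect_loop_dist_alias_call later matches the first argument
   against the distributed loops' orig_loop_num.  */

static tree
emit_loop_dist_alias_guard (gimple_stmt_iterator *gsi, struct loop *loop,
			    tree alias_cond)
{
  gcall *call
    = gimple_build_call_internal (IFN_LOOP_DIST_ALIAS, 2,
				  build_int_cst (integer_type_node, loop->num),
				  alias_cond);
  tree lhs = make_ssa_name (boolean_type_node);
  gimple_call_set_lhs (call, lhs);
  gsi_insert_before (gsi, call, GSI_SAME_STMT);
  return lhs;
}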