On 28/07/15 12:11, Richard Biener wrote:
On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <tom_devr...@mentor.com> wrote:
On 20/07/15 15:04, Tom de Vries wrote:
On 16/07/15 12:15, Richard Biener wrote:
On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
<tom_devr...@mentor.com> wrote:
On 16/07/15 10:44, Richard Biener wrote:
On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <tom_devr...@mentor.com>
wrote:
Hi,
I.
In openmp expansion of loops, we do some effort to try to create
matching
loops in the loop state of the child function, f.i.in
expand_omp_for_generic:
...
struct loop *outer_loop;
if (seq_loop)
outer_loop = l0_bb->loop_father;
else
{
outer_loop = alloc_loop ();
outer_loop->header = l0_bb;
outer_loop->latch = l2_bb;
add_loop (outer_loop, l0_bb->loop_father);
}
if (!gimple_omp_for_combined_p (fd->for_stmt))
{
struct loop *loop = alloc_loop ();
loop->header = l1_bb;
/* The loop may have multiple latches. */
add_loop (loop, outer_loop);
}
...
And if that doesn't work out, we try to mark the loop state for
fixup, in
expand_omp_taskreg and expand_omp_target:
...
/* When the OMP expansion process cannot guarantee an
up-to-date
loop tree arrange for the child function to fixup
loops. */
if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
...
and expand_omp_for:
...
else
/* If there isn't a continue then this is a degerate case where
the introduction of abnormal edges during lowering will
prevent
original loops from being detected. Fix that up. */
loops_state_set (LOOPS_NEED_FIXUP);
...
However, loops are fixed up anyway, because the first pass we execute
with
the new child function is pass_fixup_cfg.
The new child function contains a function call to
__builtin_omp_get_num_threads, which is marked with ECF_CONST, so
execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
subsequently
the loops with LOOPS_NEED_FIXUP.
II.
This patch adds a verification that at the end of the omp-expand
processing
of the child function, either the loop structure is ok, or marked for
fixup.
This verfication triggered a failure in parloops. When an outer
loop is
being parallelized, both the outer and inner loop are cancelled. Then
during
omp-expansion, we create a loop in the loop state for the outer
loop (the
one that is transformed), but not for the inner, which causes the
verification failure:
...
outer-1.c:11:3: error: loop with header 5 not in loop tree
...
[ I ran into this verification failure with an openacc kernels
testcase
on
the gomp-4_0-branch, where parloops is called additionally from a
different
location, and pass_fixup_cfg is not the first pass that the child
function
is processed by. ]
The patch contains a bit that makes sure that the loop state of the
child
function is marked for fixup in parloops. The bit is non-trival
since it
create a loop state and sets the fixup flag on the loop state, but
postpones
the init_loops_structure call till move_sese_region_to_fn, where it
can
succeed.
<SNIP>
Can we fix the root-cause of the issue instead? That is, build a
valid loop
structure in the first place?
This patch manages to keep the loop structure, that is, to not cancel
the loop tree in parloops, and guarantee a valid loop structure at the
end of parloops.
The transformation to insert the omp_for invalidates the loop state
properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
we drop those in parloops.
In expand_omp_for_static_nochunk, we detect the existing loop struct of
the omp_for, and keep it.
Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
the loop state properties LOOPS_HAVE_RECORDED_EXITS and
LOOPS_HAVE_SIMPLE_LATCHES back.
This updated patch tries a more minimal approach.
Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
exit instead.
And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
Bootstrapped and reg-tested on x86_64.
OK for trunk?
I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
that back on in execute_expand_omp). The parloops code lacks comments and
the /* Prepare cfg. */ part looks twisty to me - but I don't see why
it should be difficult
to preserve simple latches as well - is this just because we insert
the GIMPLE_OMP_CONTINUE
in it?
It's not difficult to do. It's just that the omp-for that is generated
via the normal route (omp-annotated source code) doesn't have this
simple latch, and parloops just mimics that cfg shape.
Attached updated patch preserves simple latches in parloops. We need
some minor adjustments in omp-expand to handle that.
If execute_expand_omp is not performed in a loop pipeline where things
expect simple latches
(well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
at all? If somebody
needs it it will just request simple latches.
I've looked into this. The (first?) pass that gives me trouble is
pass_iv_optimize. It doesn't request simple latches, but it does need them.
So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
LOOPS_HAVE_SIMPLE_LATCHES
unset in execute_expand_omp.
Ok with that change.
OK with this approach of keeping LOOPS_HAVE_SIMPLE_LATCHES in parloops
(if bootstrap and reg-test succeeds)?
Thanks,
- Tom
Don't cancel loop tree in parloops
2015-07-28 Tom de Vries <t...@codesourcery.com>
PR tree-optimization/66846
* omp-low.c (expand_omp_taskreg) [ENABLE_CHECKING]: Call
verify_loop_structure for child_cfun if !LOOPS_NEED_FIXUP.
(expand_omp_target) [ENABLE_CHECKING]: Same.
(execute_expand_omp) [ENABLE_CHECKING]: Call verify_loop_structure for
cfun if !LOOPS_NEED_FIXUP.
(expand_omp_for_static_nochunk): Handle simple latch bb. Handle case
that omp_for already has its own loop struct.
* tree-parloops.c (create_phi_for_local_result)
(create_call_for_reduction): Handle simple latch bb.
(create_parallel_loop): Add simple latch bb to preserve
LOOPS_HAVE_SIMPLE_LATCHES. Record new exit. Handle simple latch bb.
(gen_parallel_loop): Remove call to cancel_loop_tree.
(parallelize_loops): Skip loops that are inner loops of parallelized
loops.
(pass_parallelize_loops::execute) [ENABLE_CHECKING]: Call
verify_loop_structure.
---
gcc/omp-low.c | 32 ++++++++++++++++++++++++++++++--
gcc/tree-parloops.c | 49 ++++++++++++++++++++++++++++++++++++-------------
2 files changed, 66 insertions(+), 15 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3135606..0f5c0f1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5604,6 +5604,10 @@ expand_omp_taskreg (struct omp_region *region)
}
if (gimple_in_ssa_p (cfun))
update_ssa (TODO_update_ssa);
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -6535,7 +6539,8 @@ expand_omp_for_static_nochunk (struct omp_region *region,
body_bb = single_succ (seq_start_bb);
if (!broken_loop)
{
- gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb);
+ gcc_assert (BRANCH_EDGE (cont_bb)->dest == body_bb
+ || single_succ (BRANCH_EDGE (cont_bb)->dest) == body_bb);
gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
}
exit_bb = region->exit;
@@ -6818,6 +6823,11 @@ expand_omp_for_static_nochunk (struct omp_region *region,
if (!broken_loop)
{
ep = find_edge (cont_bb, body_bb);
+ if (ep == NULL)
+ {
+ ep = BRANCH_EDGE (cont_bb);
+ gcc_assert (single_succ (ep->dest) == body_bb);
+ }
if (gimple_omp_for_combined_p (fd->for_stmt))
{
remove_edge (ep);
@@ -6843,9 +6853,19 @@ expand_omp_for_static_nochunk (struct omp_region *region,
set_immediate_dominator (CDI_DOMINATORS, fin_bb,
recompute_dominator (CDI_DOMINATORS, fin_bb));
+ struct loop *loop = body_bb->loop_father;
+ if (loop != entry_bb->loop_father)
+ {
+ gcc_assert (loop->header == body_bb);
+ gcc_assert (broken_loop
+ || loop->latch == region->cont
+ || single_pred (loop->latch) == region->cont);
+ return;
+ }
+
if (!broken_loop && !gimple_omp_for_combined_p (fd->for_stmt))
{
- struct loop *loop = alloc_loop ();
+ loop = alloc_loop ();
loop->header = body_bb;
if (collapse_bb == NULL)
loop->latch = cont_bb;
@@ -8984,6 +9004,10 @@ expand_omp_target (struct omp_region *region)
if (changed)
cleanup_tree_cfg ();
}
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
pop_cfun ();
}
@@ -9492,6 +9516,10 @@ execute_expand_omp (void)
expand_omp (root_omp_region);
+#ifdef ENABLE_CHECKING
+ if (!loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+ verify_loop_structure ();
+#endif
cleanup_tree_cfg ();
free_omp_regions ();
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index b06265c..d017479 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1032,21 +1032,22 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop)
struct reduction_info *const reduc = *slot;
edge e;
gphi *new_phi;
- basic_block store_bb;
+ basic_block store_bb, continue_bb;
tree local_res;
source_location locus;
/* STORE_BB is the block where the phi
should be stored. It is the destination of the loop exit.
(Find the fallthru edge from GIMPLE_OMP_CONTINUE). */
- store_bb = FALLTHRU_EDGE (loop->latch)->dest;
+ continue_bb = single_pred (loop->latch);
+ store_bb = FALLTHRU_EDGE (continue_bb)->dest;
/* STORE_BB has two predecessors. One coming from the loop
(the reduction's result is computed at the loop),
and another coming from a block preceding the loop,
when no iterations
are executed (the initial value should be taken). */
- if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (loop->latch))
+ if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (continue_bb))
e = EDGE_PRED (store_bb, 1);
else
e = EDGE_PRED (store_bb, 0);
@@ -1055,7 +1056,7 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop)
locus = gimple_location (reduc->reduc_stmt);
new_phi = create_phi_node (local_res, store_bb);
add_phi_arg (new_phi, reduc->init, e, locus);
- add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus);
+ add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (continue_bb), locus);
reduc->new_phi = new_phi;
return 1;
@@ -1134,7 +1135,8 @@ create_call_for_reduction (struct loop *loop,
{
reduction_list->traverse <struct loop *, create_phi_for_local_result> (loop);
/* Find the fallthru edge from GIMPLE_OMP_CONTINUE. */
- ld_st_data->load_bb = FALLTHRU_EDGE (loop->latch)->dest;
+ basic_block continue_bb = single_pred (loop->latch);
+ ld_st_data->load_bb = FALLTHRU_EDGE (continue_bb)->dest;
reduction_list
->traverse <struct clsn_data *, create_call_for_reduction_1> (ld_st_data);
}
@@ -1981,7 +1983,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
tree new_data, unsigned n_threads, location_t loc)
{
gimple_stmt_iterator gsi;
- basic_block bb, paral_bb, for_bb, ex_bb;
+ basic_block bb, paral_bb, for_bb, ex_bb, continue_bb;
tree t, param;
gomp_parallel *omp_par_stmt;
gimple omp_return_stmt1, omp_return_stmt2;
@@ -2052,8 +2054,12 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
gcc_assert (exit == single_dom_exit (loop));
guard = make_edge (for_bb, ex_bb, 0);
- single_succ_edge (loop->latch)->flags = 0;
- end = make_edge (loop->latch, ex_bb, EDGE_FALLTHRU);
+ /* Split the latch edge, so LOOPS_HAVE_SIMPLE_LATCHES is still valid. */
+ loop->latch = split_edge (single_succ_edge (loop->latch));
+ single_pred_edge (loop->latch)->flags = 0;
+ end = make_edge (single_pred (loop->latch), ex_bb, EDGE_FALLTHRU);
+ rescan_loop_exit (end, true, false);
+
for (gphi_iterator gpi = gsi_start_phis (ex_bb);
!gsi_end_p (gpi); gsi_next (&gpi))
{
@@ -2102,7 +2108,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
SSA_NAME_DEF_STMT (initvar) = for_stmt;
/* Emit GIMPLE_OMP_CONTINUE. */
- gsi = gsi_last_bb (loop->latch);
+ continue_bb = single_pred (loop->latch);
+ gsi = gsi_last_bb (continue_bb);
omp_cont_stmt = gimple_build_omp_continue (cvar_next, cvar);
gimple_set_location (omp_cont_stmt, loc);
gsi_insert_after (&gsi, omp_cont_stmt, GSI_NEW_STMT);
@@ -2298,10 +2305,6 @@ gen_parallel_loop (struct loop *loop,
scev_reset ();
- /* Cancel the loop (it is simpler to do it here rather than to teach the
- expander to do it). */
- cancel_loop_tree (loop);
-
/* Free loop bound estimations that could contain references to
removed statements. */
FOR_EACH_LOOP (loop, 0)
@@ -2587,6 +2590,7 @@ parallelize_loops (void)
unsigned n_threads = flag_tree_parallelize_loops;
bool changed = false;
struct loop *loop;
+ struct loop *skip_loop = NULL;
struct tree_niter_desc niter_desc;
struct obstack parloop_obstack;
HOST_WIDE_INT estimated;
@@ -2604,6 +2608,19 @@ parallelize_loops (void)
FOR_EACH_LOOP (loop, 0)
{
+ if (loop == skip_loop)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Skipping loop %d as inner loop of parallelized loop\n",
+ loop->num);
+
+ skip_loop = loop->inner;
+ continue;
+ }
+ else
+ skip_loop = NULL;
+
reduction_list.empty ();
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -2663,6 +2680,7 @@ parallelize_loops (void)
continue;
changed = true;
+ skip_loop = loop->inner;
if (dump_file && (dump_flags & TDF_DETAILS))
{
if (loop->inner)
@@ -2729,6 +2747,11 @@ pass_parallelize_loops::execute (function *fun)
if (parallelize_loops ())
{
fun->curr_properties &= ~(PROP_gimple_eomp);
+
+#ifdef ENABLE_CHECKING
+ verify_loop_structure ();
+#endif
+
return TODO_update_ssa;
}
--
1.9.1