On Fri, 29 Sep 2017, Richard Biener wrote: > > For gcc.dg/graphite/scop-4.c when we analyze data-refs of the fist > two inner loops (with the scalar BB in between) > > for (i = 1; i < 100; i++) /// loop 1 > { > -- scop start > for (j = 1; j < 80; j++) /// loop 2 > a[j][i] = a[j+1][2*i-1*j] + 12; > > b[i] = b[i-1] + 10; > > for (j = 1; j < 60; j++) /// loop 3 > a[j][i] = a[j+1][i-1] + 8; > -- scop end > > bar (); > > if (i == 23) > b[i] = a[i-1][i] + 6; > } > > we end up asking data-ref analysis to analyze b[i] and b[i-1] with > respect to loop == nest == 2. That doesn't make much sense in > itself and thus we get some weird answers. The one that confuses > us is {0, +, 1}_1 because later when we try to extract_affine_chrec > on that ISL aborts because we feed it -1 as the dimension for the > loop. > > While we can with some effort suppress those CHRECs I think the > better solution is to make GRAPHITE not ask SCEV to do this > kind of analysis but consistently analyze DRs in the containing > loop even if that loop is not within the current region. > > The idea is that we'd transform the above to > > for (i = 1; i < 100; i++) /// loop 1 > { > -- scop start > do > { > i' = i; > > for (j = 1; j < 80; j++) /// loop 2 > a[j][i'] = a[j+1][2*i'-1*j] + 12; > > b[i'] = b[i'-1] + 10; > > for (j = 1; j < 60; j++) /// loop 3 > a[j][i'] = a[j+1][i'-1] + 8; > } > while (0); > -- scop end > > bar (); > > if (i == 23) > b[i] = a[i-1][i] + 6; > } > > basically wrap each SCOP inside a loop that doesn't iterate. > > This should make analyzing the data-refs possible and enable > dependence analysis for them. > > I needed to create an extra dimension for this "loop" and > adjust some +- 1 stuff (ugh, this can need some better abstraction...). > > On the way the stmt_has_simple_data_refs_p/try_generate_gimple_bb > changes fix PR82355 as well. > > The copy_bb_and_scalar_dependences change is to avoid extra > code-gen errors on some existing graphite testcases where we now > perform more elaborate transforms. > > > I probably started with one of the more difficult code-gen errors > here so bear with my lack of GRAPHITE knowledge. > > Does this look reasonable? > > Bootstrapped and tested on x86_64-unknown-linux-gnu (with & without > -floop-nest-optimize -fgraphite-identity). > > I've verified that the graphite testsuite now has less ICEs when > turning code-gen errors into ICEs. > > Results of throwing this onto SPEC CPU 2006 are somewhat hard to > interpret. Before the patch we see > > loop nest optimized: 105 > loop nest not optimized, code generation error: 42 > loop nest not optimized, optimized schedule is identical to original > schedule: 119 > loop nest not optimized, optimization timed out: 36 > loop nest not optimized, ISL signalled an error: 6 > loop nest: 308 > > and after > > loop nest optimized: 87 > loop nest not optimized, code generation error: 93 > loop nest not optimized, optimized schedule is identical to original > schedule: 216 > loop nest not optimized, optimization timed out: 46 > loop nest not optimized, ISL signalled an error: 8 > loop nest: 450 > > first of all this means we've rejected fewer SCOPs initially > (very likely because of alias-sets now matching and maybe because > asking SCEV for sth sensible turns out to result in less > scev_not_known stuff in affected DRs). I'm not sure why > we end up optimizing less (maybe because I don't analyze DRs > in loops contained in the region with respect to the containing loop > as well -- I'd have to pre-compute that loop).
So fixing the build_iv_mapping hunk to /* Record sth only for real loops. */ if (loop_in_sese_p (old_loop, region)) iv_map[old_loop->num] = t; changes statistics to a favorable loop nest optimized: 134 loop nest not optimized, code generation error: 46 loop nest not optimized, optimized schedule is identical to original schedule: 216 loop nest not optimized, optimization timed out: 46 loop nest not optimized, ISL signalled an error: 8 loop nest: 450 Consider the patch changed that way. I guess I'll add a -fgraphite-no-codegen-error to force ICEing on them and enable that on the graphite.exp testcases while explicitely sticking on a -fno-graphite-no-codegen-error on those who'll ICE and scan for a codegen error message on them. That makes it easier to avoid regressing in that area. Richard. > Anyway. > > Ok for trunk? > > Thanks, > Richard. > > 2017-09-29 Richard Biener <rguent...@suse.de> > > PR tree-optimization/82355 > * graphite-isl-ast-to-gimple.c (build_iv_mapping): Also build > a mapping for the enclosing loop but avoid generating one for > the loop tree root. > (copy_bb_and_scalar_dependences): Remove premature codegen > error on PHIs in blocks duplicated into multiple places. > * graphite-scop-detection.c > (scop_detection::stmt_has_simple_data_refs_p): For a loop not > in the region use it as loop and nest to analyze the DR in. > (try_generate_gimple_bb): Likewise. > * graphite-sese-to-poly.c (extract_affine_chrec): Adjust. > (add_loop_constraints): For blocks in a loop not in the region > create a dimension with a single iteration. > * sese.h (gbb_loop_at_index): Remove assert. > > * gcc.dg/graphite/fuse-1.c: Adjust. > * gcc.dg/graphite/fuse-2.c: Likewise. > * gcc.dg/graphite/pr82355.c: New testcase. > > Index: gcc/graphite-isl-ast-to-gimple.c > =================================================================== > --- gcc/graphite-isl-ast-to-gimple.c (revision 253282) > +++ gcc/graphite-isl-ast-to-gimple.c (working copy) > @@ -774,8 +775,10 @@ build_iv_mapping (vec<tree> iv_map, gimp > if (codegen_error_p ()) > t = integer_zero_node; > > - loop_p old_loop = gbb_loop_at_index (gbb, region, i - 1); > - iv_map[old_loop->num] = t; > + loop_p old_loop = gbb_loop_at_index (gbb, region, i - 2); > + /* Record sth only for real loops. */ > + if (old_loop->num != 0) > + iv_map[old_loop->num] = t; > } > } > > @@ -2587,18 +2590,6 @@ edge translate_isl_ast_to_gimple:: > copy_bb_and_scalar_dependences (basic_block bb, edge next_e, vec<tree> > iv_map) > { > int num_phis = number_of_phi_nodes (bb); > - > - if (region->copied_bb_map->get (bb)) > - { > - /* FIXME: we should be able to handle phi nodes with args coming from > - outside the region. */ > - if (num_phis) > - { > - set_codegen_error (); > - return NULL; > - } > - } > - > basic_block new_bb = NULL; > if (bb_contains_loop_close_phi_nodes (bb)) > { > Index: gcc/graphite-scop-detection.c > =================================================================== > --- gcc/graphite-scop-detection.c (revision 253282) > +++ gcc/graphite-scop-detection.c (working copy) > @@ -1055,10 +1055,12 @@ scop_detection::graphite_can_represent_e > bool > scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt) > { > - loop_p nest = outermost_loop_in_sese (scop, gimple_bb (stmt)); > + loop_p nest; > loop_p loop = loop_containing_stmt (stmt); > if (!loop_in_sese_p (loop, scop)) > - loop = nest; > + nest = loop; > + else > + nest = outermost_loop_in_sese (scop, gimple_bb (stmt)); > > auto_vec<data_reference_p> drs; > if (! graphite_find_data_references_in_stmt (nest, loop, stmt, &drs)) > @@ -1450,11 +1452,12 @@ try_generate_gimple_bb (scop_p scop, bas > vec<scalar_use> reads = vNULL; > > sese_l region = scop->scop_info->region; > - loop_p nest = outermost_loop_in_sese (region, bb); > - > + loop_p nest; > loop_p loop = bb->loop_father; > if (!loop_in_sese_p (loop, region)) > - loop = nest; > + nest = loop; > + else > + nest = outermost_loop_in_sese (region, bb); > > for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); > gsi_next (&gsi)) > Index: gcc/graphite-sese-to-poly.c > =================================================================== > --- gcc/graphite-sese-to-poly.c (revision 253282) > +++ gcc/graphite-sese-to-poly.c (working copy) > @@ -86,7 +86,7 @@ extract_affine_chrec (scop_p s, tree e, > isl_pw_aff *lhs = extract_affine (s, CHREC_LEFT (e), isl_space_copy > (space)); > isl_pw_aff *rhs = extract_affine (s, CHREC_RIGHT (e), isl_space_copy > (space)); > isl_local_space *ls = isl_local_space_from_space (space); > - unsigned pos = sese_loop_depth (s->scop_info->region, get_chrec_loop (e)) > - 1; > + unsigned pos = sese_loop_depth (s->scop_info->region, get_chrec_loop (e)); > isl_aff *loop = isl_aff_set_coefficient_si > (isl_aff_zero_on_domain (ls), isl_dim_in, pos, 1); > isl_pw_aff *l = isl_pw_aff_from_aff (loop); > @@ -756,10 +759,10 @@ add_loop_constraints (scop_p scop, __isl > return domain; > const sese_l ®ion = scop->scop_info->region; > if (!loop_in_sese_p (loop, region)) > - return domain; > - > - /* Recursion all the way up to the context loop. */ > - domain = add_loop_constraints (scop, domain, loop_outer (loop), context); > + ; > + else > + /* Recursion all the way up to the context loop. */ > + domain = add_loop_constraints (scop, domain, loop_outer (loop), context); > > /* Then, build constraints over the loop in post-order: outer to inner. */ > > @@ -770,6 +773,21 @@ add_loop_constraints (scop_p scop, __isl > domain = add_iter_domain_dimension (domain, loop, scop); > isl_space *space = isl_set_get_space (domain); > > + if (!loop_in_sese_p (loop, region)) > + { > + /* 0 == loop_i */ > + isl_local_space *ls = isl_local_space_from_space (space); > + isl_constraint *c = isl_equality_alloc (ls); > + c = isl_constraint_set_coefficient_si (c, isl_dim_set, loop_index, 1); > + if (dump_file) > + { > + fprintf (dump_file, "[sese-to-poly] adding constraint to the domain: > "); > + print_isl_constraint (dump_file, c); > + } > + domain = isl_set_add_constraint (domain, c); > + return domain; > + } > + > /* 0 <= loop_i */ > isl_local_space *ls = isl_local_space_from_space (isl_space_copy (space)); > isl_constraint *c = isl_inequality_alloc (ls); > Index: gcc/sese.h > =================================================================== > --- gcc/sese.h (revision 253282) > +++ gcc/sese.h (working copy) > @@ -326,8 +326,6 @@ gbb_loop_at_index (gimple_poly_bb_p gbb, > while (--depth > index) > loop = loop_outer (loop); > > - gcc_assert (loop_in_sese_p (loop, region)); > - > return loop; > } > > Index: gcc/testsuite/gcc.dg/graphite/fuse-1.c > =================================================================== > --- gcc/testsuite/gcc.dg/graphite/fuse-1.c (revision 253282) > +++ gcc/testsuite/gcc.dg/graphite/fuse-1.c (working copy) > @@ -1,15 +1,15 @@ > /* Check that the two loops are fused and that we manage to fold the two xor > operations. */ > -/* { dg-options "-O2 -floop-nest-optimize -fdump-tree-forwprop-all > -fdump-tree-graphite-all" } */ > +/* { dg-options "-O2 -floop-nest-optimize -fdump-tree-forwprop4-all > -fdump-tree-graphite-all" } */ > > /* Make sure we fuse the loops like this: > AST generated by isl: > for (int c0 = 0; c0 <= 99; c0 += 1) { > - S_3(c0); > - S_6(c0); > - S_9(c0); > + S_3(0, c0); > + S_6(0, c0); > + S_9(0, c0); > } */ > -/* { dg-final { scan-tree-dump-times "AST generated by isl:.*for \\(int c0 = > 0; c0 <= 99; c0 \\+= 1\\) > \\{.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*\\}" 1 "graphite" } } */ > +/* { dg-final { scan-tree-dump-times "AST generated by isl:.*for \\(int c0 = > 0; c0 <= 99; c0 \\+= 1\\) \\{.*S_.*\\(0, c0\\);.*S_.*\\(0, c0\\);.*S_.*\\(0, > c0\\);.*\\}" 1 "graphite" } } */ > > /* Check that after fusing the loops, the scalar computation is also fused. > */ > /* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\\n\]*\\^ 12" 1 > "forwprop4" } } */ > Index: gcc/testsuite/gcc.dg/graphite/fuse-2.c > =================================================================== > --- gcc/testsuite/gcc.dg/graphite/fuse-2.c (revision 253282) > +++ gcc/testsuite/gcc.dg/graphite/fuse-2.c (working copy) > @@ -3,13 +3,13 @@ > /* Make sure we fuse the loops like this: > AST generated by isl: > for (int c0 = 0; c0 <= 99; c0 += 1) { > - S_3(c0); > - S_6(c0); > - S_9(c0); > + S_3(0, c0); > + S_6(0, c0); > + S_9(0, c0); > } > */ > > -/* { dg-final { scan-tree-dump-times "AST generated by isl:.*for \\(int c0 = > 0; c0 <= 99; c0 \\+= 1\\) > \\{.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*S_.*\\(c0\\);.*\\}" 1 "graphite" } } */ > +/* { dg-final { scan-tree-dump-times "AST generated by isl:.*for \\(int c0 = > 0; c0 <= 99; c0 \\+= 1\\) \\{.*S_.*\\(0, c0\\);.*S_.*\\(0, c0\\);.*S_.*\\(0, > c0\\);.*\\}" 1 "graphite" } } */ > > #define MAX 100 > int A[MAX], B[MAX], C[MAX]; > Index: gcc/testsuite/gcc.dg/graphite/pr82355.c > =================================================================== > --- gcc/testsuite/gcc.dg/graphite/pr82355.c (nonexistent) > +++ gcc/testsuite/gcc.dg/graphite/pr82355.c (working copy) > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -floop-nest-optimize" } */ > + > +int dc, at; > + > +void > +tv (int *ld, int jl) > +{ > + for (;;) > + { > + if (dc != 0) > + for (;;) > + { > + *ld = !!(*ld) + 1; > + for (dc = 0; dc < 3; ++dc) > + at = (jl != 0) ? *ld : 0; > + } > + > + while (at != 0) > + { > + } > + } > +} > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)