On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge <tho...@codesourcery.com> wrote: > Hi! > > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge >> <tho...@codesourcery.com> wrote: >> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener >> > <richard.guent...@gmail.com> wrote: >> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell <nat...@acm.org> wrote: >> >> > On 10/14/16 05:28, Richard Biener wrote: >> >> > >> >> >> The BB_VISITED flag has indetermined state at the beginning of a pass. >> >> >> You have to ensure it is cleared yourself. >> >> > >> >> > >> >> > In that case the openacc (&nvptx?) passes should be modified to clear >> >> > the >> >> > flags at their start, rather than at their end. >> > >> > The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears >> > BB_VISITED for all basic block it works on. >> > >> >> Yes. But as I said, I ran into IRA ICEs (somewhere in the testsuite) >> >> when not >> >> cleaning up after tree-ssa-propagate.c. So somebody has to fix IRA first. >> > >> > Is there a GCC PR for that, or where are you tracking such issues? >> >> No, just tracking in my head. > > Tsk, tsk... ;-)
bb-reorder.c has the same issue. Index: gcc/bb-reorder.c =================================================================== --- gcc/bb-reorder.c (revision 241228) +++ gcc/bb-reorder.c (working copy) @@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void) To start with, everything points to itself, nothing is assigned yet. */ FOR_ALL_BB_FN (bb, cfun) - bb->aux = bb; + { + bb->aux = bb; + bb->flags &= ~BB_VISITED; + } EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0; note that I didn't really understand the IRA issue (the code looks fine from a quick look - it initializes to BB_VISITED). Still removing the tree-ssa-propagate.c BB_VISITED clearing resulted in a bootstrap failure on x86_64. Richard. > >> > OK to commit the following? Is such a test case appropriate (which would >> > have caught this issue right away), in particular the dg-final >> > scan-tree-dump line? >> >> Ugh. Not worse to what we do in various dwarf scanning I guess. > > ;-| > >> Doesn't failure lead to a miscompile eventually? So you could formulate >> this as a dg-do run test with a check for the desired outcome? > > No, unfortunately. In this case the error is "benign" such that the > OpenACC loop processing machinery will decide to not parallelize loops > that ought to be parallelized. So you can scan for "loop parallelized" instead? I fear your pattern is quite fragile to maintain over time. Richard. > This won't generally cause any problem > (apart from performance regression, obviously); it just caused problems > in a few libgomp test cases that actually at run time test for > parallelized execution -- which will/did trigger only with nvptx > offloading enabled, which not too many people are testing. The test case > I propose below will trigger also for non-offloading configurations. > >> > Clear basic block flags before using BB_VISITED for OpenACC loops >> > processing >> > >> > gcc/ >> > * omp-low.c (oacc_loop_discovery): Call clear_bb_flags. >> > >> > gcc/testsuite/ >> > * gcc.dg/goacc/loop-processing-1.c: New file. >> > --- >> > gcc/omp-low.c | 9 +++++---- >> > gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++++++++++++++++++ >> > 2 files changed, 23 insertions(+), 4 deletions(-) >> > >> > --- gcc/omp-low.c >> > +++ gcc/omp-low.c >> > @@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop) >> > static oacc_loop * >> > oacc_loop_discovery () >> > { >> > - basic_block bb; >> > + /* Clear basic block flags, in particular BB_VISITED which we're going >> > to use >> > + in the following. */ >> > + clear_bb_flags (); >> > >> > oacc_loop *top = new_oacc_loop_outer (current_function_decl); >> > oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun)); >> > @@ -19349,9 +19351,8 @@ oacc_loop_discovery () >> > that diagnostics come out in an unsurprising order. */ >> > top = oacc_loop_sibling_nreverse (top); >> > >> > - /* Reset the visited flags. */ >> > - FOR_ALL_BB_FN (bb, cfun) >> > - bb->flags &= ~BB_VISITED; >> > + /* Clear basic block flags again, as otherwise IRA will explode later >> > on. */ >> > + clear_bb_flags (); >> > >> > return top; >> > } >> > --- /dev/null >> > +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c >> > @@ -0,0 +1,18 @@ >> > +/* Make sure that OpenACC loop processing happens. */ >> > +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */ >> > + >> > +extern int place (); >> > + >> > +void vector_1 (int *ary, int size) >> > +{ >> > +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) >> > firstprivate (size) >> > + { >> > +#pragma acc loop gang >> > + for (int jx = 0; jx < 1; jx++) >> > +#pragma acc loop auto >> > + for (int ix = 0; ix < size; ix++) >> > + ary[ix] = place (); >> > + } >> > +} >> > + >> > +/* { dg-final { scan-tree-dump "OpenACC loops.*Loop 0\\\(0\\\).*Loop >> > 14\\\(1\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, >> > 20\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, >> > 0, 1, 20\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, >> > \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = >> > UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, >> > 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, >> > \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Loop >> > 6\\\(4\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, >> > 6\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, >> > 1, 6\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, >> > \\\.data_dep\\\.\[0-9_\]+, 2\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = >> > UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, >> > 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, >> > \\\.data_dep\\\.\[0-9_\]+, 2\\\);" "oaccdevlow" } } */ > > > Grüße > Thomas