Hi!
On Mon, 17 Oct 2016 15:38:50 +0200, I wrote:
> On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener
> <[email protected]> wrote:
> > On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
> > <[email protected]> wrote:
> > > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener
> > > <[email protected]> wrote:
> > >> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> > >> <[email protected]> wrote:
> > >> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener
> > >> > <[email protected]> wrote:
> > >> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell <[email protected]>
> > >> >> wrote:
> > >> >> > On 10/14/16 05:28, Richard Biener wrote:
> > >> >> >
> > >> >> >> The BB_VISITED flag has indetermined state at the beginning of a
> > >> >> >> pass.
> > >> >> >> You have to ensure it is cleared yourself.
> > >> >> >
> > >> >> >
> > >> >> > In that case the openacc (&nvptx?) passes should be modified to
> > >> >> > clear the
> > >> >> > flags at their start, rather than at their end.
This already is a "conceptual acknowledgement" of my patch, so...
> > >> > OK to commit the following? Is such a test case appropriate (which
> > >> > would
> > >> > have caught this issue right away), in particular the dg-final
> > >> > scan-tree-dump line?
> > >>
> > >> Ugh. Not worse to what we do in various dwarf scanning I guess.
> > >
> > > ;-|
> > >
> > >> Doesn't failure lead to a miscompile eventually? So you could formulate
> > >> this as a dg-do run test with a check for the desired outcome?
> > >
> > > No, unfortunately. In this case the error is "benign" such that the
> > > OpenACC loop processing machinery will decide to not parallelize loops
> > > that ought to be parallelized.
> >
> > So you can scan for "loop parallelized" instead?
>
> The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd
> have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that
> seems likewise fragile (for false negatives), and less useful than
> scanning for the complete pattern.
>
> > I fear your pattern
> > is quite fragile
> > to maintain over time.
>
> Agreed -- but then, that's intentional: my idea for this new test case
> has been to have it actually verify the expected OpenACC loop processing,
> so it's clear that this pattern will need to be adjusted if changing the
> OpenACC loop processing.
>
> > > This won't generally cause any problem
> > > (apart from performance regression, obviously); it just caused problems
> > > in a few libgomp test cases that actually at run time test for
> > > parallelized execution -- which will/did trigger only with nvptx
> > > offloading enabled, which not too many people are testing. The test case
> > > I propose below will trigger also for non-offloading configurations.
>
> On IRC, Segher suggested to 'use {} instead of "" to avoid [all those
> backslashes]' -- thanks, done.
If you don't like the test case as-is (do we need multi-line tree dump
scanning, just like we recently got for compiler diagnostics?), can I at
least commit the OpenACC loops processing fix? Here is the latest
version, simplified after your r241296 IRA vs. BB_VISITED fixes:
commit 766cf9959b15a17e17e89a50e905b4c546893823
Author: Thomas Schwinge <[email protected]>
Date: Mon Oct 17 15:33:09 2016 +0200
Clear basic block flags before using BB_VISITED for OpenACC loops processing
gcc/
* omp-low.c (oacc_loop_discovery): Call clear_bb_flags before, and
don't clear BB_VISITED after processing.
gcc/testsuite/
* gcc.dg/goacc/loop-processing-1.c: New file.
---
gcc/omp-low.c | 8 +++-----
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++++++++++++++++++
2 files changed, 21 insertions(+), 5 deletions(-)
diff --git gcc/omp-low.c gcc/omp-low.c
index 77f89d5..3ef796f 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -19236,7 +19236,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
static oacc_loop *
oacc_loop_discovery ()
{
- basic_block bb;
+ /* Clear basic block flags, in particular BB_VISITED which we're going to use
+ in the following. */
+ clear_bb_flags ();
oacc_loop *top = new_oacc_loop_outer (current_function_decl);
oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
@@ -19245,10 +19247,6 @@ oacc_loop_discovery ()
that diagnostics come out in an unsurprising order. */
top = oacc_loop_sibling_nreverse (top);
- /* Reset the visited flags. */
- FOR_ALL_BB_FN (bb, cfun)
- bb->flags &= ~BB_VISITED;
-
return top;
}
diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
new file mode 100644
index 0000000..619576a
--- /dev/null
+++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -0,0 +1,18 @@
+/* Make sure that OpenACC loop processing happens. */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+extern int place ();
+
+void vector_1 (int *ary, int size)
+{
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size])
firstprivate (size)
+ {
+#pragma acc loop gang
+ for (int jx = 0; jx < 1; jx++)
+#pragma acc loop auto
+ for (int ix = 0; ix < size; ix++)
+ ary[ix] = place ();
+ }
+}
+
+/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop
14\(1\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1,
20\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1,
20\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+,
0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK,
\.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_JOIN,
\.data_dep\.[0-9_]+, 0\);.*Loop 6\(4\).*\.data_dep\.[0-9_]+ = UNIQUE
\(OACC_HEAD_MARK, 0, 1, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE
\(OACC_HEAD_MARK, 0, 1, 6\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK,
\.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE
\(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = UNIQUE
\(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);} "oaccdevlow" } } */
Grüße
Thomas