On 2016/8/16 7:06 PM, Thomas Schwinge wrote:
> Hi!
>
> On Mon, 15 Aug 2016 19:25:48 +0800, Chung-Lin Tang <clt...@codesourcery.com>
> wrote:
>> per the discussion on the bugzilla PR page, reductions on OpenACC loop
>> directives will automatically get a copy clause mapping on an enclosing
>> parallel construct (unless bounded by a local variable or an explicit
>> firstprivate clause).
>>
>> There is also a patch for libgomp testsuite cases. Asides from the
>> fortran case which now needs explicit firstprivate clauses to work,
>> other C/C++ cases have been adjusted to remove explicit copy clauses.
>> (I have not exhaustively searched everywhere to eliminate them though)
>>
>> This has been tested using gomp-4_0-branch, which is based on GCC 6,
>> which is what this PR was originally filed for.
>>
>> I will be committing this soon for gomp-4_0-branch,
>> is this okay for gcc-6-branch and trunk as well?
>
> On Mon, 15 Aug 2016 15:23:14 +0200, Jakub Jelinek <ja...@redhat.com> wrote:
>> The gimplify.c change is ok for trunk and 6.3 (after 6.2 is released).
>> As for the testsuite, I'll leave it to Thomas/Nathan on what they prefer,
>> I'd think that having explicit clauses in e.g. half of the testcases and
>> implicit ones in the other half wouldn't hurt, so that both are tested
>> enough.
>
> ACK, but from a quick scan it seems as if there's still sufficient
> coverage remaining with explicit usage.
>
> What I'd like to see changed/added, though, is some libgomp.oacc-fortran
> test coverage of the new implicit copy clauses, and a handful of C/C++ as
> well as Fortran tree-scanning tests in gcc/testsuite/ -- basically, to
> document the expected behavior.

I've added the kernels case assertion you mentioned below, and committed
the gimplify patch to trunk, gcc-6-branch, and gomp-4_0-branch.

I've also added some of the testing items you mentioned; see the
attached updated testsuite patches.

>> + /* For reductions clauses in OpenACC loop directives, by default create a
>> +    copy clause on the enclosing parallel construct for carrying back the
>> +    results. */
>> + if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
>
> Should this be "ctx->region_type & ORT_ACC" instead of "=="? I suppose
> the same thing also applies to OpenACC nested parallelism:
>
>     #pragma acc parallel
>     {
>       [...]
>     #pragma acc parallel reduction([...])
>       {
>         [...]
>
> ..., which we're not supporting right now, but that'll make it easier to
> spot once adding such support.

I'm ignoring this issue for now; this patch only deals with acc loop
directives, hence "== ORT_ACC". For Cesar's code path that deals with
adding copy clauses for parallel construct reductions, see the case in
gimplify_adjust_omp_clauses(). IMHO, trying to infer right now how we'll
deal with nested parallelism is too much of a long shot.

> I suppose we can also run into this for ORT_ACC_KERNELS (if not, should
> mark/document that with gcc_unreachable or some such); per OpenACC 2.0a,
> 2.5.2 Kernels Construct, "a scalar variable referenced in the kernels
> construct that does not appear in a data clause for the construct or any
> enclosing data construct will be treated as if it appeared in a copy
> clause", so we should assert that this already is a GOVD_MAP.

Fair enough, I've added that in the gimplify patch.

> Can we also run into this for ORT_ACC_DATA and ORT_ACC_HOST_DATA, but
> nothing needs to be done for these?

Don't we have other mechanisms for enforcing proper nesting?
data/host_data constructs are supposed to be host code. They should not
directly enclose loop directives, which is the case we're concerned with
here.

Attached are the final patches I committed.
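
To make the user-visible effect concrete, here is a minimal sketch (not taken from the patch or the testsuite; the function names are made up for illustration). It assumes that a scalar without a data clause would otherwise be treated as firstprivate on the parallel construct, so that the reduction result would not be carried back to the host. With the change, the loop reduction is expected to imply a copy mapping for "sum" on the enclosing parallel construct. If built without -fopenacc, the pragmas are ignored and the code simply runs sequentially on the host.

```c
#include <assert.h>

/* Hypothetical example, not part of the patch or the testsuite.  */
int
sum_first_n (int n)
{
  int sum = 0;

  /* No copy(sum) clause here: the loop reduction below is expected to
     imply a copy mapping for sum on this parallel construct.  */
#pragma acc parallel
  {
#pragma acc loop gang reduction(+:sum)
    for (int i = 0; i < n; i++)
      sum += i;
  }

  return sum;
}

/* Compare the offloaded result against a host-only computation.  */
void
check_sum_first_n (void)
{
  int expected = 0;

  for (int i = 0; i < 1000; i++)
    expected += i;

  assert (sum_first_n (1000) == expected);
}
```

Adding an explicit firstprivate(sum) to the parallel directive is the way to opt out of the implicit copy, which is what the adjusted reduction-7.f90 case does.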

Thanks,
Chung-Lin

2016-08-18  Chung-Lin Tang  <clt...@codesourcery.com>

	PR middle-end/70895
	gcc/
	* gimplify.c (omp_add_variable): Adjust/add variable mapping on
	enclosing parallel construct for reduction variables on OpenACC loop
	directives.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-tree-1.f90: Add gimple scan-tree-dump test.
	* c-c++-common/goacc/reduction-1.c: Likewise.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Add explicit
	firstprivate clauses.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Remove explicit
	copy clauses.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-flt.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/collapse-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c: Likewise.

Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c (revision 239575)
+++ gcc/gimplify.c (revision 239576)
@@ -6010,6 +6010,45 @@
     n->value |= flags;
   else
     splay_tree_insert (ctx->variables, (splay_tree_key)decl, flags);
+
+  /* For reductions clauses in OpenACC loop directives, by default create a
+     copy clause on the enclosing parallel construct for carrying back the
+     results. */
+  if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
+    {
+      struct gimplify_omp_ctx *outer_ctx = ctx->outer_context;
+      while (outer_ctx)
+        {
+          n = splay_tree_lookup (outer_ctx->variables, (splay_tree_key)decl);
+          if (n != NULL)
+            {
+              /* Ignore local variables and explicitly declared clauses. */
+              if (n->value & (GOVD_LOCAL | GOVD_EXPLICIT))
+                break;
+              else if (outer_ctx->region_type == ORT_ACC_KERNELS)
+                {
+                  /* According to the OpenACC spec, such a reduction variable
+                     should already have a copy map on a kernels construct,
+                     verify that here. */
+                  gcc_assert (!(n->value & GOVD_FIRSTPRIVATE)
+                              && (n->value & GOVD_MAP));
+                }
+              else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+                {
+                  /* Remove firstprivate and make it a copy map. */
+                  n->value &= ~GOVD_FIRSTPRIVATE;
+                  n->value |= GOVD_MAP;
+                }
+            }
+          else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+            {
+              splay_tree_insert (outer_ctx->variables, (splay_tree_key)decl,
+                                 GOVD_MAP | GOVD_SEEN);
+              break;
+            }
+          outer_ctx = outer_ctx->outer_context;
+        }
+    }
 }
 
 /* Notice a threadprivate variable DECL used in OMP context CTX.
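
For readers who want to follow the outer-context walk in the gimplify.c hunk without the surrounding GCC data structures, here is a simplified standalone model. It is only a sketch under stated assumptions: the F_* flags, the region enum, struct ctx and promote_reduction_var are hypothetical stand-ins for GOVD_*, ORT_*, gimplify_omp_ctx and the new code in omp_add_variable, and a single has_entry field replaces the per-context splay tree lookup for one variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's GOVD_* flags.  */
enum { F_LOCAL = 1, F_EXPLICIT = 2, F_FIRSTPRIVATE = 4, F_MAP = 8, F_SEEN = 16 };

/* Hypothetical stand-ins for GCC's ORT_* region types.  */
enum region { R_ACC_LOOP, R_ACC_PARALLEL, R_ACC_KERNELS, R_OTHER };

/* One context, tracking a single variable.  has_entry models a
   successful splay_tree_lookup for that variable.  */
struct ctx
{
  enum region type;
  int has_entry;
  unsigned flags;
  struct ctx *outer;
};

/* Model of the walk: starting from the context enclosing the loop
   directive, adjust or add a copy mapping on an enclosing parallel.  */
void
promote_reduction_var (struct ctx *loop_ctx)
{
  for (struct ctx *o = loop_ctx->outer; o != NULL; o = o->outer)
    {
      if (o->has_entry)
        {
          /* Local variables and explicit clauses stop the walk.  */
          if (o->flags & (F_LOCAL | F_EXPLICIT))
            break;
          else if (o->type == R_ACC_KERNELS)
            /* A kernels construct is expected to have a copy map already.  */
            assert (!(o->flags & F_FIRSTPRIVATE) && (o->flags & F_MAP));
          else if (o->type == R_ACC_PARALLEL)
            {
              /* Turn an implicit firstprivate into a copy map.  */
              o->flags &= ~F_FIRSTPRIVATE;
              o->flags |= F_MAP;
            }
        }
      else if (o->type == R_ACC_PARALLEL)
        {
          /* No entry yet: add a copy map and stop.  */
          o->has_entry = 1;
          o->flags = F_MAP | F_SEEN;
          break;
        }
    }
}
```

As in the hunk, the walk stops at the first local or explicitly declared entry, or once it has added a new mapping on a parallel construct; after adjusting an existing entry it continues outwards.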
Index: gcc/testsuite/c-c++-common/goacc/reduction-1.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/reduction-1.c (revision 239575)
+++ gcc/testsuite/c-c++-common/goacc/reduction-1.c (revision 239576)
@@ -1,3 +1,4 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
 /* Integer reductions. */
 
 #define n 1000
@@ -65,3 +66,7 @@
 
   return 0;
 }
+
+/* Check that default copy maps are generated for loop reductions. */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 7 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
Index: gcc/testsuite/c-c++-common/goacc/reduction-3.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/reduction-3.c (revision 239575)
+++ gcc/testsuite/c-c++-common/goacc/reduction-3.c (revision 239576)
@@ -1,3 +1,4 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
 /* double reductions. */
 
 #define n 1000
@@ -47,3 +48,7 @@
 
   return 0;
 }
+
+/* Check that default copy maps are generated for loop reductions. */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 4 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
Index: gcc/testsuite/c-c++-common/goacc/reduction-2.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/reduction-2.c (revision 239575)
+++ gcc/testsuite/c-c++-common/goacc/reduction-2.c (revision 239576)
@@ -1,3 +1,4 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
 /* float reductions. */
 
 #define n 1000
@@ -47,3 +48,7 @@
 
   return 0;
 }
+
+/* Check that default copy maps are generated for loop reductions. */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 4 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
Index: gcc/testsuite/c-c++-common/goacc/reduction-4.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/reduction-4.c (revision 239575)
+++ gcc/testsuite/c-c++-common/goacc/reduction-4.c (revision 239576)
@@ -1,3 +1,4 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
 /* complex reductions. */
 
 #define n 1000
@@ -35,3 +36,7 @@
 
   return 0;
 }
+
+/* Check that default copy maps are generated for loop reductions. */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */
Index: gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
===================================================================
--- gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90 (revision 239575)
+++ gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90 (revision 239576)
@@ -1,4 +1,4 @@
-! { dg-additional-options "-fdump-tree-original -std=f2008" }
+! { dg-additional-options "-fdump-tree-original -fdump-tree-gimple -std=f2008" }
 
 ! test for tree-dump-original and spaces-commas
 
@@ -44,3 +44,4 @@
 ! { dg-final { scan-tree-dump-times "private\\(m\\)" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "reduction\\(\\+:sum\\)" 1 "original" } }
 
+! { dg-final { scan-tree-dump-times "map\\(tofrom:sum \\\[len: \[0-9\]+\\\]\\)" 1 "gimple" } }
Index: libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-6.f90 (revision 239576)
@@ -19,7 +19,7 @@
   hs1 = 0
   hs2 = 0
 
-  !$acc parallel num_gangs (1000) copy(gs1, gs2)
+  !$acc parallel num_gangs (1000)
   !$acc loop reduction(+:gs1, gs2) gang
   do i = 1, n
      gs1 = gs1 + 1
@@ -27,7 +27,7 @@
   end do
   !$acc end parallel
 
-  !$acc parallel num_workers (4) vector_length (32) copy(ws1, ws2)
+  !$acc parallel num_workers (4) vector_length (32)
   !$acc loop reduction(+:ws1, ws2) worker
   do i = 1, n
      ws1 = ws1 + 1
@@ -35,7 +35,7 @@
   end do
   !$acc end parallel
 
-  !$acc parallel vector_length (32) copy(vs1, vs2)
+  !$acc parallel vector_length (32)
   !$acc loop reduction(+:vs1, vs2) vector
   do i = 1, n
      vs1 = vs1 + 1
@@ -43,7 +43,7 @@
   end do
   !$acc end parallel
 
-  !$acc parallel num_gangs(8) num_workers(4) vector_length(32) copy(cs1, cs2)
+  !$acc parallel num_gangs(8) num_workers(4) vector_length(32)
   !$acc loop reduction(+:cs1, cs2) gang worker vector
   do i = 1, n
      cs1 = cs1 + 1
@@ -74,7 +74,7 @@
   red = 0
   vred = 0
 
-  !$acc parallel num_gangs(10) vector_length(32) copy(red)
+  !$acc parallel num_gangs(10) vector_length(32)
   !$acc loop reduction(+:red) gang
   do i = 1, n/chunksize
      !$acc loop reduction(+:red) vector
Index: libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-7.f90 (revision 239576)
@@ -50,7 +50,7 @@
 
 end subroutine redsub_private
 
-! Bogus reduction on an impliclitly firstprivate variable. The results do
+! Bogus reduction on a firstprivate variable. The results do
 ! survive the parallel region. The goal here is to ensure that gfortran
 ! doesn't ICE.
 
@@ -58,7 +58,7 @@
   integer :: sum, n, arr(n)
   integer :: i
 
-  !$acc parallel
+  !$acc parallel firstprivate(sum)
   !$acc loop gang worker vector reduction (+:sum)
   do i = 1, n
      sum = sum + 1
@@ -72,7 +72,7 @@
   integer :: sum, n, arr(n)
   integer :: i, j
 
-  !$acc parallel copy (arr)
+  !$acc parallel copy (arr) firstprivate(sum)
   !$acc loop gang
   do i = 1, n
      sum = i;
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-2.c (revision 239576)
@@ -8,7 +8,7 @@
   int i, j, k, l = 0, f = 0, x = 0;
   int m1 = 4, m2 = -5, m3 = 17;
 
-#pragma acc parallel copy(l)
+#pragma acc parallel
   #pragma acc loop seq collapse(3) reduction(+:l)
     for (i = -2; i < m1; i++)
       for (j = m2; j < -2; j++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c (revision 239576)
@@ -10,7 +10,7 @@
   int ondev = 0;
   int t = 0, h = 0;
 
-#pragma acc parallel num_workers(32) vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ondev)
   {
 #pragma acc loop worker vector reduction (+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c (revision 239576)
@@ -13,8 +13,7 @@
   for (i = 0; i < 1024; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-    copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang reduction(+:res)
     for (i = 0; i < 1024; i++)
@@ -28,8 +27,7 @@
 
   res = hres = 1;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-    copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang reduction(*:res)
     for (i = 0; i < 12; i++)
@@ -53,8 +51,7 @@
   for (i = 0; i < 1024; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-    copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang vector reduction(+:res)
     for (i = 0; i < 1024; i++)
@@ -78,8 +75,7 @@
   for (i = 0; i < 1024; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-    copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang worker reduction(+:res)
     for (i = 0; i < 1024; i++)
@@ -103,8 +99,7 @@
   for (i = 0; i < 1024; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-    copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang worker vector reduction(+:res)
     for (i = 0; i < 1024; i++)
@@ -128,8 +123,7 @@
   for (i = 0; i < 32768; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-    copy(res)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang reduction(+:res)
     for (j = 0; j < 32; j++)
@@ -161,7 +155,7 @@
     arr[i] = i;
 
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-    copyin(arr) copy(res)
+    copyin(arr)
   {
     #pragma acc loop gang reduction(+:res)
     for (j = 0; j < 32; j++)
@@ -191,8 +185,7 @@
   for (i = 0; i < 32768; i++)
     arr[i] = i;
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-    copy(res, mres)
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
   {
     #pragma acc loop gang reduction(+:res) reduction(max:mres)
     for (j = 0; j < 32; j++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c (revision 239576)
@@ -12,7 +12,7 @@
   int ondev = 0;
   int t = 0, h = 0;
 
-#pragma acc parallel vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel vector_length(32) copy(ondev)
   {
 #pragma acc loop vector reduction (+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/collapse-4.c (revision 239576)
@@ -11,7 +11,7 @@
 
   memset (b, '\0', sizeof (b));
 
-#pragma acc parallel copy(b[0:3][0:3]) copy(l)
+#pragma acc parallel copy(b[0:3][0:3])
   {
 #pragma acc loop collapse(2) reduction(+:l)
     for (i = 0; i < 2; i++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-flt.c (revision 239576)
@@ -19,7 +19,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel vector_length(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel vector_length(32) copyin(ary[0:N])
     {
 #pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
@@ -43,7 +43,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_workers(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_workers(32) copyin(ary[0:N])
     {
 #pragma acc loop worker reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
@@ -67,7 +67,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_gangs (32) copyin(ary[0:N])
     {
 #pragma acc loop gang reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c (revision 239576)
@@ -11,7 +11,7 @@
   int ondev = 0;
   int t = 0, h = 0;
 
-#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(ondev)
   {
 #pragma acc loop gang worker vector reduction(+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c (revision 239576)
@@ -22,7 +22,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel vector_length(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel vector_length(32) copyin(ary[0:N])
     {
 #pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
@@ -46,7 +46,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_workers(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_workers(32) copyin(ary[0:N])
     {
 #pragma acc loop worker reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
@@ -70,7 +70,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_gangs (32) copyin(ary[0:N])
     {
 #pragma acc loop gang reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-dbl.c (revision 239576)
@@ -19,7 +19,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel vector_length(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel vector_length(32) copyin(ary[0:N])
     {
 #pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
@@ -43,7 +43,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_workers(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_workers(32) copyin(ary[0:N])
     {
 #pragma acc loop worker reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
@@ -67,7 +67,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_gangs (32) copyin(ary[0:N])
     {
 #pragma acc loop gang reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c (revision 239576)
@@ -11,7 +11,7 @@
   int ondev = 0;
   int t = 0, h = 0;
 
-#pragma acc parallel num_gangs(32) vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel num_gangs(32) vector_length(32) copy(ondev)
   {
 #pragma acc loop gang reduction (+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c (revision 239576)
@@ -11,7 +11,7 @@
   int ondev = 0;
   int t = 0, h = 0;
 
-#pragma acc parallel num_workers(32) vector_length(32) copy(t) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ondev)
   {
 #pragma acc loop worker reduction(+:t)
     for (unsigned ix = 0; ix < N; ix++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c (revision 239575)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c (revision 239576)
@@ -22,7 +22,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel vector_length(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel vector_length(32) copyin(ary[0:N])
     {
 #pragma acc loop vector reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
@@ -46,7 +46,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_workers(32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_workers(32) copyin(ary[0:N])
     {
 #pragma acc loop worker reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)
@@ -70,7 +70,7 @@
   {
     Type tsum = 0, tprod = 1;
 
-#pragma acc parallel num_gangs (32) copyin(ary[0:N]) copy (tsum, tprod)
+#pragma acc parallel num_gangs (32) copyin(ary[0:N])
     {
 #pragma acc loop gang reduction(+:tsum) reduction (*:tprod)
       for (int ix = 0; ix < N; ix++)