Hi Cesar!

On Fri, 17 Jul 2015 11:13:59 -0700, Cesar Philippidis <ce...@codesourcery.com> wrote:
> This patch updates the libgomp OpenACC reduction test cases to check
> worker, vector and combined gang worker vector reductions. I tried to
> use some macros to simplify the c test cases a bit. I probably could
> have made them more generic with an additional header file/macro, but
> then that makes it too confusing too debug. The fortran tests are a bit
> of a lost clause, unless someone knows how to use the preprocessor with
> !$acc loops.
>
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
> +static void
> +test_reductions (void)
>  {
> -  [...]
> +  const int n = 100;
>    int i;
> -  [...]
> +  float array[n];
>
>    for (i = 0; i < n; i++)
> -  [...]
> +    array[i] = i+1;
>
> -  [...]
> +  /* Gang reductions. */
> +  check_reduction_op (float, +, 0, array[i], num_gangs (ng), gang);
> +  check_reduction_op (float, *, 1, array[i], num_gangs (ng), gang);

I see this one reproducibly FAIL in the x86_64 -m32 multilib's
host-fallback testing (there is no nvptx offloading for 32-bit
configurations). (The -m32 multilib is configured/enabled by default,
so fixing this is a prerequisite for trunk integration.)

From a very quick glance, might it be that we're overflowing the float
data type with the "1 * 2 * 3 * [...] * 100" computation? The OpenACC
reduction computes "inf", which is then compared against a very high
finite reference value -- or the other way round (I lost my debugging
session). Instead of multiplying these "big" numbers, I guess we should
just do a more idiomatic floating point computation?

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
>  /* complex reductions. */
> +static void
> +test_reductions (void)
>  {
> +  double _Complex array[n];
> +
> +  for (i = 0; i < n; i++)
> +    array[i] = i+1;
> +
> +  /* Gang reductions. */
> +  check_reduction_op (double, +, 0, creal (array[i]), num_gangs (ng), gang);

Given that in the check_reduction_op instantiations you're specifying a
"double" data type (instead of "double _Complex", for example), and
"creal (array[i])" reduction operands (instead of "array[i]", for
example), we're not actually testing reductions with complex data
types, so I guess that should be changed. :-)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
> @@ -0,0 +1,43 @@
> +#ifndef REDUCTION_H
> +#define REDUCTION_H
> +
> +#define DO_PRAGMA(x) _Pragma (#x)
> +
> +#define check_reduction_op(type, op, init, b, gwv_par, gwv_loop) \
> +  { \
> +    type res, vres; \
> +    res = (init); \
> +DO_PRAGMA (acc parallel gwv_par copy (res)) \
> +DO_PRAGMA (acc loop gwv_loop reduction (op:res)) \
> +    for (i = 0; i < n; i++) \
> +      res = res op (b); \
> +    \
> +    vres = (init); \
> +    for (i = 0; i < n; i++) \
> +      vres = vres op (b); \
> +    \
> +    if (res != vres) \
> +      abort (); \
> +  }

It's the right thing for integer data types, but for anything floating
point, we should be allowing for some small difference (epsilon)
between res and vres, due to rounding differences in the OpenACC
reduction (possibly offloaded) and reference value computation, and
similar.

> +#define check_reduction_macro(type, op, init, b, gwv_par, gwv_loop) \
> +  { \
> +    type res, vres; \
> +    res = (init); \
> + DO_PRAGMA (acc parallel gwv_par copy(res)) \
> +DO_PRAGMA (acc loop gwv_loop reduction (op:res)) \
> +    for (i = 0; i < n; i++) \
> +      res = op (res, (b)); \
> +    \
> +    vres = (init); \
> +    for (i = 0; i < n; i++) \
> +      vres = op (vres, (b)); \
> +    \
> +    if (res != vres) \
> +      abort (); \
> +  }

Likewise.

> +#define max(a, b) (((a) > (b)) ? (a) : (b))
> +#define min(a, b) (((a) < (b)) ? (a) : (b))
> +
> +#endif

> --- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-4.f90
> @@ -5,50 +5,108 @@
>  program reduction_4
>    implicit none
>
> -  integer, parameter :: n = 10, gangs = 20
> +  integer, parameter :: n = 10, ng = 8, nw = 4, vl = 32
>    integer :: i
> -  complex :: vresult, result
> +  real :: vresult, rg, rw, rv, rc
>    complex, dimension (n) :: array

Same problem as in the C test case: not actually testing complex data
types:

>    do i = 1, n
>      array(i) = i
>    end do
>
> -[...]
> +  !
> +  ! '+' reductions
> +  !
> +
> +  rg = 0
> +  rw = 0
> +  rv = 0
> +  rc = 0
>    vresult = 0
>
> -[...]
> +  !$acc parallel num_gangs(ng) copy(rg)
> +  !$acc loop reduction(+:rg) gang
> +  do i = 1, n
> +    rg = rg + REAL(array(i))
> +  end do
> +  !$acc end parallel


Grüße,
 Thomas
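
As an illustration of the two floating-point remarks above (the
suspected float overflow in the '*' reduction, and the exact
"res != vres" comparison), here is a minimal C sketch. It is not part
of the patch: the names float_product_1_to_n and nearly_equal, and the
choice of a relative tolerance, are hypothetical and only meant to show
the idea.

```c
#include <math.h>

/* Hypothetical helper, not from the patch: multiply 1 * 2 * ... * n in
   single precision, mirroring what the float '*' reduction computes
   over array[i] = i+1.  */
static float
float_product_1_to_n (int n)
{
  float res = 1.0f;
  int i;

  for (i = 1; i <= n; i++)
    res *= (float) i;
  return res;
}

/* Hypothetical helper, not from the patch: compare two floating-point
   values with a relative tolerance instead of '!='.  Exactly equal
   values (including equal infinities) match; a non-finite value never
   matches a different value.  */
static int
nearly_equal (double a, double b, double rel_eps)
{
  double abs_a, abs_b, diff, scale;

  if (a == b)
    return 1;
  if (!isfinite (a) || !isfinite (b))
    return 0;

  abs_a = a < 0 ? -a : a;
  abs_b = b < 0 ? -b : b;
  diff = a > b ? a - b : b - a;
  scale = abs_a > abs_b ? abs_a : abs_b;

  /* Accept if the difference is small relative to the larger operand.  */
  return diff <= rel_eps * scale;
}
```

With something along these lines, the floating-point instantiations
could end in "if (!nearly_equal (res, vres, 1e-6)) abort ();" (the
tolerance value is an assumption and would need tuning), while the
integer instantiations keep the exact comparison. Note that a tolerance
does not help once one side has overflowed to "inf", so the '*' test
data would still need smaller operands.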