On 07/28/2014 10:02 AM, Thomas Schwinge wrote:
> Hi Cesar!
>
> On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis
> <cesar_philippi...@mentor.com> wrote:
>> This patch is the first step to enabling parallel reductions in openacc.
>
> I think I have found one issue in this code -- but please verify that my
> understanding of reductions is correct.  Namely:
>
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> +/* Helper function to finalize local data for the reduction arrays. The
>> +   reduction array needs to be reduced to the original reduction variable.
>> +   FIXME: This function assumes that there are vector_length threads in
>> +   total.  Also, it assumes that there are at least vector_length iterations
>> +   in the for loop.  */
>> +
>> +static void
>> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
>> +                         omp_context *ctx)
>> +{
>> +  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
>> +
>> +  tree c, var, array, loop_header, loop_body, loop_exit;
>> +  gimple stmt;
>> +
>> +  /* Create for loop.
>> +
>> +     let var = the original reduction variable
>> +     let array = reduction variable array
>> +
>> +     var = array[0]
>> +     for (i = 1; i < nthreads; i++)
>> +       var op= array[i]
>> +  */
>
> This should also consider the reduction variable's original value.  Test
> case (which does the expected thing if modified for OpenMP):
>
>     #include <stdlib.h>
>
>     int
>     main(void)
>     {
>     #define I 5
>     #define N 11
>     #define A 8
>
>       int a = A;
>       int s = I;
>
>     #pragma acc parallel vector_length(N)
>       {
>         int i;
>     #pragma acc loop reduction(+:s)
>         for (i = 0; i < N; ++i)
>           s += a;
>       }
>
>       if (s != I + N * A)
>         abort ();
>
>       return 0;
>     }
>
> OK to check in the following?
Reductions can be specified with both the parallel and loop constructs. According to section 2.5.11 of the OpenACC spec, a reduction in a parallel construct should behave as you described:

  At the end of the region, the values for each gang are combined using the reduction operator, and the result combined with the value of the original variable and stored in the original variable.

However, in section 2.7.11, a reduction in a loop construct behaves as follows:

  At the end of the loop, the values for each thread are combined using the specified reduction operator, and the result stored in the original variable at the end of the parallel or kernels region.

The parallel reduction behavior does make more sense, though. I'll ask the OpenACC gurus if there's a typo in section 2.7.11. It does refer to parallel reductions.

Thanks,
Cesar