On 07/28/2014 10:02 AM, Thomas Schwinge wrote:
> Hi Cesar!
> 
> On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis 
> <cesar_philippi...@mentor.com> wrote:
>> This patch is the first step to enabling parallel reductions in openacc.
> 
> I think I have found one issue in this code -- but please verify that my
> understanding of reductions is correct.  Namely:
> 
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> +/* Helper function to finalize local data for the reduction arrays. The
>> +   reduction array needs to be reduced to the original reduction variable.
>> +   FIXME: This function assumes that there are vector_length threads in
>> +   total.  Also, it assumes that there are at least vector_length iterations
>> +   in the for loop.  */
>> +
>> +static void
>> +finalize_reduction_data (tree clauses, tree nthreads, gimple_seq *stmt_seqp,
>> +                     omp_context *ctx)
>> +{
>> +  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
>> +
>> +  tree c, var, array, loop_header, loop_body, loop_exit;
>> +  gimple stmt;
>> +
>> +  /* Create for loop.
>> +
>> +     let var = the original reduction variable
>> +     let array = reduction variable array
>> +
>> +     var = array[0]
>> +     for (i = 1; i < nthreads; i++)
>> +       var op= array[i]
>> + */
> 
> This should also consider the reduction variable's original value.  Test
> case (which does the expected thing if modified for OpenMP):
> 
>     #include <stdlib.h>
>     
>     int
>     main(void)
>     {
>     #define I 5
>     #define N 11
>     #define A 8
>     
>       int a = A;
>       int s = I;
>     
>     #pragma acc parallel vector_length(N)
>       {
>         int i;
>     #pragma acc loop reduction(+:s)
>         for (i = 0; i < N; ++i)
>           s += a;
>       }
>     
>       if (s != I + N * A)
>         abort ();
>     
>       return 0;
>     }
> 
> OK to check in the following?

Reductions can be specified with both the parallel and loop constructs.
According to section 2.5.11 in the OpenACC spec, a reduction in a parallel
construct should behave as you described:

        At the end of the region, the values for each gang are combined
        using the reduction operator, and the result combined with the
        value of the original variable and stored in the original
        variable.

However, in section 2.7.11, a reduction in a loop construct behaves as
follows:

        At the end of the loop, the values for each thread are combined
        using the specified reduction operator, and the result stored
        in the original variable at the end of the parallel or kernels
        region.

The behavior described for the parallel construct does make more sense,
though. I'll ask the OpenACC gurus whether there's a typo in section
2.7.11, since its wording does refer to the parallel region.

Thanks,
Cesar
