https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70534

            Bug ID: 70534
           Summary: openacc parallel reductions aren't neutered
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: cesar at gcc dot gnu.org
          Reporter: cesar at gcc dot gnu.org
                CC: tschwinge at gcc dot gnu.org
  Target Milestone: ---

Created attachment 38182
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38182&action=edit
parallel reduction

The attached test case demonstrates a race condition in the code generated for
acc parallel reduction finalizers. According to the OpenACC 2.0a spec, all of
the gang-private copies of the reduction variable are supposed to be combined,
using the reduction operator, with the original reduction variable at the end
of the parallel region. What's happening now is that there's no code to neuter
the worker and vector threads, so instead of performing the reduction operation
num_gangs times, it potentially happens num_gangs * num_workers * vector_length
times. A solution to this problem would be to teach lower_oacc_reductions how
to neuter worker and vector threads for parallel reductions.
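
To make the arithmetic above concrete, here is a minimal host-side sketch in
plain C. It is not GCC code and not the attached test case: the function
simulate_finalizer and its parameters are hypothetical names chosen for
illustration, and it assumes a "+" reduction in which every combine into the
original variable is applied atomically. On a real device the un-neutered
updates race, so the observed value may be anything up to that bound.

```c
#include <assert.h>

/* Hypothetical model (an assumption for illustration, not GCC's lowering):
   returns the value of a "+" reduction variable after the parallel-region
   finalizer, starting from 0, when each gang-private copy holds
   gang_private_value.  */
long simulate_finalizer(int num_gangs, int num_workers, int vector_length,
                        long gang_private_value, int neuter)
{
    assert(num_gangs > 0 && num_workers > 0 && vector_length > 0);

    long original = 0;
    for (int g = 0; g < num_gangs; g++)
        for (int w = 0; w < num_workers; w++)
            for (int v = 0; v < vector_length; v++) {
                /* When neutered, only worker 0 / vector lane 0 of each
                   gang runs the finalizer; all other threads skip it.  */
                if (neuter && (w != 0 || v != 0))
                    continue;
                original += gang_private_value;
            }
    return original;
}
```

With 4 gangs, 2 workers, a vector length of 8 and a gang-private value of 1,
the neutered model combines 4 times, while the un-neutered model combines
4 * 2 * 8 = 64 times.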

For clarification, this problem does not impact acc loop reductions inside acc
parallel regions. It only affects reductions on acc parallel constructs.
