https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70534

            Bug ID: 70534
           Summary: openacc parallel reductions aren't neutered
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: cesar at gcc dot gnu.org
          Reporter: cesar at gcc dot gnu.org
                CC: tschwinge at gcc dot gnu.org
  Target Milestone: ---

Created attachment 38182
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38182&action=edit
parallel reduction

The attached test case demonstrates a race condition in the code generated
for acc parallel reduction finalizers. According to the OpenACC 2.0a spec,
all of the gang-private copies of the reduction variables are supposed to be
combined with the original reduction variable, using the reduction operator,
at the end of the parallel region. What's happening now is that there is no
code to neuter the worker and vector threads, so instead of performing the
reduction operation num_gangs times, it potentially happens
num_gangs * num_workers * vector_length times.

A solution to this problem would be to teach lower_oacc_reductions how to
neuter worker and vector threads for parallel reductions.

For clarification, this problem does not impact acc loop reductions inside
acc parallel regions. It only affects reductions on acc parallel constructs.
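
To make the two counts above concrete, here is a minimal sequential sketch in C. It is not GCC code and not the attached test case: the `launch_dims` struct and the `finalize_sum` function are hypothetical names introduced only for illustration, and the model assumes a `+` reduction in which every thread sees the same gang-private value. Because it runs sequentially, it shows only the worst-case number of combine steps, not the race itself.

```c
#include <assert.h>

/* Hypothetical launch geometry for an acc parallel region. */
typedef struct {
    int num_gangs;
    int num_workers;
    int vector_length;
} launch_dims;

/* Sequential model of the finalizer for a '+' reduction on an
   acc parallel construct. Every (gang, worker, vector) thread reaches
   the finalizer. With neuter != 0, only worker 0 / vector lane 0 of
   each gang combines its gang-private copy into the original variable,
   which is the behaviour the report asks for. With neuter == 0, every
   thread combines, which models the worst case of the reported bug. */
long finalize_sum(launch_dims d, long original, long gang_private, int neuter)
{
    assert(d.num_gangs > 0 && d.num_workers > 0 && d.vector_length > 0);

    long result = original;
    for (int g = 0; g < d.num_gangs; g++) {
        for (int w = 0; w < d.num_workers; w++) {
            for (int v = 0; v < d.vector_length; v++) {
                if (neuter && (w != 0 || v != 0))
                    continue; /* neutered thread: skip the combine */
                result += gang_private;
            }
        }
    }
    return result;
}
```

With 4 gangs, 2 workers and a vector length of 8, the neutered model combines 4 times while the un-neutered model combines 4 * 2 * 8 = 64 times. On real hardware the extra updates may also race with each other, so the observed result can vary between runs.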