https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70895

--- Comment #2 from cesar at gcc dot gnu.org ---
Thomas is correct. Note that when gcc-6.2 is released you should be able to
replace

   !$acc parallel vector_length(vl)
   !$acc loop reduction(+:pi) private(t) 

with

   !$acc parallel loop reduction(+:pi) vector_length(vl)

and that will automatically add a copy clause for 'pi'. See PR70626 for more
details.

Furthermore, as Thomas mentioned, gcc-6 does not automatically assign
parallelism to loops inside parallel regions. Consequently, you need to set the
amount of parallelism explicitly with the num_gangs, num_workers and
vector_length clauses, and partition the acc loops accordingly with the gang,
worker and vector clauses. Also note that only nvptx targets are accelerated in
gcc-6; the host code runs in a single thread.
