https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70895
--- Comment #2 from cesar at gcc dot gnu.org ---
Thomas is correct. Note that when gcc-6.2 is released you should be able to
replace

  !$acc parallel vector_length(vl)
  !$acc loop reduction(+:pi) private(t)

with

  !$acc parallel loop reduction(+:pi) vector_length(vl)

and that will automatically add a copy clause for 'pi'. See PR70626 for more
details.

Furthermore, as Thomas mentioned, gcc-6 does not automatically assign
parallelism to loops inside parallel regions. Consequently, you need to
explicitly use the num_gangs, num_workers and vector_length clauses to set the
amount of parallelism, and the gang, worker and vector clauses to partition the
acc loops accordingly.

Also note that only nvptx targets are accelerated in gcc-6; the host code runs
in a single thread.
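
For illustration, the explicit gcc-6 form described above might look roughly like the sketch below. This is not the reporter's test case: the program name, the midpoint-rule integrand, and the values of n, ng and vl are all hypothetical. The explicit copy(pi) is an assumption inferred from the remark that only the combined construct adds it automatically; it has not been checked against a gcc-6 build. A worker level could presumably be added the same way, with num_workers on the parallel directive and worker on the loop.

```fortran
program pi_acc_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n  = 1000000   ! hypothetical iteration count
  integer, parameter :: ng = 32        ! hypothetical number of gangs
  integer, parameter :: vl = 32        ! hypothetical vector length
  integer  :: i
  real(dp) :: pi, t, h

  h  = 1.0_dp / real(n, dp)            ! width of each integration strip
  pi = 0.0_dp

  ! Explicit parallelism: sizes on the parallel construct, partitioning
  ! (gang, vector) on the loop. copy(pi) is an assumption, see above.
  !$acc parallel num_gangs(ng) vector_length(vl) copy(pi)
  !$acc loop gang vector reduction(+:pi) private(t)
  do i = 1, n
     t  = (real(i, dp) - 0.5_dp) * h   ! midpoint of strip i
     pi = pi + 4.0_dp / (1.0_dp + t*t) ! integrand 4/(1+t^2) on [0,1]
  end do
  !$acc end parallel

  pi = pi * h                          ! scale the sum by the strip width
  print '(a, f12.8)', 'pi ~ ', pi
end program pi_acc_sketch
```

Built without -fopenacc the directives are treated as comments and the loop runs serially, which gives a reference value to compare the accelerated result against.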