https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85445
--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #0)
> og7 test-case ref-1.C fails in execution when run with trunk:

The failure happens at -O2 and is due to the call to routine Worker. The Vector routine is inlined, due to missing noinline,noclone attributes.

> #pragma acc routine worker
> void Worker (int *ptr, int m, int n, const int &inc)
> {
> #pragma acc loop worker
>   for (unsigned ix = 0; ix < m; ix++)
>     Vector(ptr + ix * n, n, inc);
> }
>
> int main ()
> {
> #pragma acc parallel copy(ary)
>   {
>     Worker (&ary[0][0], m, n, 1<<16);
>   }

The inc parameter is a reference parameter, so the argument 1<<16 (65536) is saved on the stack:
...
        mov.u32 %r25, 65536;
        st.u32 [%frame], %r25;
...
and its address is passed as the argument:
...
        .param.u64 %out_arg4;
        st.param.u64 [%out_arg4], %frame;
        call _Z6WorkerPiiiRKi, (%out_arg1, %out_arg2, %out_arg3, %out_arg4);
...

The stack is declared with .local:
...
        .local .align 16 .b8 %frame_ar[16];
        .reg.u64 %frame;
        cvta.local.u64 %frame, %frame_ar;
...
which means:
...
Local memory, private to each thread.
...

The stack is initialized in thread W0V0, but it is read in WAVA mode. So every thread except W0V0 reads uninitialized stack memory.
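The failure mode can be made concrete with a minimal host-side C++ sketch. This is not the code GCC emits for nvptx. It only models the memory-visibility problem: each simulated thread owns a private frame, like PTX .local memory, and only the W0V0 thread stores the value bound to `const int &inc`. The grid size (4 workers by 2 vector lanes), `Thread`, `store_inc`, `read_inc` and `simulate` are all hypothetical. A sentinel stands in for uninitialized memory, because actually reading uninitialized memory is undefined behaviour in C++.

```cpp
#include <array>
#include <cassert>

// Hypothetical host-side model of the behaviour described above, not GCC code.
// Each simulated thread owns a private frame, mirroring PTX .local memory
// ("private to each thread"). A sentinel stands in for "never written",
// because reading genuinely uninitialized memory is undefined in C++.
constexpr int kWorkers = 4;  // hypothetical number of workers
constexpr int kVectors = 2;  // hypothetical vector length
constexpr int kUninit = -1;  // sentinel: this thread's frame was never stored to

struct Thread {
  int frame = kUninit;  // this thread's copy of the stack slot behind %frame
};

using Grid = std::array<std::array<Thread, kVectors>, kWorkers>;

// Caller side, run in W0V0 mode: only worker 0 / vector lane 0 executes
// the store that materializes the temporary bound to `const int &inc`.
void store_inc(Grid &g, int value) { g[0][0].frame = value; }

// Callee side, run in WAVA mode: every thread dereferences %frame, but the
// address resolves to that thread's own private (.local) frame.
int read_inc(const Grid &g, int w, int v) {
  assert(w >= 0 && w < kWorkers && v >= 0 && v < kVectors);
  return g[w][v].frame;
}

// Returns how many simulated threads observe the value the caller stored.
int simulate() {
  Grid g{};
  store_inc(g, 1 << 16);
  int correct = 0;
  for (int w = 0; w < kWorkers; ++w)
    for (int v = 0; v < kVectors; ++v)
      if (read_inc(g, w, v) == (1 << 16)) ++correct;
  return correct;
}
```

For this 4x2 grid, `simulate()` returns 1. Only the W0V0 thread sees 65536, and the other seven see the sentinel. In the real failure, those reads return whatever happens to be in each thread's uninitialized .local frame.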