https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150
--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> ---

* Your compilation uses "-O0"; I do not know whether that's intended.

* I did not see any timeout message, although it did take a while to run with
  offloading. (See timing results below.) I wonder what causes the problem you
  are seeing. You could check whether setting the environment variable
  GOMP_DEBUG=1 shows some useful details for the launch.

* The OpenACC test case is wrong: "c" has to be "copy", not "copyout", because
  its initial value is used (→ NaN).

On the technical side, at startup, one calls:
  cuLaunchKernel
and when that has succeeded, one calls
  cuCtxSynchronize
and if that fails, the error message is printed with cuda_error, which shows
the time-out message:
  libgomp: cuCtxSynchronize error: the launch timed out and was terminated

I added a ", sum(c)" to the print output and did some tests:

On AMDGCN:
== -O0 ==
   3.56800008       268048112.
== -Ofast ==
   0.109999999      268698816.
== -fopenmp -O0 ==
   193.227997       268186448.
== -fopenmp -Ofast ==
   43.1559982       268455872.
== -fopenacc -O0 ==
   186.399002       268531136.
== -fopenacc -Ofast ==
   43.4970016       268206464.
== -fopenmp -foffload=disable -O0 ==
   7.27299976       268241776.
== -fopenmp -foffload=disable -Ofast ==
   1.49000001       268171680.

On NVidia:
== -O0 ==
   8.00599957       268253520.
== -Ofast ==
   0.254999995      268399056.
== -fopenmp -O0 ==
   64.2089996       268092608.
== -fopenmp -Ofast ==
   33.6360016       268359952.
== -fopenacc -O0 ==
   0.861999989      NaN   (see note)
== -fopenacc -Ofast ==
   0.300000012      NaN   (see note)
== -fopenmp -foffload=disable -O0 ==
   15.2220001       268511968.
== -fopenmp -foffload=disable -Ofast ==
   3.52900004       268573568.
== -fopenacc -foffload=disable -O0 ==
   14.5790005       268442496.
== -fopenacc -foffload=disable -Ofast ==
   4.41099977       268511968.
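
To illustrate the "copy" versus "copyout" point: the reporter's test case is not reproduced in this comment, so the following is only a minimal C sketch under the assumption that the kernel accumulates into "c" (the function name accumulate and the loop body are hypothetical). Because "c" is read before it is written, its host values have to be transferred to the device, which "copy" does and "copyout" does not.

```c
#include <stddef.h>

/* Hypothetical kernel, not the reporter's code: c[i] += a[i] * b[i].
   The loop reads c before writing it, so the host contents of c must
   be present on the device when the region starts. */
void accumulate(size_t n, const float *a, const float *b, float *c)
{
  /* copy(c[0:n]) transfers c to the device on entry and back on exit.
     With copyout(c[0:n]) the device buffer would only be allocated,
     and the += below would read indeterminate values. */
#pragma acc parallel loop copyin(a[0:n], b[0:n]) copy(c[0:n])
  for (size_t i = 0; i < n; i++)
    c[i] += a[i] * b[i];
}
```

When built without -fopenacc the pragma is ignored and the loop runs on the host, so the difference between the two clauses only shows up in an offloaded build.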