https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95150

--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> ---
* Your compilation uses "-O0" – I do not know whether that's intended.

* I did not see any timeout message although it did take a while to run
  with offloading. (See timing results below.)
  I wonder what causes the problem you are seeing.

  You could check whether setting the environment variable
    GOMP_DEBUG=1
  shows some useful details about the launch.

* The OpenACC test case is wrong: "c" has to be "copy", not "copyout",
  because its initial value is used (→ NaN)
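
To illustrate the "copy" vs. "copyout" point, here is a minimal C sketch. It is not the reporter's test case: the function names, array names and the loop body are hypothetical. It only assumes a loop that reads the previous value of "c", which is the situation described above.

```c
#include <stddef.h>

/* Hypothetical accumulating loop: c[i] += a[i] * b[i].
   The loop reads the old value of c[i], so the host contents of c must
   be transferred to the device before the loop runs: that is "copy".
   With "copyout", the device copy of c would start out uninitialized,
   and the accumulated result could be garbage such as NaN.
   When compiled without -fopenacc, the pragma is ignored and the loop
   simply runs on the host. */
void accumulate(size_t n, const float *a, const float *b, float *c)
{
#pragma acc parallel loop copyin(a[0:n], b[0:n]) copy(c[0:n])
  for (size_t i = 0; i < n; i++)
    c[i] += a[i] * b[i];
}

/* Host-side checksum, analogous to printing sum(c). */
float sum(size_t n, const float *c)
{
  float s = 0.0f;
  for (size_t i = 0; i < n; i++)
    s += c[i];
  return s;
}
```

If "c" were only written and never read inside the region, "copyout" would be sufficient and would save one host-to-device transfer.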

On the technical side, at startup, one calls:
  cuLaunchKernel
and when that has succeeded, one calls
  cuCtxSynchronize
and if that fails, the error message is printed with
  cuda_error
which shows the time-out message:
  libgomp: cuCtxSynchronize error: the launch timed out and was terminated
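
The control flow above can be sketched as follows. This is a mock, not libgomp's plugin code and not the CUDA driver API: fake_launch, fake_synchronize and run_kernel are hypothetical stand-ins that only model the ordering. The launch call returns once the work is queued, so a kernel that runs too long is reported only by the later synchronize call.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical result codes; these are NOT the CUDA driver's enum. */
typedef enum { FAKE_OK = 0, FAKE_LAUNCH_TIMEOUT = 1 } fake_result;

/* Set by run_kernel to simulate a kernel that exceeds the time limit. */
static int kernel_too_slow;

/* Stand-in for the launch call: queuing the kernel succeeds at once. */
static fake_result fake_launch(void)
{
  return FAKE_OK;
}

/* Stand-in for the synchronize call: waits for the kernel and is the
   first place where a timeout can be observed. */
static fake_result fake_synchronize(void)
{
  return kernel_too_slow ? FAKE_LAUNCH_TIMEOUT : FAKE_OK;
}

/* Returns 0 on success, -1 on failure; on failure an error message is
   written to buf (len must be at least 1). */
int run_kernel(int too_slow, char *buf, size_t len)
{
  kernel_too_slow = too_slow;
  buf[0] = '\0';

  if (fake_launch() != FAKE_OK) {
    snprintf(buf, len, "launch error");
    return -1;
  }
  if (fake_synchronize() != FAKE_OK) {
    snprintf(buf, len,
             "synchronize error: the launch timed out and was terminated");
    return -1;
  }
  return 0;
}
```

In this model the launch step succeeds even for the slow kernel, which matches the observation that the message names cuCtxSynchronize rather than cuLaunchKernel.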


I added a ", sum(c)" to the print output and did some tests. In the results
below, the columns are the compiler flags, the measured time, and sum(c):

On AMDGCN:
== -O0 ==                                 3.56800008       268048112.    
== -Ofast ==                              0.109999999      268698816.    
== -fopenmp -O0 ==                      193.227997         268186448.    
== -fopenmp -Ofast ==                    43.1559982        268455872.    
== -fopenacc -O0 ==                     186.399002         268531136.    
== -fopenacc -Ofast ==                   43.4970016        268206464.    
== -fopenmp -foffload=disable -O0 ==      7.27299976       268241776.    
== -fopenmp -foffload=disable -Ofast ==   1.49000001       268171680.    


On NVidia:
== -O0 ==                                8.00599957        268253520.    
== -Ofast ==                             0.254999995       268399056.    
== -fopenmp -O0 ==                      64.2089996         268092608.    
== -fopenmp -Ofast ==                   33.6360016         268359952.    
== -fopenacc -O0 ==                      0.861999989             NaN (see note)
== -fopenacc -Ofast ==                   0.300000012             NaN (see note)
== -fopenmp -foffload=disable -O0 ==    15.2220001         268511968.    
== -fopenmp -foffload=disable -Ofast ==  3.52900004        268573568.    
== -fopenacc -foffload=disable -O0 ==   14.5790005         268442496.    
== -fopenacc -foffload=disable -Ofast == 4.41099977        268511968.
