Loops lacking exit edges can trigger an NVIDIA driver sm_50 code generation
bug, which manifested as stack pointer (SASS register R1) corruption in this
case.  Adjusting the source by hand to add a cheap exit branch seems to be the
most reasonable workaround.  NVIDIA bug ID 200177879.

        * config/nvptx/team.c (gomp_thread_start): Work around NVIDIA driver
        bug by adding an exit edge to the loop.
---
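For reviewers, a minimal host-side sketch of the loop shape this patch
introduces.  It does not reproduce the driver bug; it only illustrates a loop
whose exit depends on a global the compiler cannot prove to be non-null.  All
names prefixed with sketch_ are hypothetical and are not part of libgomp, and
the flag is cleared here only so that the sketch terminates when run.

```c
#include <stddef.h>

/* Hypothetical stand-in for a global such as nvptx_thrs.  It stays non-null
   while workers should run, but the compiler cannot prove that.  */
void *sketch_live_flag;

/* Number of loop iterations executed so far.  */
static int sketch_steps_done;

/* Hypothetical work step.  It clears the flag after LIMIT steps so that
   this sketch terminates; the real loop is not expected to exit this way.  */
static void
sketch_do_step (int limit)
{
  if (++sketch_steps_done >= limit)
    sketch_live_flag = NULL;
}

/* A loop with an exit edge: control can leave through the while
   condition, unlike a bare for (;;) loop.  */
int
sketch_worker (int limit)
{
  static int token;

  sketch_steps_done = 0;
  sketch_live_flag = &token;
  do
    {
      sketch_do_step (limit);
    }
  while (sketch_live_flag);
  return sketch_steps_done;
}
```

Calling sketch_worker (3) runs the body three times and then leaves through
the while condition.  In the patch itself the condition reads nvptx_thrs, so
the extra cost per iteration is a single load and branch.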
 libgomp/ChangeLog.gomp-nvptx | 5 +++++
 libgomp/config/nvptx/team.c  | 6 +++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 933f5a0..0291539 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -84,7 +84,7 @@ gomp_thread_start (struct gomp_thread_pool *pool)
   gomp_sem_init (&thr->release, 0);
   thr->thread_pool = pool;
 
-  for (;;)
+  do
     {
       gomp_simple_barrier_wait (&pool->threads_dock);
       if (!thr->fn)
@@ -96,6 +96,10 @@ gomp_thread_start (struct gomp_thread_pool *pool)
       gomp_team_barrier_wait_final (&thr->ts.team->barrier);
       gomp_finish_task (task);
     }
+  /* Work around an NVIDIA driver bug: when generating sm_50 machine code,
+     it can trash stack pointer R1 in loops lacking exit edges.  Add a cheap
+     artificial exit that the driver would not be able to optimize out.  */
+  while (nvptx_thrs);
 }
 
 /* Launch a team.  */
