Loops lacking exit edges can trigger an NVIDIA driver sm_50 code generation bug, which manifested as stack pointer (SASS register R1) corruption in this case. Adjusting source by hand to arrange a cheap exit branch seems to be the most reasonable workaround. NVIDIA bug ID 200177879.

	* config/nvptx/team.c (gomp_thread_start): Work around NVIDIA driver
	bug by adding an exit edge to the loop.
---
 libgomp/ChangeLog.gomp-nvptx | 5 +++++
 libgomp/config/nvptx/team.c  | 6 +++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 933f5a0..0291539 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -84,7 +84,7 @@ gomp_thread_start (struct gomp_thread_pool *pool)
   gomp_sem_init (&thr->release, 0);
   thr->thread_pool = pool;
 
-  for (;;)
+  do
     {
       gomp_simple_barrier_wait (&pool->threads_dock);
       if (!thr->fn)
@@ -96,6 +96,10 @@ gomp_thread_start (struct gomp_thread_pool *pool)
       gomp_team_barrier_wait_final (&thr->ts.team->barrier);
       gomp_finish_task (task);
     }
+  /* Work around an NVIDIA driver bug: when generating sm_50 machine code,
+     it can trash stack pointer R1 in loops lacking exit edges. Add a cheap
+     artificial exit that the driver would not be able to optimize out. */
+  while (nvptx_thrs);
 }
 
 /* Launch a team. */