https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88981
Bug ID: 88981
Summary: [nvptx, openacc, libgomp] How to handle async regions without corresponding wait
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

Consider this test-case:
...
/* { dg-do run } */

#include <stdlib.h>

int
main (void)
{
  int a[128];
  int N = 128;

#pragma acc parallel async
  {
#pragma acc loop seq
    for (int i = 0; i < 1024 * 1024 * 10; ++i)
      a[i % N] += a[N - (i % N) - 1];
  }

  /* no #pragma acc wait */

  return 0;
}
...

At the moment, we run into PR88941:
...
async-no-wait.exe: libgomp/plugin/plugin-nvptx.c: map_fini: \
  Assertion `!s->map->active' failed.
...

Now, consider this patch:
...
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index dd2bcf3083f..e9b0e6c660a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -489,6 +489,14 @@ fini_streams_for_device (struct ptx_device *ptx_dev)
       struct ptx_stream *s = ptx_dev->active_streams;
       ptx_dev->active_streams = ptx_dev->active_streams->next;
 
+      {
+        CUresult r;
+        r = CUDA_CALL_NOCHECK (cuStreamQuery, s->stream);
+        if (r == CUDA_ERROR_NOT_READY)
+          GOMP_PLUGIN_error ("Stream destroyed with operation incomplete."
+                             " Forgot to wait on async?");
+      }
+
       ret &= map_fini (s);
 
       CUresult r = CUDA_CALL_NOCHECK (cuStreamDestroy, s->stream);
...

which gets us:
...
libgomp: Stream destroyed with operation incomplete. Forgot to wait on async?
async-no-wait.exe: libgomp/plugin/plugin-nvptx.c: map_fini: \
  Assertion `!s->map->active' failed.
...

So the question is: how should async launches without a corresponding wait be handled?

It might be good to notify the user, as the patch above does (though perhaps not via GOMP_PLUGIN_error, but via GOMP_PLUGIN_warning or some such).

If acc_shutdown is called, a still-running stream is considered an error, so in that case we could do more than notify: we could error out.
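The policy discussed above (notify when a busy stream is torn down at normal exit, error out when the same happens under acc_shutdown) can be sketched as a small decision function. This is a hypothetical illustration, not libgomp code: the names pending_action, classify_pending_stream and pending_stream_message are invented here, and the stream_busy flag stands in for cuStreamQuery returning CUDA_ERROR_NOT_READY, as in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch only: none of these names exist in libgomp.  */
enum pending_action
{
  PENDING_NONE,  /* Stream is idle: nothing to report.  */
  PENDING_WARN,  /* Exit-time teardown: notify the user, keep going.  */
  PENDING_ERROR  /* Teardown via acc_shutdown: treat as an error.  */
};

/* STREAM_BUSY models cuStreamQuery returning CUDA_ERROR_NOT_READY.
   IN_ACC_SHUTDOWN is true when teardown was requested through
   acc_shutdown rather than by the implicit cleanup at program exit.  */
enum pending_action
classify_pending_stream (bool stream_busy, bool in_acc_shutdown)
{
  if (!stream_busy)
    return PENDING_NONE;
  return in_acc_shutdown ? PENDING_ERROR : PENDING_WARN;
}

/* Message the plugin might emit for each action; NULL means stay silent.  */
const char *
pending_stream_message (enum pending_action action)
{
  switch (action)
    {
    case PENDING_WARN:
      return "warning: stream destroyed with operation incomplete."
             " Forgot to wait on async?";
    case PENDING_ERROR:
      return "error: acc_shutdown called while a stream is still running";
    default:
      return NULL;
    }
}
```

On the user side, the test case avoids the situation entirely by adding `#pragma acc wait` before `return 0;`, which blocks until all outstanding async operations have completed.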