https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88981
Bug ID: 88981
Summary: [nvptx, openacc, libgomp] How to handle async regions without corresponding wait
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
Consider this test-case:
...
/* { dg-do run } */

#include <stdlib.h>

int
main (void)
{
  int a[128];
  int N = 128;

#pragma acc parallel async
  {
#pragma acc loop seq
    for (int i = 0; i < 1024 * 1024 * 10; ++i)
      a[i % N] += a[N - (i % N) - 1];
  }

  /* no #pragma acc wait */

  return 0;
}
...
At the moment, we run into PR88941:
...
async-no-wait.exe: libgomp/plugin/plugin-nvptx.c: map_fini: \
Assertion `!s->map->active' failed.
...
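For contrast, here is a sketch of the same region followed by an explicit
wait, which is what one would expect a well-behaved program to do. It is not
part of the original report. The helper name run_async_region_with_wait is
hypothetical, the array is zero-initialized so that the result is
well-defined, and the loop directive is assumed to be meant as
"#pragma acc loop seq". Built without -fopenacc, the pragmas are ignored and
the loop simply runs on the host.

```c
enum { N = 128 };

/* Hypothetical helper: launch the async region, wait for it, and
   return the sum of the array so a caller can check the result.  */
long
run_async_region_with_wait (void)
{
  int a[N] = { 0 };	/* Zero-initialized, so every read is defined.  */

#pragma acc parallel async
  {
#pragma acc loop seq
    for (int i = 0; i < 1024 * 1024 * 10; ++i)
      a[i % N] += a[N - (i % N) - 1];
  }

  /* Block until all outstanding async operations have completed,
     so no stream is still busy when the runtime shuts down.  */
#pragma acc wait

  long sum = 0;
  for (int i = 0; i < N; ++i)
    sum += a[i];
  return sum;
}
```

Since every element starts at zero, the sum stays zero regardless of where
the loop executes.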
Now, consider this patch:
...
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index dd2bcf3083f..e9b0e6c660a 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -489,6 +489,14 @@ fini_streams_for_device (struct ptx_device *ptx_dev)
struct ptx_stream *s = ptx_dev->active_streams;
ptx_dev->active_streams = ptx_dev->active_streams->next;
+ {
+ CUresult r;
+ r = CUDA_CALL_NOCHECK (cuStreamQuery, s->stream);
+ if (r == CUDA_ERROR_NOT_READY)
+ GOMP_PLUGIN_error ("Stream destroyed with operation incomplete."
+ " Forgot to wait on async?");
+ }
+
ret &= map_fini (s);
CUresult r = CUDA_CALL_NOCHECK (cuStreamDestroy, s->stream);
...
which gets us:
...
libgomp: Stream destroyed with operation incomplete. Forgot to wait on async?
async-no-wait.exe: libgomp/plugin/plugin-nvptx.c: map_fini: \
Assertion `!s->map->active' failed.
...
So, the question is: how should we handle async launches without a
corresponding wait?
It might be good to notify the user about it, as the above patch does (though
perhaps using GOMP_PLUGIN_warning or some such rather than
GOMP_PLUGIN_error).
In the case that acc_shutdown is called, a stream that is still running is
considered an error, so there we could go beyond a notification and error out.
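
The two behaviours discussed above (warn on implicit teardown, error on
acc_shutdown) could be sketched roughly as follows. This is a standalone mock,
not libgomp code: struct mock_stream, enum fini_policy and
check_streams_at_fini are hypothetical names, and the busy flag merely stands
in for a cuStreamQuery result of CUDA_ERROR_NOT_READY.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a device stream.  'busy' models a stream
   whose query would report that work is still outstanding.  */
struct mock_stream
{
  bool busy;
  struct mock_stream *next;
};

/* Hypothetical teardown policies.  */
enum fini_policy
{
  FINI_WARN,	/* Implicit teardown at exit: report and carry on.  */
  FINI_ERROR	/* Explicit acc_shutdown: treat as an error.  */
};

/* Walk the stream list and report every stream that is still busy.
   Store the number of busy streams in *N_BUSY if it is non-NULL.
   Return true if teardown may proceed, false if it should fail.  */
bool
check_streams_at_fini (const struct mock_stream *streams,
		       enum fini_policy policy, size_t *n_busy)
{
  size_t busy = 0;

  for (const struct mock_stream *s = streams; s != NULL; s = s->next)
    if (s->busy)
      {
	++busy;
	fprintf (stderr,
		 "%s: stream destroyed with operation incomplete."
		 " Forgot to wait on async?\n",
		 policy == FINI_ERROR ? "error" : "warning");
      }

  if (n_busy != NULL)
    *n_busy = busy;

  /* Under FINI_WARN a busy stream is only reported; under FINI_ERROR
     it makes the teardown fail.  */
  return busy == 0 || policy == FINI_WARN;
}
```

Whether the plugin should additionally synchronize on a busy stream before
destroying it is left open here; the sketch only covers the reporting side.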