https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93030
Bug ID: 93030
Summary: [OpenACC] libgomp.oacc-c-c++-common/deep-copy-10.c FAILS on AMDGCN – invalid 'async' usage?
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: ams at gcc dot gnu.org, jakub at gcc dot gnu.org, jules at gcc dot gnu.org, tschwinge at gcc dot gnu.org
Target Milestone: ---

This test case was added as part of the deep-copy/attach/detach support for OpenACC 2.6 (r279620 to r279631),
https://gcc.gnu.org/ml/gcc-patches/2019-12/threads.html#01247

Running libgomp.oacc-c-c++-common/deep-copy-10.c works on nvptx. However, it fails on amdgcn with:

  Memory access fault by GPU node-2 (Agent handle: 0x6bfe20) on address 0x685000. Reason: Page not present or supervisor privilege.

It works if one disables the 'async' clauses. I vaguely recall that nvptx does no real async while amdgcn does.

In any case, I am not completely sure how the attaching/detaching works if one runs this concurrently:

  for (int i = 0; i < 99; i++)
    {
      int j;
  #pragma acc parallel loop copy(m.a[0:N]) async(i % 2)
      for (j = 0; j < N; j++)
        m.a[j]++;

Is this guaranteed to work? (It might if the increment is atomic and the copyin/copyout works with locking and reference counts. But I can see plenty of ways this can go wrong.)
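
For reference, the pattern in question can be reduced to a self-contained sketch along the following lines. This is not the actual deep-copy-10.c: the struct layout, the value of N, the function name run_async_increment and the final check are assumptions made for illustration. Only the loop with copy(m.a[0:N]) async(i % 2) is taken from the excerpt above. Each iteration maps and unmaps m.a[0:N] on one of two async queues, so the attach/detach and copyin/copyout operations of consecutive iterations may overlap on a target with real asynchronous execution.

```c
#include <assert.h>
#include <stdlib.h>

#define N 1024     /* assumed size; the excerpt above only shows 'N' */
#define ITERS 99   /* iteration count taken from the excerpt */

/* Hypothetical struct; stands in for whatever 'm' is in the test case.  */
typedef struct
{
  int *a;
} mystruct;

/* Runs the loop from the report.  Returns 0 if every element was
   incremented ITERS times, 1 otherwise.  */
int
run_async_increment (void)
{
  mystruct m;
  m.a = calloc (N, sizeof (int));
  assert (m.a != NULL);

  for (int i = 0; i < ITERS; i++)
    {
      int j;
      /* Each iteration maps m.a[0:N] on entry and unmaps it on exit,
         alternating between async queues 0 and 1.  Whether this is
         well-defined is the open question of this report.  */
#pragma acc parallel loop copy(m.a[0:N]) async(i % 2)
      for (j = 0; j < N; j++)
        m.a[j]++;
    }

  /* Wait for all async queues before reading the data on the host.  */
#pragma acc wait

  int bad = 0;
  for (int j = 0; j < N; j++)
    if (m.a[j] != ITERS)
      bad = 1;

  free (m.a);
  return bad;
}
```

Compiled without OpenACC support the pragmas are ignored and the loop runs serially, so run_async_increment () returns 0. Calling it from main in a build with offloading enabled and comparing the behaviour with and without the 'async' clause might help narrow down whether the fault comes from the concurrent mapping.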