Hi Chung-Lin! On Tue, 25 Sep 2018 21:09:49 +0800, Chung-Lin Tang <chunglin_t...@mentor.com> wrote: > This patch is a re-organization of OpenACC asynchronous queues.
Thanks! > The previous style of implementation > was essentially re-defining the entire async API inside the plugin-interface, > and relaying all such > API calls to the target plugin, which is awkward in design; it requires > (each) target plugin to > essentially re-implement large portions of the async functionality to support > OpenACC, and the > way it uses a state-setting style to "select/de-select" asynchronous queues > for operations litters > a lot of code paths. > > The new design proposed here in this patch declares a "struct > goacc_asyncqueue*" opaque type in libgomp.h, > and re-defines the plugin interface to a few operations (e.g. > construct/destruct/test/synchronize/etc.) > on this async-queue type, all details are target-dependent inside the > specific plugin/plugin-<target>.c file. Conceptually, ACK. > Also included in this patch is the code for the acc_get/set_default_async API > functions in OpenACC 2.5. > It's a minor part of this patch, but since some code was merge together, I'm > submitting it together here. As I requested, I'm reviewing those changes separately, and have backed out those changes in my working copy. > Testing has been done with offloading enabled. The results are mostly okay, > but with a few issues > with either yet incomplete submission of our testsuite adjustment patches, or > other independent problems. We'll need to understand these. > Seeking permission to commit this to trunk first. A few things will need to be clarified. For example, for the simple program: int main(void) { #pragma acc parallel async(1) ; #pragma acc wait return 0; } ..., I'm seeing memory corruption, which (oaccasionally...) shows up as an abort due to "free" complaining, but also reproduces more reliably with "valgrind". It also reproduces on openacc-gcc-8-branch: $ valgrind ./a.out [...] ==26392== Invalid read of size 8 ==26392== at 0x4E653B0: goacc_async_unmap_tgt (oacc-async.c:368) ==26392== by 0x5C90901: cuda_callback_wrapper (plugin-nvptx.c:1648) ==26392== by 0x6066B8D: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.390.77) ==26392== by 0x607A10F: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.390.77) ==26392== by 0x50816DA: start_thread (pthread_create.c:463) ==26392== by 0x53BA88E: clone (clone.S:95) ==26392== Address 0x8d19f50 is 0 bytes inside a block of size 64 free'd ==26392== at 0x4C30D3B: free (vg_replace_malloc.c:530) ==26392== by 0x4E65BEE: goacc_async_copyout_unmap_vars (oacc-async.c:383) ==26392== by 0x4E607C9: GOACC_parallel_keyed_internal (oacc-parallel.c:403) ==26392== by 0x4E60EAA: GOACC_parallel_keyed_v2 (oacc-parallel.c:439) ==26392== by 0x40094F: ??? (in [...]/a.out) ==26392== by 0x52BAB96: (below main) (libc-start.c:310) ==26392== Block was alloc'd at ==26392== at 0x4C2FB0F: malloc (vg_replace_malloc.c:299) ==26392== by 0x4E47538: gomp_malloc (alloc.c:37) ==26392== by 0x4E5AEEB: gomp_map_vars_async (target.c:731) ==26392== by 0x4E60C2B: GOACC_parallel_keyed_internal (oacc-parallel.c:345) ==26392== by 0x4E60EAA: GOACC_parallel_keyed_v2 (oacc-parallel.c:439) ==26392== by 0x40094F: ??? (in [...]/a.out) ==26392== by 0x52BAB96: (below main) (libc-start.c:310) [...] Per my understanding, the problem is that, called from libgomp/oacc-async.c:goacc_async_copyout_unmap_vars, libgomp/target.c:gomp_unmap_vars_async runs into: if (tgt->list_count == 0) { free (tgt); return; } ..., and then goacc_async_copyout_unmap_vars does: devicep->openacc.async.queue_callback_func (aq, goacc_async_unmap_tgt, (void *) tgt); ..., which will then call libgomp/oacc-async.c:goacc_async_unmap_tgt: static void goacc_async_unmap_tgt (void *ptr) { struct target_mem_desc *tgt = (struct target_mem_desc *) ptr; if (tgt->refcount > 1) tgt->refcount--; else gomp_unmap_tgt (tgt); } ..., where the "Invalid read of size 8" happens, and which eventually would try to "free (tgt)" again, via libgomp/target.c:gomp_unmap_tgt: attribute_hidden void gomp_unmap_tgt (struct target_mem_desc *tgt) { /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end region. */ if (tgt->tgt_end) gomp_free_device_memory (tgt->device_descr, tgt->to_free); free (tgt->array); free (tgt); } Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong, or something else? Grüße Thomas