Hi Chung-Lin!

On Tue, 25 Sep 2018 21:09:49 +0800, Chung-Lin Tang <chunglin_t...@mentor.com> 
wrote:
> This patch is a re-organization of OpenACC asynchronous queues.

Thanks!

> The previous style of implementation
> was essentially re-defining the entire async API inside the plugin-interface, 
> and relaying all such
> API calls to the target plugin, which is awkward in design; it requires 
> (each) target plugin to
> essentially re-implement large portions of the async functionality to support 
> OpenACC, and the
> way it uses a state-setting style to "select/de-select" asynchronous queues 
> for operations litters
> a lot of code paths.
> 
> The new design proposed here in this patch declares a "struct 
> goacc_asyncqueue*" opaque type in libgomp.h,
> and re-defines the plugin interface to a few operations (e.g. 
> construct/destruct/test/synchronize/etc.)
> on this async-queue type, all details are target-dependent inside the 
> specific plugin/plugin-<target>.c file.

Conceptually, ACK.


> Also included in this patch is the code for the acc_get/set_default_async API 
> functions in OpenACC 2.5.
> It's a minor part of this patch, but since some code was merge together, I'm 
> submitting it together here.

As I requested, I'm reviewing those changes separately, and have backed
out those changes in my working copy.


> Testing has been done with offloading enabled. The results are mostly okay, 
> but with a few issues
> with either yet incomplete submission of our testsuite adjustment patches, or 
> other independent problems.

We'll need to understand these.  


> Seeking permission to commit this to trunk first.

A few things will need to be clarified.


For example, for the simple program:

    int main(void)
    {
    #pragma acc parallel async(1)
      ;
    #pragma acc wait
    
      return 0;
    }

..., I'm seeing memory corruption, which (oaccasionally...) shows up as
an abort due to "free" complaining, but also reproduces more reliably
with "valgrind".  It also reproduces on openacc-gcc-8-branch:

    $ valgrind ./a.out
    [...]
    ==26392== Invalid read of size 8
    ==26392==    at 0x4E653B0: goacc_async_unmap_tgt (oacc-async.c:368)
    ==26392==    by 0x5C90901: cuda_callback_wrapper (plugin-nvptx.c:1648)
    ==26392==    by 0x6066B8D: ??? (in 
/usr/lib/x86_64-linux-gnu/libcuda.so.390.77)
    ==26392==    by 0x607A10F: ??? (in 
/usr/lib/x86_64-linux-gnu/libcuda.so.390.77)
    ==26392==    by 0x50816DA: start_thread (pthread_create.c:463)
    ==26392==    by 0x53BA88E: clone (clone.S:95)
    ==26392==  Address 0x8d19f50 is 0 bytes inside a block of size 64 free'd
    ==26392==    at 0x4C30D3B: free (vg_replace_malloc.c:530)
    ==26392==    by 0x4E65BEE: goacc_async_copyout_unmap_vars (oacc-async.c:383)
    ==26392==    by 0x4E607C9: GOACC_parallel_keyed_internal 
(oacc-parallel.c:403)
    ==26392==    by 0x4E60EAA: GOACC_parallel_keyed_v2 (oacc-parallel.c:439)
    ==26392==    by 0x40094F: ??? (in [...]/a.out)
    ==26392==    by 0x52BAB96: (below main) (libc-start.c:310)
    ==26392==  Block was alloc'd at
    ==26392==    at 0x4C2FB0F: malloc (vg_replace_malloc.c:299)
    ==26392==    by 0x4E47538: gomp_malloc (alloc.c:37)
    ==26392==    by 0x4E5AEEB: gomp_map_vars_async (target.c:731)
    ==26392==    by 0x4E60C2B: GOACC_parallel_keyed_internal 
(oacc-parallel.c:345)
    ==26392==    by 0x4E60EAA: GOACC_parallel_keyed_v2 (oacc-parallel.c:439)
    ==26392==    by 0x40094F: ??? (in [...]/a.out)
    ==26392==    by 0x52BAB96: (below main) (libc-start.c:310)
    [...]

Per my understanding, the problem is that, called from
libgomp/oacc-async.c:goacc_async_copyout_unmap_vars,
libgomp/target.c:gomp_unmap_vars_async runs into:

      if (tgt->list_count == 0)
        {
          free (tgt);
          return;
        }

..., and then goacc_async_copyout_unmap_vars does:

      devicep->openacc.async.queue_callback_func (aq, goacc_async_unmap_tgt,
                                                  (void *) tgt);

..., which will then call libgomp/oacc-async.c:goacc_async_unmap_tgt:

    static void
    goacc_async_unmap_tgt (void *ptr)
    {
      struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
    
      if (tgt->refcount > 1)
        tgt->refcount--;
      else
        gomp_unmap_tgt (tgt);
    }

..., where the "Invalid read of size 8" happens, and which eventually
would try to "free (tgt)" again, via libgomp/target.c:gomp_unmap_tgt:

    attribute_hidden void
    gomp_unmap_tgt (struct target_mem_desc *tgt)
    {
      /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end region.  */
      if (tgt->tgt_end)
        gomp_free_device_memory (tgt->device_descr, tgt->to_free);
    
      free (tgt->array);
      free (tgt);
    }

Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong, or
something else?


Grüße
 Thomas

Reply via email to