On Mon, 9 Dec 2019 15:44:25 +0100
Thomas Schwinge <tho...@codesourcery.com> wrote:

> Hi Julian!
> 
> On 2019-10-03T09:35:04-0700, Julian Brown <jul...@codesourcery.com>
> wrote:
> > --- a/libgomp/oacc-mem.c
> > +++ b/libgomp/oacc-mem.c  
> 
> > @@ -715,48 +684,34 @@ delete_copyout (unsigned f, void *h, size_t
> > s, int async, const char *libfnname)  
> 
> >        if (f & FLAG_COPYOUT)
> > [...]
> >       gomp_copy_dev2host (acc_dev, aq, h, d, s);
> >     }
> > -      gomp_remove_var (acc_dev, n);
> > +      gomp_remove_var_async (acc_dev, n, aq);  
> 
> Conceptually, I understand correctly that we need to use this (new)
> 'gomp_remove_var_async' to make sure that we don't
> 'gomp_free_device_memory' while the 'gomp_copy_dev2host' cited above
> is still in process?

Yep.

> I'm curious why this isn't causing any problems for nvptx offloading
> already, any thoughts on that?  Or, is this just missing test
> coverage? (Always difficult for 'async' stuff, of course.)  By
> chance, is this right now already causing problems with AMD GCN
> offloading?  (I really need to set up AMD GCN offloading testing...)

In a few cases, async stuff on nvidia seems to "just work" even in
cases where we wouldn't expect it to via inspection (either because the
driver/hardware is doing something "magic", or because we're
somehow driving async operations in such a way that they run
synchronously in practice). One such case is with the "ephemeral"
asynchronous host-to-device memory copy patch.

The AMD side seems much more sensitive to improper async behaviour --
but I don't actually remember if I hit problems with this code in
particular.

> I'm citing below the changes introducing 'gomp_remove_var_async',
> modelled similar to the existing 'gomp_unmap_vars_async'.
> 
> 
> Also for both these, do I understand correctly, that it's actually not
> the 'gomp_unref_tgt' that needs to be "delayed" via
> 'goacc_asyncqueue', but rather really only the
> 'gomp_free_device_memory', called via 'gomp_unmap_tgt', called via
> 'gomp_unref_tgt'?  In other words: why do we need to keep the 'struct
> target_mem_desc' alive?  Per my understanding, that one is one
> component of the mapping table, and not relevant anymore (thus can be
> 'free'd) as soon as it has been determined that 'tgt->refcount ==
> 0'?  Am I missing something there?

IIRC, that was Chung-Lin's choice. I'll CC him in. I think delaying
freeing of the target_mem_desc isn't really a huge problem, in practice.

> It will be OK to clean that up later, but I'd like to understand this
> now.  Well, or, stating that you just blindly copied that from the
> existing 'gomp_unmap_vars_async' is fine, too!  ;-P

Some changes arose via the porting to AMD GCN, and some may have been
drive-by fixes (e.g. where a synchronous call was used in a context
where it is obvious that an asynchronous call is really needed). Like
you mentioned, test coverage could probably be better, and writing
reliable tests for async behaviour is challenging.

Julian

Reply via email to