On Mon, 9 Dec 2019 15:44:25 +0100 Thomas Schwinge <tho...@codesourcery.com> wrote:
> Hi Julian! > > On 2019-10-03T09:35:04-0700, Julian Brown <jul...@codesourcery.com> > wrote: > > --- a/libgomp/oacc-mem.c > > +++ b/libgomp/oacc-mem.c > > > @@ -715,48 +684,34 @@ delete_copyout (unsigned f, void *h, size_t > > s, int async, const char *libfnname) > > > if (f & FLAG_COPYOUT) > > [...] > > gomp_copy_dev2host (acc_dev, aq, h, d, s); > > } > > - gomp_remove_var (acc_dev, n); > > + gomp_remove_var_async (acc_dev, n, aq); > > Conceptually, I understand correctly that we need to use this (new) > 'gomp_remove_var_async' to make sure that we don't > 'gomp_free_device_memory' while the 'gomp_copy_dev2host' cited above > is still in process? Yep. > I'm curious why this isn't causing any problems for nvptx offloading > already, any thoughts on that? Or, is this just missing test > coverage? (Always difficult for 'async' stuff, of course.) By > chance, is this right now already causing problems with AMD GCN > offloading? (I really need to set up AMD GCN offloading testing...) In a few cases, async stuff on nvidia seems to "just work" even in cases where we wouldn't expect it to via inspection (either because the driver/hardware is doing something "magic", or because we're somehow driving async operations in such a way that they run synchronously in practice). One such case is with the "ephemeral" asynchronous host-to-device memory copy patch. The AMD side seems much more sensitive to improper async behaviour -- but I don't actually remember if I hit problems with this code in particular. > I'm citing below the changes introducing 'gomp_remove_var_async', > modelled similar to the existing 'gomp_unmap_vars_async'. > > > Also for both these, do I understand correctly, that it's actually not > the 'gomp_unref_tgt' that needs to be "delayed" via > 'goacc_asyncqueue', but rather really only the > 'gomp_free_device_memory', called via 'gomp_unmap_tgt', called via > 'gomp_unref_tgt'? In other words: why do we need to keep the 'struct > target_mem_desc' alive? Per my understanding, that one is one > component of the mapping table, and not relevant anymore (thus can be > 'free'd) as soon as it has been determined that 'tgt->refcount == > 0'? Am I missing something there? IIRC, that was Chung-Lin's choice. I'll CC him in. I think delaying freeing of the target_mem_desc isn't really a huge problem, in practice. > It will be OK to clean that up later, but I'd like to understand this > now. Well, or, stating that you just blindly copied that from the > existing 'gomp_unmap_vars_async' is fine, too! ;-P Some changes arose via the porting to AMD GCN, and some may have been drive-by fixes (e.g. where a synchronous call was used in a context where it is obvious that an asynchronous call is really needed). Like you mentioned, test coverage could probably be better, and writing reliable tests for async behaviour is challenging. Julian