Hi Julian! On 2020-05-07T17:11:09+0100, Julian Brown <jul...@codesourcery.com> wrote: > Sorry about the delay replying to this email!
No worries, I had other things to do, too. ;-) > On Thu, 30 Jan 2020 16:21:20 +0100 > Thomas Schwinge <tho...@codesourcery.com> wrote: >> Notwithstanding the open question about how to implement this >> checking in libgomp in a non-intrusive (performance-wise) yet >> maintainable (avoid '#if 0') way, I have two more questions. >> Is there a specific reason why this checking isn't also enabled for >> libgomp OpenMP 'target' entry points? > > Just that it was developed in the context of adding manual deep-copy > support to OpenACC -- OpenMP wasn't my focus at that point. So, I > didn't try adding checking for OpenMP also. It might be interesting to > see how that goes though, particularly with regards to dynamic data > lifetimes in OpenMP. ACK. >> Can you please explain (textually?) how this checking (design per your >> textual description below) is working in context of mixed OpenACC >> structured ("S") and dynamic ("D") reference counts? For example: >> >> // S: 0, D: 0 >> >> #pragma acc enter data copyin ([data]) // copyin; S: 0, D: 1 >> >> acc_copyin ([data]) // no-op; S: 0, D: 2 > > Unfortunately it's not quite that simple. Does "not quite that simple" apply to (a) your reference count consistency checking specifically, or to (b) libgomp implementation peculiarities, or to (c) OpenACC reference counting semantics? The latter (c) certainly are meant to be that simple (see OpenACC 3.0, 2.6.7. "Reference Counters", etc.), and these are what my example illustrated. Remember the conceptually simple implementation that we had before your commit 378da98fcc907d05002bcd3d6ff7951f0cf485e5 "OpenACC reference count overhaul", which to the best of my knowledge would be explained as follows: - The OpenACC structured reference count corresponds to libgomp 'key->refcount - key->dynamic_refcount'. - The OpenACC dynamic reference count corresponds to libgomp 'key->dynamic_refcount'. - Thus, libgomp 'key->refcount' corresponds to the sum of OpenACC structured and dynamic reference counts. ..., and this seemed to have worked fine? (... aside from a few specific bugs that we fixed.) Doing it like this meant that 'libgomp/target.c' didn't have to care about the OpenACC-specific 'key->dynamic_refcount' at all. Of course, we could instead have implemented it as follows: - The OpenACC structured reference count corresponds to libgomp 'key->refcount'. - The OpenACC dynamic reference count corresponds to libgomp 'key->dynamic_refcount'. ..., which would've make some things simpler in 'libgomp/oacc-mem.c', but 'libgomp/target.c' then would've had to care about the OpenACC-specific 'key->dynamic_refcount'. Now, explicitly asking the other way round: with your "overhaul", have we now made the libgomp implementation different (and more complicated) just to make it amenable to your reference count consistency checking (a), or are there any actual "functional" reasons (b) that we'd not yet considered in the old scheme? I'm working through your explanation below -- it'll take me some more time, but many thanks already! Grüße Thomas > The "refcount" fields (in > either splay tree keys or target_mem_descs) do not really represent > program-level reference counts, but rather references in the linked > splay tree structure within libgomp. That's correct: the refcounts are > used so as to know when data is still live, and when it can be freed. > > Structured data mapping operations ("acc data", "acc parallel", etc.) > always create a target_mem_desc, with a list of target_var_descs that > describe data mapped in that structured block. That target_mem_desc > either "owns" a block of target memory corresponding to the structured > data block, or it doesn't. > > We might have something like this (excuse ASCII art!): > > +===================+ +=================+ > | TARGET_MEM_DESC 1 | ,--> | TARGET_VAR_DESC | > +-------------------+ | +-----------------+ > | tgt_start... | | | splay_tree_key | --> ... > +-------------------+ | +=================+ > | target_var_desc 0 | --' > | target_var_desc 1 | ---. +=================+ > | target_var_desc 2 | -. `-> | TARGET_VAR_DESC | > +===================+ | +-----------------+ > | | splay_tree_key | --> ... > | +=================+ > | > | +=================+ > `---> | TARGET_VAR_DESC | > +-----------------+ > +=================+ .----- | splay_tree_key | > | SPLAY_TREE_KEY | <-' +=================+ > +-----------------+ > | target_mem_desc | -. +===================+ > +=================+ '-----> | TARGET_MEM_DESC 2 | > +-------------------+ > | tgt_start... | > +-------------------+ > | target_var_desc | > +===================+ > > (Non-virtual/non-dynamic) reference counts correspond to the arrows > between blocks in the diagram (for the pointed-to block -- > target_mem_desc or splay tree key). > > For a structured data mapping, say "TARGET_MEM_DESC 1" is the descriptor > returned from gomp_map_vars. > > Now, "TARGET_MEM_DESC 1" and "TARGET_MEM_DESC 2" can be the same block, > or different blocks. (Each of the TARGET_MEM_DESCs linked from splay > tree keys, linked from TARGET_VAR_DESCs, can be a mix of such > identical or different blocks for each of the splay tree keys linked > from TARGET_VAR_DESCs.) In the case where they're different blocks, and > TARGET_MEM_DESC 2 (etc.) owns its own mapped memory, TARGET_MEM_DESC 1 > may have a NULL tgt_start -- thus, not own a target data block itself. > > In the case of a dynamic mapping, this subtlety is especially > important. A target_mem_desc being returned from > gomp_map_vars{_internal} with a refcount of zero -- one which no splay > tree keys link back to, because it does not own its own block of target > memory -- is discarded before the function returns. > > So, the first time a dynamic data mapping takes place for DATA, we have: > >> // S: 0, D: 0 >> >> #pragma acc enter data copyin ([data]) // copyin; S: 1, D: 0 > > This is because the target_mem_desc created to describe on-target > memory for DATA will "own" that data: nothing has referred to it > beforehand. So there's a "real" link from the splay tree key for DATA's > host region to the target_mem_desc we just created. (Yes, the > splay tree key's reference counts look just like a structured data > mapping. That was a subject for another patch.) > >> acc_copyin ([data]) // no-op; S: 2, D: 1 > > So now we have another dynamic mapping. This time, we already have a > target_mem_desc describing DATA on the target. The > gomp_map_vars_internal function will return NULL -- but before it does > that, it realises that it will "lose" references in doing so. Those are > the ones linked via the discarded target_mem_desc's variable list to > splay tree keys that are referred to in the dynamic mapping operation. > > For OpenACC, that's where the "virtual" refcount comes in -- to keep > track of those "lost" dynamic references. In particular, the "virtual" > refcount is the count by which the structured reference count must be > decremented when we hit an OpenACC "finalize" operation. Without that > (cf. OpenMP), we probably wouldn't need it. > >> #pragma acc data copyout ([data]) // no-op; S: 1, D: 2 >> { >> acc_create ([data]) // no-op; S: 1, D: 3 >> >> #pragma acc data create ([data]) // no-op; S: 2, D: 3 >> { >> #pragma acc parallel copyout ([data]) // no-op; S: 3, D: 3 >> { >> } // no-op; S: 2, D: 3 >> >> acc_delete_finalize ([data]) // no-op; S: 2, D: 0 >> >> acc_create ([data]) // no-op; S: 2, D: 1 >> } // no-op; S: 1, D: 1 >> >> #pragma acc exit data delete ([data]) // no-op; S: 1, D: 0 >> } // copyout; S: 0, D: 0 >> >> assert (!acc_is_present ([data])); >> >> (Haven't compiled but I'm reasonably sure that the nesting and my >> manual "[action]; [S], [D]" annotations are correct. But please >> verify, if course.) > > I'm sure to make a mistake if I try to work through the rest of the > reference counts :-). > > Let me know if that helps. > > Thanks, > > Julian > >> On 2018-11-30T03:50:24-0800, Julian Brown <jul...@codesourcery.com> >> wrote: >> > The model used for checking is as follows. >> > >> > 1. Each splay tree key that references a target memory descriptor >> > increases that descriptor's refcount by 1. >> > >> > 2. Each variable listed in a target memory descriptor that links >> > back to a splay tree key increases that key's refcount by 1. Each >> > target memory descriptor's variable list is counted only once, even >> > if multiple splay tree keys point to it (via their "tgt" field). >> > >> > 3. Additional ("real") target memory descriptors may be present >> > representing data mapped through "acc data" or "acc >> > parallel/kernels" blocks. These descriptors have their refcount >> > bumped, and the variables linked through such blocks have their >> > refcounts bumped also (again, with "once only" semantics). >> > >> > 4. Asynchronous operations "artificially" bump the reference >> > counts for referenced target memory descriptors (but *not* for >> > linked variables/splay tree keys), in order to delay freeing mapped >> > device memory until the asynchronous operation has completed. We >> > model this, for checking purposes only, using an off-side linked >> > list. >> > >> > 5. "Virtual" reference counts ("virtual_refcount") cannot be >> > checked purely statically, so we add the incoming value to each >> > key's statically-determined reference count ("refcount_chk"), and >> > make sure that the total matches the incoming reference count >> > ("refcount"). ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter