On Fri, Feb 14, 2014 at 07:24:16PM +0400, Ilya Verbin wrote:
> 2014-01-31 22:03 GMT+04:00 Jakub Jelinek <ja...@redhat.com>:
> > Implicit map(tofrom: a) on #pragma omp target is what the standard
> > requires, so I don't see a bug on the compiler side.
> >
> > 	Jakub
>
> There is an exception in the standard (page 177, lines 17-21):
>
> > If a corresponding list item of the original list item is in the enclosing
> > device data environment, the new device data environment uses the
> > corresponding list item from the enclosing device data environment. No
> > additional storage is allocated in the new device data environment and
> > neither initialization nor assignment is performed, regardless of the
> > map-type that is specified.
>
> So, the pointer 'a' should inherit map-type ALLOC from the enclosing
> device data environment.

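For reference, the kind of nesting under discussion looks roughly like the sketch below. It is not the testcase from the original report: the function name nested_target_sum and the size N are made up for illustration, and the outer map-type is chosen as tofrom only so that the result is observable on the host.

```c
#include <assert.h>
#include <stdlib.h>

#define N 64  /* illustrative size, not taken from the report */

/* Hypothetical example: an outer target data region maps the array
   section, and an inner target region references the pointer 'a'
   without an explicit map clause, so it is mapped implicitly.  */
int nested_target_sum(void)
{
    int *a = malloc(N * sizeof *a);
    assert(a != NULL);
    for (int i = 0; i < N; i++)
        a[i] = i;

    #pragma omp target data map(tofrom: a[0:N])
    {
        /* No map clause here: 'a' is implicitly mapped, and the array
           section is already present in the enclosing device data
           environment.  */
        #pragma omp target
        for (int i = 0; i < N; i++)
            a[i] += 1;
    }

    /* Computed on the host after the data region has ended.  */
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += a[i];
    free(a);
    return sum;
}
```

Without OpenMP enabled the pragmas are ignored and the loop simply runs on the host, so the function returns the same value either way.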
The standard itself is very unclear.  I'll cite my omp-lang mail from
September:

> Ok, I'm for now implementing this refcounted model.
>
> One still unclear thing is what is supposed to happen if multiple host
> threads enter a target data construct mapping at least one same object
> with different map kinds.
>
> Say thread A enters #pragma omp target data map(tofrom:p[:64]), then
> thread B enters #pragma omp target data map(alloc:p[:64]) while thread A is
> still running the body of its target data (so, the mapping just increments
> the refcount of the p[:64] array section), then thread A leaves the target
> data construct, decrements the p[:64] refcount, but as it is non-zero,
> doesn't deallocate it, and finally thread B reaches the end of its target
> data construct and unmaps p[:64].  The question is, when (if ever) is the
> array section supposed to be copied back to the host?  Shall it be done at
> the end of thread A's target data section, or at the end of thread B's
> target data section (i.e. propagate the flag: has at least one of the
> mappings requested a copy from the device to the host at the end of its
> lifetime), or not copied at all?
>
> What if thread B doesn't request the whole array section, but only a portion
> thereof, map(alloc:p[:32])?  Would it copy the whole p[:64] array section
> back, or just a portion of it?  Though, admittedly, this latter case of a
> subset might be harder to construct a valid non-racy testcase for; one needs
> to make sure one of the target data constructs is always entered before the
> other, though perhaps with #pragma omp atomic and spinning it might be
> doable.

and will just paraphrase the Sep 9th answer I got for that, because I'm not
sure I'm allowed to repost it.  The answer was that on entry the standard is
pretty clear about what happens: the first encountering thread/data construct
allocates and optionally copies based on the flags, and all others, when it
is already mapped, do nothing.
On exit, the standard is silent and none of the solutions are right; the
committee will discuss it further.  So, for now the implementation choice
was to OR in the copy-from-device bit.

Now, you could argue this case is different, because it is not different
threads, but the same thread, just a nested construct on the same thread.
But how to reliably differentiate that?  Even if you stored some thread
identification in the tree along with each mapping (which thread mapped
this in), what if some other thread also does the same (outer #pragma omp
target data, inner #pragma omp target, where the outer one does just the
array section mapping and the inner one a tofrom mapping on the pointer)?
Then we'd still copy back.

So, perhaps we should just stop ORing the copyfrom in for now and just use
the copyfrom from the very first mapping only, and wait for what the
committee actually agrees on.  Richard, your thoughts on this?

	Jakub
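To make the two exit policies concrete, here is a small host-only model of the refcounting. It is a sketch under assumptions, not libgomp's actual data structure: struct mapping, map_enter, map_exit, the policy enum and nested_copies_back are all hypothetical names, and it treats the outer and inner construct as referring to a single mapping entry, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one device mapping; not libgomp's real layout.  */
struct mapping {
    int refcount;          /* number of live constructs using the mapping */
    bool first_copy_from;  /* copy-from flag of the construct that allocated */
    bool any_copy_from;    /* OR of the copy-from flags of all constructs */
};

enum exit_policy { POLICY_OR_FLAGS, POLICY_FIRST_ONLY };

/* Entry: the first construct allocates and records its flag; later
   constructs only bump the refcount (and feed the ORed flag).  */
void map_enter(struct mapping *m, bool copy_from)
{
    if (m->refcount == 0) {
        m->first_copy_from = copy_from;
        m->any_copy_from = copy_from;
    } else {
        m->any_copy_from = m->any_copy_from || copy_from;
    }
    m->refcount++;
}

/* Exit: returns true if the data would be copied back to the host at
   this exit.  Nothing happens while the mapping is still in use.  */
bool map_exit(struct mapping *m, enum exit_policy policy)
{
    assert(m->refcount > 0);
    if (--m->refcount > 0)
        return false;
    return policy == POLICY_OR_FLAGS ? m->any_copy_from
                                     : m->first_copy_from;
}

/* Nested case: the outer construct maps with alloc, the inner one with
   tofrom.  Returns whether any exit triggers a copy back.  */
bool nested_copies_back(enum exit_policy policy)
{
    struct mapping m = {0};
    map_enter(&m, false);                /* outer: map(alloc: ...) */
    map_enter(&m, true);                 /* inner: map(tofrom: ...) */
    bool inner = map_exit(&m, policy);   /* refcount 2 -> 1 */
    bool outer = map_exit(&m, policy);   /* refcount 1 -> 0 */
    return inner || outer;
}
```

In this model the nested case copies back under POLICY_OR_FLAGS but not under POLICY_FIRST_ONLY, while the thread A (tofrom) then thread B (alloc) ordering from the quoted mail copies back at the final unmap under either policy.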