Re: [RFC] Offloading Support in libgomp

Jakub Jelinek Fri, 14 Feb 2014 07:43:52 -0800

On Fri, Feb 14, 2014 at 07:24:16PM +0400, Ilya Verbin wrote:
> 2014-01-31 22:03 GMT+04:00 Jakub Jelinek <ja...@redhat.com>:
> > Implicit map(tofrom: a) on #pragma omp target is what the standard
> > requires, so I don't see a bug on the compiler side.
> >         Jakub
> 
> There is an exception in the standard (page 177, lines 17-21):
> 
> > If a corresponding list item of the original list item is in the
> > enclosing device data environment, the new device data environment
> > uses the corresponding list item from the enclosing device data
> > environment. No additional storage is allocated in the new device
> > data environment and neither initialization nor assignment is
> > performed, regardless of the map-type that is specified.
> 
> So, the pointer 'a' should inherit map-type ALLOC from the enclosing
> device data environment.


The standard itself is very unclear.  I'll cite my omp-lang mail from
September:

> Ok, I'm for now implementing this refcounted model.
>
> One still unclear thing is what is supposed to happen if multiple host
> threads enter a target data construct mapping at least one same object
> with different map kinds.
> Say thread A enters #pragma omp target data map(tofrom:p[:64]), then
> thread B enters #pragma omp target data map(alloc:p[:64]) while thread A
> is still running the body of its target data (so, the mapping just
> increments the refcount of the p[:64] array section), then thread A leaves
> the target data construct, decrements the p[:64] refcount, but as it is
> non-zero, doesn't deallocate it, and finally thread B reaches the end of
> its target data construct and unmaps p[:64].  The question is, when (if
> ever) is the array section supposed to be copied back to host?  Shall it
> be done at the end of thread A's target data section, or at the end of
> thread B's target data section (i.e. propagate the flag: has at least one
> of the mappings requested a copy from the device to host at the end of its
> lifetime), or not copied at all?
>
> What if thread B doesn't request the whole array section, but only a
> portion thereof, map(alloc:p[:32])?  Would it copy the whole p[:64] array
> section back, or just a portion of it?  Though, admittedly, for this latter
> case of a subset it might be harder to construct a valid non-racy testcase;
> one needs to make sure one of the target data constructs is always
> entered before the other, though perhaps with #pragma omp atomic and
> spinning it might be doable.

and will just paraphrase the Sep 9th answer I got for that, because I'm not
sure I'm allowed to repost it.  The answer was that on entry the standard is
pretty clear about what happens: the first encountering thread/data construct
allocates and optionally copies based on the flags, and all others, when the
object is already mapped, do nothing.  On exit, the standard is silent and
none of the solutions are right; the committee will discuss it further.

So, for now the implementation choice was to OR in the copy-from-device bit
(i.e. the section is copied back at the end of its lifetime if any of the
mappings asked for that).

Now, you could argue this case is different, because it is not different
threads, but the same thread, just nested constructs on the same thread.  But
how to reliably differentiate that?  Even if you stored some thread
identification into the tree along with each mapping (which thread mapped this
in), what if some other thread also does the same (outer #pragma omp target
data, inner #pragma omp target, where the outer one does just array section
mapping and the inner one a tofrom mapping on the pointer)?  Then we'd still
copy back.
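
To make the nested case concrete, here is a minimal sketch of the pattern
(outer target data mapping just the array section, inner target with a tofrom
mapping).  The function name nested_target_sum and the explicit
map(tofrom: a[0:n]) clause are assumptions made for illustration; in the case
discussed above the inner tofrom mapping is the implicit one on the pointer.
Built without OpenMP the pragmas are simply ignored and everything runs on
the host.

```c
#include <stdlib.h>

/* Hypothetical helper: fills a[0..n-1] inside a target region nested in a
   target data region and returns the sum computed in that region.
   Returns -1 if the host allocation fails. */
long nested_target_sum(int n)
{
    int *a = malloc((size_t)n * sizeof *a);
    long sum = 0;

    if (a == NULL)
        return -1;

    /* Outer construct: only allocates a[0:n] on the device, no copies. */
    #pragma omp target data map(alloc: a[0:n])
    {
        /* Inner construct: a[0:n] is already present in the enclosing
           device data environment, so this mapping only bumps the
           refcount.  sum is not mapped by the outer construct, so it is
           copied both ways. */
        #pragma omp target map(tofrom: a[0:n]) map(tofrom: sum)
        {
            for (int i = 0; i < n; i++) {
                a[i] = i;
                sum += a[i];
            }
        }
    }

    /* Whether the host copy of a[] holds the device values at this point
       is exactly the open question; only sum is relied upon here. */
    free(a);
    return sum;
}
```

The sum is the same either way; what differs between the possible exit
behaviours is only whether a[] gets copied back to the host when the outer
construct ends.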

So, perhaps we should just stop ORing the copyfrom in for now, use the
copyfrom from the very first mapping only, and wait for what the committee
actually agrees on.
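
The difference between the two exit behaviours can be sketched roughly as
follows.  This is not the actual libgomp code; struct mapping, map_enter,
map_exit and the policy enum are hypothetical names used only to show the
bookkeeping, assuming a single refcount and a single copy-from flag per
mapped object.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical map kinds and bookkeeping; not libgomp's real structures. */
enum map_kind { MAP_ALLOC, MAP_TO, MAP_FROM, MAP_TOFROM };
enum exit_policy { POLICY_OR_IN, POLICY_FIRST_ONLY };

struct mapping {
    int  refcount;   /* number of constructs currently holding the mapping */
    bool copy_from;  /* copy device -> host when the refcount drops to 0 */
};

static bool wants_copy_from(enum map_kind kind)
{
    return kind == MAP_FROM || kind == MAP_TOFROM;
}

/* Enter a construct that maps the object.  Returns true if this call is
   the first mapping (the one that allocates and optionally copies to the
   device); later mappings only increment the refcount. */
bool map_enter(struct mapping *m, enum map_kind kind, enum exit_policy policy)
{
    if (m->refcount++ == 0) {
        m->copy_from = wants_copy_from(kind);
        return true;
    }
    /* Already mapped: under POLICY_OR_IN a later request for copy-from
       is remembered; under POLICY_FIRST_ONLY it is ignored. */
    if (policy == POLICY_OR_IN && wants_copy_from(kind))
        m->copy_from = true;
    return false;
}

/* Leave a construct.  Returns true if the object would be copied back to
   the host now, i.e. this was the last reference and copy_from is set. */
bool map_exit(struct mapping *m)
{
    assert(m->refcount > 0);
    if (--m->refcount > 0)
        return false;
    return m->copy_from;
}
```

With an outer alloc mapping and an inner tofrom mapping, POLICY_OR_IN makes
the final map_exit report a copy back, while POLICY_FIRST_ONLY never does.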

Richard, your thoughts on this?

        Jakub
