On Thu, 22 Oct 2015 19:41:51 +0300
Alexander Monakov <amona...@ispras.ru> wrote:

> On Thu, 22 Oct 2015, Jakub Jelinek wrote:
> > Does that apply also to threads within a warp?  I.e. is .local
> > local to each thread in the warp, or to the whole warp, and if the
> > former, how can the local vars be broadcast to other threads and
> > collected back, say at the start of a SIMD region or at its end?
> > One thing is scalar vars, another pointers, or references to
> > various types, or even bigger indirection.
> 
> .local is indeed local to each warp member, not the warp as a whole.
> What the OpenACC/PTX implementation does is to copy the whole stack
> frame, plus live registers: see nvptx.c:nvptx_propagate.
> 
> I see two possible alternative approaches for OpenMP/PTX.

> The second approach is to run all threads in the warp all the time,
> making sure they execute the same code with the same data, and thus
> build up the same local state.  In this case we'd need to ensure this
> invariant: if threads in the warp have the same state prior to
> executing an instruction, they also have the same state after
> executing that instruction (plus global state changes as if only one
> thread executed that instruction).
> 
> Most instructions are safe w.r.t this invariant.

> Was something like this considered (and rejected?) for OpenACC?

I'm not sure we understood the "global state changes as if only one
thread executed that instruction" part (do you have a citation for
it?). In any case, even if that works for threads within a warp, it
doesn't work for warps within a CTA, so we'd still need some broadcast
mechanism for those.
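
To make the data flow of such a broadcast concrete, here is a minimal
sketch in plain C that models it sequentially on the host. It is not
PTX and not how nvptx_propagate is written; the names cta_model,
cta_broadcast and cta_is_uniform are hypothetical, and the barrier
that real hardware would need (bar.sync in PTX) is only marked by a
comment.

```c
#include <assert.h>
#include <stddef.h>

enum { WARP_SIZE = 32, MAX_WARPS = 32 };

/* Hypothetical model of one CTA.  Each lane owns a private copy of a
   variable (standing in for a .local slot); shared_slot stands in for
   storage visible to the whole CTA (.shared). */
struct cta_model {
    size_t num_warps;
    int local[MAX_WARPS][WARP_SIZE];
    int shared_slot;
};

/* Copy one lane's private value into every lane of every warp. */
void cta_broadcast(struct cta_model *cta, size_t src_warp, size_t src_lane)
{
    assert(cta->num_warps <= MAX_WARPS);
    assert(src_warp < cta->num_warps && src_lane < WARP_SIZE);

    /* Step 1: the source lane publishes its private value. */
    cta->shared_slot = cta->local[src_warp][src_lane];

    /* On real hardware a CTA-wide barrier is needed here so that no
       lane reads the slot before the store is visible.  This model is
       sequential, so the ordering is implicit. */

    /* Step 2: every lane reloads its private copy from the slot. */
    for (size_t w = 0; w < cta->num_warps; w++)
        for (size_t l = 0; l < WARP_SIZE; l++)
            cta->local[w][l] = cta->shared_slot;
}

/* Return 1 if all lanes in the CTA hold the same private value. */
int cta_is_uniform(const struct cta_model *cta)
{
    for (size_t w = 0; w < cta->num_warps; w++)
        for (size_t l = 0; l < WARP_SIZE; l++)
            if (cta->local[w][l] != cta->local[0][0])
                return 0;
    return 1;
}
```

The model only covers a single scalar. Pointers into another thread's
.local storage, which Jakub asks about above, are not addressed by
copying values this way.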

Julian
