On Thu, 22 Oct 2015, Julian Brown wrote:
> > The second approach is to run all threads in the warp all the time,
> > making sure they execute the same code with the same data, and thus
> > build up the same local state.  In this case we'd need to ensure this
> > invariant: if threads in the warp have the same state prior to
> > executing an instruction, they also have the same state after
> > executing that instruction (plus global state changes as if only one
> > thread executed that instruction).
> > 
> > Most instructions are safe w.r.t this invariant.
> 
> > Was something like this considered (and rejected?) for OpenACC?
> 
> I'm not sure we understood the "global state changes as if only one
> thread executed that instruction" bit (do you have a citation?).

Not sure what kind of citation you want.  It's something I need to satisfy
myself about.

Take a store to memory, for example.  I want to ensure that if all threads in
a warp store the same value to the same location, the effect on memory is the
same as if only one thread performed the store (rather than writing garbage or
invoking undefined behavior).  PTX gives me that guarantee automatically:

  "If a non-atomic instruction executed by a warp writes to the same location in
  global or shared memory for more than one of the threads of the warp, the
  number of serialized writes that occur to that location and the order in which
  they occur is undefined, but one of the writes is guaranteed to succeed"

> But anyway, even if that works for threads within a warp, it doesn't work
> for warps within a CTA, so we'd still need some broadcast mechanism for
> those.

Yes.  In OpenMP that corresponds to #omp parallel/GOMP_parallel, which was
discussed in relation to the patch where I want to store omp_data_o in shared
memory.

Alexander
