On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
> I'm not really familiar with OpenMP and what it allows, so take all my
> comments with a grain of salt.
>
> On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> > The second approach is to run all threads in the warp all the time,
> > making sure they execute the same code with the same data, and thus
> > build up the same local state.
>
> But is that equivalent? If each thread takes the address of a variable
> on its own stack, that's not the same as taking an address once and
> broadcasting it.
Does PTX allow function-scope .shared variables (rather than just
file-scope ones)?  If so, then perhaps all the automatic variables that
could in theory be passed to other threads (i.e. the addressable ones)
could be made .shared, and the non-addressable ones .local.

For target constructs directly embedded in host code you can tell which
variables are shared at which level: variables shared between teams go
into .global (though that is primarily the mapped variables, which are
heap allocated, plus vars that are firstprivate on target but not on
teams), and variables shared between the threads of a team go into
.shared.

The problem is separate functions, where it is unknown whether they are
called from a teams context (executed by just the first thread of the
first warp) or from a parallel context (executed by one or more warps,
so that privatized vars need to be .local, or ideally warp-local), and
what to do about the broadcasts needed for the SIMD stuff.

	Jakub
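
To make the three levels concrete, here is a rough OpenMP sketch of a
directly embedded target region.  The function and variable names are
made up, and the storage-class notes in the comments are only an
assumption about where each object would naturally live on NVPTX under
the scheme discussed above, not a description of what the compiler
emits today:

  #include <omp.h>

  void
  vec_sum (const int *arr, int n, long long *result)
  {
    long long total = 0;
    int scale = 2;	/* firstprivate on target but not on teams:
			   one copy shared by all teams -> .global  */

    #pragma omp target map(to: arr[0:n]) map(tofrom: total) firstprivate(scale)
    #pragma omp teams reduction(+: total)
    {
      /* arr and total are mapped, heap allocated on the device -> .global.
	 team_sum has one instance per team, shared by the threads of that
	 team: a candidate for a function-scope .shared variable.  */
      long long team_sum = 0;

      /* Each team handles its own slice of the array.  */
      int nt = omp_get_num_teams ();
      int tid = omp_get_team_num ();
      int lo = (int) ((long long) n * tid / nt);
      int hi = (int) ((long long) n * (tid + 1) / nt);

      #pragma omp parallel for reduction(+: team_sum)
      for (int i = lo; i < hi; i++)
	{
	  int tmp = arr[i] * scale;	/* private and not addressable:
					   .local, or just a register  */
	  team_sum += tmp;
	}

      total += team_sum;	/* combined across teams by the teams reduction  */
    }

    *result = total;
  }

The separate-function problem is then that once a body like the loop
above is outlined into, or calls, its own function, none of this
context is visible there.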