Re: [gomp4 09/14] libgomp: provide barriers on NVPTX

Bernd Schmidt Tue, 20 Oct 2015 17:14:18 -0700

On 10/20/2015 11:51 PM, Alexander Monakov wrote:

On Tue, 20 Oct 2015, Bernd Schmidt wrote:

My experience has been that there is practically no way of using bar.sync
reliably, since we can't control warp divergence and reconvergence at the
ptx level but the hardware bar.sync instruction only works when executed by
all threads in a warp at the same time.


I don't think it's that bad.  Divergence and reconvergence are implicit: a
non-uniform branch is a divergence point, and the corresponding reconvergence
point is at its immediate post-dominator.
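To make the reconvergence rule quoted above concrete, here is a minimal sketch that computes the immediate post-dominator of a branch in a small diamond-shaped control-flow graph. The graph, the node names and both helper functions are hypothetical illustrations; nothing here is taken from ptxas or gcc, and it says nothing about what either tool actually does with such a graph.

```python
def post_dominators(succ, exit_node):
    """Iteratively compute the post-dominator set of every node.

    succ maps each node to a list of its successors; every node other
    than exit_node is assumed to have at least one successor.
    """
    nodes = set(succ)
    # Start from the most pessimistic assumption and shrink to a fixpoint.
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def immediate_post_dominator(succ, exit_node, node):
    """Return the strict post-dominator of node that is closest to it."""
    pdom = post_dominators(succ, exit_node)
    strict = pdom[node] - {node}
    for candidate in strict:
        # The closest strict post-dominator is post-dominated by all others,
        # so its own post-dominator set equals the whole strict set.
        if pdom[candidate] == strict:
            return candidate
    return None


# Hypothetical diamond: a non-uniform branch with two arms that meet again.
cfg = {
    "entry": ["branch"],
    "branch": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["exit"],
    "exit": [],
}

reconvergence_point = immediate_post_dominator(cfg, "exit", "branch")
print(reconvergence_point)
```

In this model the divergence point is "branch" and the implied reconvergence point is "join". If an optimization pass duplicated or moved the code at "join", the immediate post-dominator would change with it.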

That's good in theory, but I have seen cases where very odd things seemed to
be happening in ptxas, and another problem is that gcc is quite unconcerned
about maintaining such reconvergence points in its optimization passes.

For bar.sync behavior the documentation gives an explicit guarantee: every
time a warp encounters a bar.sync instruction, it bumps the count by the warp
width (32), irrespective of how many warp members are active at the time of
encounter.

Yeah, but that's undesirable: you can breeze right past a bar.sync before the
thing you wanted to synchronize has completed.
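
The two points above can be shown with a toy model of the arrival counter. It is a sketch under stated assumptions, not a description of the hardware: the class, the function and the 64-thread block are hypothetical, and the only behavior modeled is the one described in this thread, namely that each warp reaching bar.sync adds the full warp width (32) to the count regardless of how many of its lanes are active.

```python
WARP_SIZE = 32


class BarrierModel:
    """Toy arrival counter for one barrier (hypothetical, for illustration)."""

    def __init__(self, expected_threads):
        self.expected = expected_threads
        self.count = 0

    def warp_arrives(self, active_lanes):
        # Assumption taken from the discussion: the count is bumped by the
        # warp width, irrespective of how many lanes are active.
        self.count += WARP_SIZE
        return self.count >= self.expected  # True once the barrier releases


def simulate():
    """Two warps, where warp 0 reaches the barrier while still diverged."""
    barrier = BarrierModel(expected_threads=2 * WARP_SIZE)
    work_done = [False] * (2 * WARP_SIZE)

    # Warp 0: only lanes 0..15 have finished their work and reached bar.sync;
    # lanes 16..31 are still on the other side of a divergent branch.
    for lane in range(16):
        work_done[lane] = True
    barrier.warp_arrives(active_lanes=16)

    # Warp 1: all 32 lanes have finished and arrive together.
    for lane in range(WARP_SIZE, 2 * WARP_SIZE):
        work_done[lane] = True
    released = barrier.warp_arrives(active_lanes=WARP_SIZE)

    unfinished = work_done.count(False)
    return released, unfinished


released, unfinished = simulate()
print(released, unfinished)
```

In this model the barrier releases after two arrivals even though 16 threads of warp 0 have not completed their work, which is the "breeze right past" case.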



Bernd
