Re: [gomp4 09/14] libgomp: provide barriers on NVPTX

2015-10-21 Jakub Jelinek
On Tue, Oct 20, 2015 at 09:34:31PM +0300, Alexander Monakov wrote:
> +  asm ("bar.sync 0, %0;" : : "r"(32*bar->total));

Formatting: there should be a space between " and (, and spaces around * (in many places). As for re-convergence of threads in a warp, if we use threads in the warp other than thread 0 only for simd regio
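For reference, here is one way the quoted statement might look with the spacing requested (GNU style: a space before the operand's parenthesis and spaces around `*`). This is a sketch only. The struct and function names are hypothetical, and `total` is assumed to count warps because the operand is multiplied by the warp width of 32. The `asm` is compiled only when `__nvptx__` is defined, so the file still builds on a host.

```c
/* Hypothetical stand-in for the barrier type; only the field used by the
   quoted statement is modelled.  */
struct nvptx_barrier_sketch
{
  unsigned total;   /* Assumed: number of warps taking part.  */
};

/* bar.sync operand: warp count pre-multiplied by the warp width (32).
   Note the spaces around '*'.  */
static inline unsigned
bar_sync_count (const struct nvptx_barrier_sketch *bar)
{
  return 32 * bar->total;
}

/* Hypothetical wrapper around the hardware barrier 0.  */
static inline void
bar_sync_sketch (const struct nvptx_barrier_sketch *bar)
{
#ifdef __nvptx__
  /* Note the space between the "r" constraint and its '('.  */
  asm ("bar.sync 0, %0;" : : "r" (bar_sync_count (bar)));
#else
  /* Host build: there is no hardware barrier, so do nothing.  */
  (void) bar;
#endif
}
```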

Re: [gomp4 09/14] libgomp: provide barriers on NVPTX

2015-10-20 Bernd Schmidt
On 10/20/2015 11:51 PM, Alexander Monakov wrote:
> On Tue, 20 Oct 2015, Bernd Schmidt wrote:
> > My experience has been that there is practically no way of using bar.sync
> > reliably, since we can't control warp divergence and reconvergence at the
> > ptx level but the hardware bar.sync instruction only wor

Re: [gomp4 09/14] libgomp: provide barriers on NVPTX

2015-10-20 Alexander Monakov
On Tue, 20 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > On NVPTX, there are 16 hardware barriers for each thread team, and each
> > barrier has a variable waiter count. The instruction 'bar.sync N, M;' waits
> > on barrier number N until M threads h

Re: [gomp4 09/14] libgomp: provide barriers on NVPTX

2015-10-20 Bernd Schmidt
On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> On NVPTX, there are 16 hardware barriers for each thread team, and each
> barrier has a variable waiter count. The instruction 'bar.sync N, M;' waits
> on barrier number N until M threads have arrived. M should be pre-multiplied
> by the warp width. It

[gomp4 09/14] libgomp: provide barriers on NVPTX

2015-10-20 Alexander Monakov
On NVPTX, there are 16 hardware barriers for each thread team, and each barrier has a variable waiter count. The instruction 'bar.sync N, M;' waits on barrier number N until M threads have arrived. M should be pre-multiplied by the warp width. It's also possible to 'post' the barrier without susp
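To make the counting concrete, here is a host-side model in C. It is a sketch only: it executes no PTX and suspends no threads. It only counts arrivals the way the description says a hardware barrier with a variable waiter count behaves. All names (`hw_barrier_model`, `bar_thread_count`, `model_bar_sync`, `model_bar_arrive`) are hypothetical. The model assumes arrivals are counted a whole warp at a time, which is why M is given in threads pre-multiplied by the warp width. Posting without suspending is modelled as an arrival that does not wait; PTX exposes that operation as `bar.arrive`.

```c
#include <stdbool.h>

/* Warp width on NVPTX.  */
#define WARP_SIZE 32u
/* Hardware barriers per thread team, as stated in the patch description.  */
#define NUM_HW_BARRIERS 16

/* Hypothetical model of one hardware barrier: it only counts, it never
   actually blocks anything.  */
struct hw_barrier_model
{
  unsigned arrived;        /* Threads arrived so far (whole warps).  */
  unsigned waiting_warps;  /* Warps that arrived via bar.sync.  */
  unsigned generation;     /* Bumped each time the barrier completes.  */
};

/* The M operand of 'bar.sync N, M;': warps times warp width.  */
static unsigned
bar_thread_count (unsigned nwarps)
{
  return WARP_SIZE * nwarps;
}

/* Record one warp's arrival on B, which expects M threads.  If this
   arrival completes the barrier, reset it for reuse and return how many
   suspended warps are released; otherwise return 0.  */
static unsigned
model_arrival (struct hw_barrier_model *b, unsigned m, bool suspend)
{
  b->arrived += WARP_SIZE;
  if (suspend)
    b->waiting_warps++;
  if (b->arrived < m)
    return 0;

  unsigned released = b->waiting_warps;
  b->arrived = 0;
  b->waiting_warps = 0;
  b->generation++;
  return released;
}

/* Model of 'bar.sync N, M;': arrive and wait for completion.  */
static unsigned
model_bar_sync (struct hw_barrier_model *b, unsigned m)
{
  return model_arrival (b, m, true);
}

/* Model of posting the barrier without suspending.  */
static unsigned
model_bar_arrive (struct hw_barrier_model *b, unsigned m)
{
  return model_arrival (b, m, false);
}
```

For a team of four warps, M is `bar_thread_count (4)`, i.e. 128. If three warps call `model_bar_sync` on barrier 0 and a fourth calls `model_bar_arrive`, the fourth call completes the barrier and reports the three suspended warps as released. The other 15 barriers in a `NUM_HW_BARRIERS`-sized array stay untouched.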