On Tue, Oct 20, 2015 at 09:34:31PM +0300, Alexander Monakov wrote:
> + asm ("bar.sync 0, %0;" : : "r"(32*bar->total));
Formatting, space between "(, spaces around * (in many places).
As for re-convergence of threads in a warp, if we use threads in the warp
other than thread 0 only for simd regio
On 10/20/2015 11:51 PM, Alexander Monakov wrote:
On Tue, 20 Oct 2015, Bernd Schmidt wrote:
My experience has been that there is practically no way of using bar.sync
reliably, since we can't control warp divergence and reconvergence at the
ptx level but the hardware bar.sync instruction only wor
On Tue, 20 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > On NVPTX, there's 16 hardware barriers for each thread team, each barrier has
> > a variable waiter count. The instruction 'bar.sync N, M;' allows to wait on
> > barrier number N until M threads h
On 10/20/2015 08:34 PM, Alexander Monakov wrote:
On NVPTX, there's 16 hardware barriers for each thread team, each barrier has
a variable waiter count. The instruction 'bar.sync N, M;' allows to wait on
barrier number N until M threads have arrived. M should be pre-multiplied by
warp width. It
On NVPTX, there's 16 hardware barriers for each thread team, each barrier has
a variable waiter count. The instruction 'bar.sync N, M;' allows to wait on
barrier number N until M threads have arrived. M should be pre-multiplied by
warp width. It's also possible to 'post' the barrier without susp
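
For readers following the operand arithmetic, here is a minimal sketch of
how the M operand of 'bar.sync N, M;' could be formed, modelled on the
'32*bar->total' expression in the quoted patch. The names WARP_SIZE,
bar_thread_count and team_barrier_wait are hypothetical and not taken from
the patch; the sketch also assumes that each logical thread occupies one
full warp of 32 hardware threads, and that the compiler defines __nvptx__
when targeting NVPTX. On any other target the wrapper compiles to a no-op.

```c
/* Assumption: a warp is 32 hardware threads, matching the literal 32
   used in the quoted patch.  */
#define WARP_SIZE 32u

/* Hypothetical helper: convert a count of participating warps into the
   thread-count operand M of 'bar.sync N, M;'.  The count is
   pre-multiplied by the warp width, as the thread describes.  */
static inline unsigned
bar_thread_count (unsigned warps)
{
  return WARP_SIZE * warps;
}

/* Hypothetical wrapper: wait on hardware barrier 0 until 'warps' warps
   have arrived.  The asm statement follows the one in the quoted patch,
   with the formatting requested in review; 'volatile' and the "memory"
   clobber are additions of this sketch.  */
static inline void
team_barrier_wait (unsigned warps)
{
#ifdef __nvptx__
  asm volatile ("bar.sync 0, %0;" : : "r" (bar_thread_count (warps)) : "memory");
#else
  (void) warps;  /* No hardware barrier outside NVPTX.  */
#endif
}
```

With these definitions, a team of 4 warps would pass 128 as M. The sketch
does not address the divergence and reconvergence concerns raised earlier
in the thread.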