On 05/20/2015 02:39 PM, Jakub Jelinek wrote:
On Wed, May 20, 2015 at 02:01:44PM +0200, Bernd Schmidt wrote:
To implement OpenACC vector-single mode, we need to ensure that only one
thread out of the group representing a worker executes. The others skip
computations but follow along the CFG, so the results of conditional branch
decisions must be broadcast to them.

The patch below adds a new builtin and nvptx pattern to implement that
broadcast functionality.
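
For illustration only, the control-flow idea described above can be modelled on the host in plain C. This is a simulation sketch, not the builtin or the nvptx pattern from the patch: `WARP_SIZE`, `warp_broadcast` and `run_vector_single` are hypothetical names, and the 32-lane warp is an assumption. Lane 0 alone evaluates the branch condition, the result is broadcast, and every lane then follows the same edge of the CFG while only lane 0 does the actual work.

```c
#include <assert.h>

#define WARP_SIZE 32  /* assumption: 32 lanes per warp */

/* Hypothetical stand-in for the broadcast: every lane receives the
   value currently held by src_lane. */
static void warp_broadcast(int values[WARP_SIZE], int src_lane)
{
    assert(src_lane >= 0 && src_lane < WARP_SIZE);
    for (int lane = 0; lane < WARP_SIZE; lane++)
        values[lane] = values[src_lane];
}

/* Simulates "if (input > threshold) r = input * 2; else r = -1;"
   in vector-single fashion.  Returns how many lanes followed the
   "then" edge; *result receives the value computed by lane 0. */
static int run_vector_single(int input, int threshold, int *result)
{
    int cond[WARP_SIZE] = {0};
    int lanes_on_then_edge = 0;

    /* Only lane 0 evaluates the condition. */
    cond[0] = input > threshold;

    /* The other lanes learn the branch decision via the broadcast. */
    warp_broadcast(cond, 0);

    for (int lane = 0; lane < WARP_SIZE; lane++) {
        if (cond[lane]) {
            lanes_on_then_edge++;
            if (lane == 0)          /* others skip the computation */
                *result = input * 2;
        } else {
            if (lane == 0)
                *result = -1;
        }
    }
    return lanes_on_then_edge;
}
```

Calling `run_vector_single(5, 3, &r)` sends all 32 simulated lanes down the "then" edge, although only lane 0 writes `r`. Without the broadcast, lanes 1 to 31 would still hold a zero condition and would diverge from lane 0.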

So, is the goal of this that threads in the warp other than the 0th
don't do anything except in vectorized regions, where all the threads
in the warp participate in the vectorization?

Yes.

Thus, for OpenMP, should the whole warp be a single thread
(thus omp_get_thread_num () would be tid.x >> 5)?
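
To make the arithmetic in the question concrete: with 32 threads per warp, shifting `tid.x` right by 5 gives the warp index, and the low 5 bits give the lane within the warp. The sketch below is plain C with hypothetical helper names (`warp_index`, `lane_index`, `tid_from`). It only illustrates the index calculation and says nothing about how an OpenMP port would actually number its threads.

```c
#include <assert.h>

#define WARP_SHIFT 5                   /* assumption: 2^5 = 32 lanes per warp */
#define WARP_SIZE  (1u << WARP_SHIFT)

/* Index of the warp that a given tid.x value belongs to. */
static unsigned warp_index(unsigned tid_x)
{
    return tid_x >> WARP_SHIFT;
}

/* Position of the thread inside its warp (0..31). */
static unsigned lane_index(unsigned tid_x)
{
    return tid_x & (WARP_SIZE - 1u);
}

/* Inverse mapping: rebuild tid.x from a warp index and a lane. */
static unsigned tid_from(unsigned warp, unsigned lane)
{
    assert(lane < WARP_SIZE);
    return (warp << WARP_SHIFT) | lane;
}
```

Under this mapping, threads 0 to 31 all report warp 0 and threads 32 to 63 report warp 1, so every thread of a warp would see the same "thread number".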

Do you mean for an OpenMP port to nvptx? I haven't looked at OpenMP closely enough to say whether or how it could be mapped to GPU hardware; it's not something we intend to do for this project.


Bernd
