On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > This patch series ports enough of libgomp.c to get warp-level parallelism
> > working for OpenMP offloading. The overall approach is as follows.
>
> Could you elaborate a bit what you mean by this just so we understand each
> other in terms of terminology? "Warp-level" sounds to me like you have all
> threads in a warp executing in lockstep at all times. If individual threads
> can take different paths, I'd expect it to be called thread-level parallelism
> or something like that.
Sorry, that was unclear. What I meant is that there is a degree of
parallelism available across different warps, but not across different teams
(because only 1 team is spawned), nor across threads in a warp (because all
threads in a warp except one exit immediately -- later on we'd need to keep
them converged so they can enter a simd region together).

> What is your end goal in terms of mapping GPU parallelism onto OpenMP?

An OpenMP team is mapped to a CUDA thread block, an OpenMP thread is mapped
to a warp, and an OpenMP simd lane is mapped to a CUDA thread. So, follow
the OpenACC model. Like in OpenACC, we'd need to artificially
deactivate/reactivate warp members on simd region boundaries.

Alexander