On Tue, 19 Jan 2016, Thomas Schwinge wrote:
> Hi!
>
> On Tue, 19 Jan 2016 17:07:17 +0300, Alexander Monakov <amona...@ispras.ru> wrote:
> > On Tue, 19 Jan 2016, Alexander Monakov wrote:
> > > > ... to determine an optimal number of threads per block given the number
> > > > of registers (maybe just querying CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK
> > > > would do that already?).
> > >
> > > I have implemented that for OpenMP offloading, but also since CUDA 6.0 there's
> > > cuOcc* (occupancy query) interface that allows to simply ask the driver about
> > > the per-function launch limit.
>
> You mean you already have implemented something along the lines I
> proposed?

Yes, I was implementing OpenMP teams, and it made sense to add warps per
block limiting at the same time (i.e. query CU_FUNC_ATTRIBUTE_... and limit
if default or requested number of threads per team is too high).  I intend
to post that patch as part of a larger series shortly (but the patch itself
is simple enough, although a small tweak will be needed to make it apply to
OpenACC too).

Alexander
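
For illustration only, the limiting step described above might look roughly
like the sketch below.  This is not the patch itself: the helper name
limit_threads_per_team and its parameters are hypothetical, and max_threads
merely stands in for whatever per-function limit the driver reports (for
example via cuFuncGetAttribute with CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK).
Rounding the limit down to whole warps is likewise an assumption made here.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the actual patch.

   REQUESTED is the thread count asked for by the user (0 means "use the
   default"), DEFAULT_THREADS is the plugin's default, MAX_THREADS is the
   per-function launch limit assumed to come from the driver, and
   WARP_SIZE is the number of threads per warp.  */
static int
limit_threads_per_team (int requested, int default_threads,
                        int max_threads, int warp_size)
{
  assert (warp_size > 0 && max_threads > 0 && default_threads > 0);

  /* Fall back to the default when nothing was requested.  */
  int threads = requested > 0 ? requested : default_threads;

  /* Express the limit in whole warps (assumption: partial warps at the
     limit are not wanted).  */
  int limit = max_threads;
  if (limit >= warp_size)
    limit -= limit % warp_size;

  /* Clamp only if the default or requested count is too high.  */
  return threads > limit ? limit : threads;
}
```

With a driver-reported limit of 640 and a request for 1024 threads, the
helper would return 640; a request that already fits is left unchanged.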