On 11/14/2014 08:18 AM, Jakub Jelinek wrote:
>> Also, keep in mind that PTX doesn't have a global TID. The user needs to
>> calculate it using ctaid/tid and friends.
>
> Ok. Is %gridid needed for that combo too?

Eventually, probably. Currently, we're launching all of our kernels with
cuLaunchKernel, and that function doesn't take grids into account.

Nvidia's documentation is kind of confusing. They use different terminology
for their high-level CUDA stuff and the low-level PTX. E.g., what CUDA
refers to as blocks/warps, PTX calls CTAs. I'm not sure what grids
correspond to, but I think it might be devices. If that's the case, the
runtime does have the capability to select which device to run a kernel on.
But it can't run a single kernel on multiple devices unless you use
asynchronous kernel invocations.

Cesar
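
For reference, here is a minimal sketch of the global TID calculation
mentioned in the quoted text. It is plain C that models the PTX special
registers %ctaid.x, %ntid.x, %nctaid.x and %tid.x as function arguments
(in CUDA terms, blockIdx.x, blockDim.x, gridDim.x and threadIdx.x). The
function names are hypothetical, only the x dimension is handled, and this
is an illustration rather than code from the port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: flat (global) thread index along x.
   ctaid_x models %ctaid.x (index of this CTA within the launch),
   ntid_x  models %ntid.x  (number of threads per CTA),
   tid_x   models %tid.x   (index of this thread within its CTA).  */
uint64_t
global_tid_x (uint32_t ctaid_x, uint32_t ntid_x, uint32_t tid_x)
{
  /* A thread index must lie inside its CTA.  */
  assert (tid_x < ntid_x);

  /* Widen before multiplying so the product cannot wrap at 32 bits.  */
  return (uint64_t) ctaid_x * ntid_x + tid_x;
}

/* Hypothetical helper: total number of threads along x.
   nctaid_x models %nctaid.x (number of CTAs in the launch).  */
uint64_t
global_size_x (uint32_t nctaid_x, uint32_t ntid_x)
{
  return (uint64_t) nctaid_x * ntid_x;
}
```

With 128 threads per CTA, thread 5 of CTA 2 gets global index
2 * 128 + 5 = 261. The y and z dimensions would follow the same pattern.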