On 11/14/2014 08:18 AM, Jakub Jelinek wrote:

>> Also, keep in mind that PTX doesn't have a global TID. The user needs to
>> calculate it using ctaid/tid and friends.
> 
> Ok.  Is %gridid needed for that combo too?

Eventually, probably. Currently, we're launching all of our kernels with
cuLaunchKernel, and that function doesn't take grids into account.

Nvidia's documentation is kind of confusing. They use different
terminology for their high level CUDA stuff and the low level PTX. E.g.,
what CUDA refers to as blocks/warps, PTX calls CTAs. I'm not sure what
grids correspond to, but I think it might be devices. If that's the
case, the runtime does have the capability to select which device to run
a kernel on. But, it can't run a single kernel on multiple devices
unless you use asynchronous kernel invocations.
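
For reference, the global TID calculation mentioned in the quoted text
above can be sketched in plain C as below. This is only a host-side
model assuming a one-dimensional launch; the struct and function names
are hypothetical and made up for illustration. Real PTX code would read
the %ctaid.x, %ntid.x and %tid.x special registers directly.

```c
#include <assert.h>

/* Hypothetical host-side model of one dimension of a PTX launch.
   The field names mirror the PTX special registers; the struct
   itself is illustrative and not part of any real API.  */
struct ptx_pos_1d {
    unsigned ctaid;   /* CTA index within the launch  (%ctaid.x) */
    unsigned ntid;    /* number of threads per CTA    (%ntid.x)  */
    unsigned tid;     /* thread index within the CTA  (%tid.x)   */
};

/* Flat thread index across all CTAs, in one dimension.  */
static unsigned global_tid_1d(struct ptx_pos_1d p)
{
    assert(p.tid < p.ntid);   /* a thread index must lie inside its CTA */
    return p.ctaid * p.ntid + p.tid;
}
```

With 128 threads per CTA, thread 5 of CTA 2 gets global index
2 * 128 + 5. Multi-dimensional launches would need the .y and .z
components folded in the same way.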

Cesar
