tra added a comment.

In D68578#1698864 <https://reviews.llvm.org/D68578#1698864>, @t-tye wrote:

> From a source language point of view, the device function comprises the code 
> that is launched as a grid. We need this fact to be present in the symbols 
> used. Only the device function should have a symbol name matching the mangled 
> name of the device function.


What do you have in mind when you use 'symbol name' here? Is that a symbol as 
seen by linker? If that's the case, do host and device share this name space on 
AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each 
GPU have their own spaces), so they never clash, but the kernel names must have 
identical mangled name on host and all devices, so the host can refer to the 
device-side kernel when it needs to launch it.

> It the device function has both a host and device implementation then both 
> can have the source language function name for the symbol since both actually 
> implement the device function. If the user asks to set a breakpoint in the 
> device function then the debugger would set in both implementations so the 
> user is notified when the source program executes the device function, 
> regardless of which implementation is invoked. This is similar to the 
> debugger setting a breakpoint in a function that is inlined into multiple 
> places: the debugger sets breeakpoints in all the inlined places so the user 
> can tstill think of the program debugging in terms of the source language 
> semantics.

OK. This sounds like `__host__`/`__device__` function overloads and what you're 
saying does make sense for that.

> In contrast, the stub is effectively part of the implementation of actually 
> launching the device function. It should have a distinct name.

I'm not sure how the requirement of distinct name follows from the fact that 
the stub is the host-side part of the device-side kernel? To me it looks like 
an argument for them to have the same name so it's clear that they are both 
part of the same function as written in the source.

The don't have to be different. CUDA (and HIP) does not allow overloading of 
kernels, so the stub and the kernel can have identical names as in the example 
of `__host__` and `__device__` overloads you've described above, only now it's 
`__host__` stub + `__global__` kernel itself, instead of two user-implemented 
functions. Debugger, of course, will need to know about that to pick the stub 
or kernel as the breakpoint location, but that appears doable.

> The debugger can still be used to set a breakpoint in it, or to step into it. 
> But that should be done in terms of the stub name. If the debugger wants to 
> support source language specific intelligence it can provide a helper library 
> that understands the stub names. This helper library (similar to the thread 
> helper library) can be used by the debugger to present a cleaner language 
> view to the user. In fact OpenMP has also done this and provides a helper 
> library called OMPD that can be used by tools such as a debugger to hide 
> OpenMP trampoline functions etc.

Do I understand it correctly that giving the stub distinct name would 
effectively get it out of the way when a breakpoint is set on the kernel? I.e. 
it's essentially a work around the fact that debugger may not have convenient 
way to specify "set breakpoint on this name in device code only". Perhaps it 
would make sense to prove this ability as it sounds quite useful. I.e I may 
want to set breakpoint on all inlined host/device functions, but only on device 
side. That would be handy.

What happens if the stub and the kernel do have identical names?
My understanding, based on your comments above is that debugger does know about 
host and device 'spaces' and that it can find pointers to both host and device 
functions and set appropriate breakpoints for both. In this case it would 
normally attempt to set breakpoint on both the stub and the kernel as it would 
in case of `__host__`/`__device__` overloads you've described above. In case of 
stub/kernel, we would want the breakpoint set only on the kernel itself. Given 
that debugger does have ability to tell host and device functions/symbols 
apart, the difficulty is really in being able to tell a real host function from 
the stub, so we can skip it.

Is that indeed what we want/need? Is there something else?

Does debugger know that device-side function is a kernel? In case of CUDA, the 
kernels are distinct from regular device-side functions. I don't know whether 
that's the case for AMDGPU.
If debugger can tell that particular device function is a kernel, that can be 
used to infer that the matching host-side symbol is a stub and skip setting a 
breakpoint on it.
If that does not work, debugger presumably has access to the mangled symbols 
for the potential breakpoint locations. The stub currently has distinct `.stub` 
suffix. This can also be used to tell it apart from a regular `__host__` 
function.

I do not see how changing the source-level name for the stub is going to change 
things in principle. It's just yet another way to disambiguate a real 
`__host__` function from a `host` stub we generate for the kernel.
Is there anything else about the stubs that requires them to have a name 
different from the kernel?

> I am a little unclear what this patch is doing as it is mentioned that the 
> mangled name has a _stub in it.

Currently the mangled name has .stub suffix which is discarded during 
unmangling, so unmangled names for the stub and the kernel end up being 
identical. I'm trying to figure out why is it a problem to be fixed in the 
compiler.

> My understanding is that the intention was to create a distinct unmangled 
> name for the stub, and then mangle it so that the resulting symbol was a 
> legal mangled name. It sounded like this was the preferred approach, and 
> makes sense to me based on my current understanding. Am I understanding this 
> correctly?

This patch proposes changing the source-level name for the stub. Unfortunately 
the way it attempt to implement it is by doing the renaming during mangling 
phase itself. This appears to be the wrong place to change source-level name. 
Before figuring out what would be the right thing to do here, I want to 
understand why we're doing it. I appreciate your description of what drives 
this requirement. I think I have petter idea of it now, but I still have some 
questions. Please bear with me.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68578/new/

https://reviews.llvm.org/D68578



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to