tra added a comment.

Enforcing explicit GPU target makes sense.

However, I think that singling out a `__global__` as the trigger is not 
sufficient for the intended purpose.

If we can't generate a usable GPU-side binary, then we should produce an error 
if we need to generate *anything* during GPU-side compilation.
Using `__global__` as a proxy would not catch some use cases and, possibly, 
will produce false positives in  others.

E.g. what if I have a TU which only has a `__device__ int var = 42;` combined 
with a host-side code to memcpy to/from it? It would still be a valid, if not 
very useful code, but it would still suffer from runtime being unable to load 
it on a GPU unless that variable is in a GPU binary compiled with a valid 
target.

`__device__` functions in TUs compiled with `-fgpu-rdc` would have a similar 
problem. They would eventually be linked into a GPU binary which will be 
useless if it's not compiled for correct GPU. Granted, `__device__` functions 
will eventually need to be called from a kernel, so we will error out on a 
`__global__`  *somewhere*, but it will miss the problem when such TU does not 
get to the linking stage (e.g. maybe the user wants to link them at runtime).



================
Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8260
+def err_hip_kern_without_gpu : Error<
+  "compiling a HIP kernel without specifying an offload arch is not allowed">,
+  DefaultFatal;
----------------
How about compiling a file with `__device__` functions with `-fgpu-rdc`? If a 
kernel with no-arch is an error, then this should be an error, too.


================
Comment at: clang/lib/Sema/SemaDeclAttr.cpp:4431
   }
+  if (S.getASTContext().getTargetInfo().getTargetOpts().CPU.empty() &&
+      S.getLangOpts().HIP && S.getLangOpts().CUDAIsDevice) {
----------------
Will this fire if we have an uninstantiated kernel template? 




================
Comment at: clang/test/SemaCUDA/kernel-no-gpu.cu:7
+
+__global__ void kern1() {}
+// hip-error@-1 {{compiling a HIP kernel without specifying an offload arch is 
not allowed}}
----------------
We'll need few more test cases.

E.g. these should be fine.

```
template <typename T> __global__ void kernel(T arg ) {};
__global__ void kernel(T arg );

```


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100552/new/

https://reviews.llvm.org/D100552

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to