jdoerfert wrote:

There seems to be some trouble with NVIDIA offload (I tested mainly AMDGPU) and f128; I'll make sure that works too. The nits are easy to address; I just copied the existing style around. I'll also add an IR test to match the new runtime calls and kernel argument passing.

> Will kernels in TUs compiled with `-foffload-via-llvm` be interoperable with
> code that wants to launch them from another TU compiled w/o
> `-foffload-via-llvm`?
>
> E.g.:
>
> * a.cu: `__global__ void kernel() { ... }`
> * b.cu: `extern __global__ void kernel(); void func() { kernel<<<1,1>>>();}`
>
> This could use a test in the testsuite to actually check whether it works.

I'll look into this. Intuitively, the kernel launch needs -foffload-via-llvm (which implies -foffload-new-driver) and the kernel definition needs -foffload-new-driver. Similarly, with the new driver flag, device code should link fine. Right now, this defaults to gpu-rdc, as OpenMP does, but we can change that. On that note, non-rdc should actually internalize everything but the kernels and thereby help the middle end as well.

https://github.com/llvm/llvm-project/pull/94549
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits