tra added a comment.


> This is, or is very similar to, the problem that the host/device overloading 
> addresses in CUDA.

IIRC the difference was that OpenMP didn't have explicit notion of host/device 
functions which made it hard to apply host/device overloading in practice.

> It is also the problem, or very similar to the problem, that the new OpenMP 5 
> `declare variant` directive is intended to address. Johannes and I discussed 
> this earlier today, and I suggest that we:

Interesting. `declare variant` sounds (according to the OpenMP TR7 document) like 
`__device__` on steroids. That may indeed make things work. Actually, I would 
like `__device__` to eventually work like `declare variant`, so we can have 
multiple device overloads specialized for particular GPU architectures without 
relying on the preprocessor's `__CUDA_ARCH__`.

> 
> 
> 1. Add a math.h wrapper to clang/lib/Headers, which generally just does an 
> include_next of math.h, but provides us with the ability to customize this 
> behavior. Writing a header for OpenMP on NVIDIA GPUs which is essentially 
> identical to the math.h functions in __clang_cuda_device_functions.h would be 
> unfortunate, and as CUDA does provide the underlying execution environment 
> for OpenMP target offload on NVIDIA GPUs, duplicative even in principle. We 
> don't need to alter the default global namespace, however, but can include 
> this file from the wrapper math.h.

Using `__clang_cuda_device_functions.h` in addition to the `math.h` wrapper 
should be fine. It gives us a path to providing a device-side standard math 
library implementation, and the `math.h` wrapper provides a convenient point to 
hook in implementations for platforms other than CUDA.

> 2. We should allow host/device overloading in OpenMP mode. As an extension, 
> we could directly reuse the CUDA host/device overloading capability - this 
> also has the advantage of allowing us to directly reuse 
> __clang_cuda_device_functions.h (and perhaps do a similar thing to pick up 
> the device-side printf, etc. from __clang_cuda_runtime_wrapper.h). In the 
> future, we can extend these to provide overloading using OpenMP declare 
> variant, if desired, when in OpenMP mode.

Is OpenMP still essentially C-based? Host/device overloading relies on C++ 
machinery. I think it should work with `__attribute__((overloadable))`, but 
that hasn't been tested.

We may need to restructure bits and pieces of the CUDA-related headers to make 
them reusable by OpenMP. I guess that with `declare variant` we may be able to 
reuse most of the headers as-is, by treating `__device__` as if the function 
were a variant for the NVPTX back-end.

> Thoughts?

SGTM. Let me know if something in the CUDA-related headers gets in the way.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D47849/new/

https://reviews.llvm.org/D47849



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits