tra added a comment.
> This is, or is very similar to, the problem that the host/device overloading addresses in CUDA.

IIRC the difference was that OpenMP didn't have an explicit notion of host/device functions, which made it hard to apply host/device overloading in practice.

> It is also the problem, or very similar to the problem, that the new OpenMP 5 `declare variant` directive is intended to address. Johannes and I discussed this earlier today, and I suggest that we:

Interesting. `declare variant` sounds (according to the OpenMP TR7 doc) like `__device__` on steroids. That may indeed make things work.

Actually, I would like `__device__` to eventually work like `declare variant`, so we can have multiple device overloads specialized for particular GPU architectures without relying on the preprocessor's `__CUDA_ARCH__`.

> 1. Add a math.h wrapper to clang/lib/Headers, which generally just does an include_next of math.h, but provides us with the ability to customize this behavior. Writing a header for OpenMP on NVIDIA GPUs which is essentially identical to the math.h functions in __clang_cuda_device_functions.h would be unfortunate, and as CUDA does provide the underlying execution environment for OpenMP target offload on NVIDIA GPUs, duplicative even in principle. We don't need to alter the default global namespace, however, but can include this file from the wrapper math.h.

Using `__clang_cuda_device_functions.h` in addition to the `math.h` wrapper should be fine. It gives us a path to provide a device-side standard math library implementation, and the `math.h` wrapper provides a convenient point to hook in the implementation for platforms other than CUDA.

> 2. We should allow host/device overloading in OpenMP mode. As an extension, we could directly reuse the CUDA host/device overloading capability - this also has the advantage of allowing us to directly reuse __clang_cuda_device_functions.h (and perhaps do a similar thing to pick up the device-side printf, etc.
> from __clang_cuda_runtime_wrapper.h). In the future, we can extend these to provide overloading using OpenMP declare variant, if desired, when in OpenMP mode.

Is OpenMP still essentially C-based? Host/device overloading relies on C++ machinery. I think it should work with `__attribute__((overloadable))`, but that has not been tested. We may need to restructure bits and pieces of the CUDA-related headers to make them reusable by OpenMP.

I guess that with `declare variant` we may be able to reuse most of the headers as is, by treating `__device__` as if the function were a variant for the NVPTX back-end.

> Thoughts?

SGTM. Let me know if something in the CUDA-related headers gets in the way.

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D47849/new/

https://reviews.llvm.org/D47849

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits