Hahnfeld added a comment.

So the scheme is: `pow` is defined in `__clang_openmp_math.h` to call `__kmpc_pow`. This lives in `libomptarget-nvptx` (both the bitcode and the static library) and just calls `pow`, which works because `nvcc` and Clang in CUDA mode make sure that the call gets routed into `libdevice`?
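To make the indirection described above concrete, here is a minimal host-only sketch in C++. It is not the code from the patch: `demo_pow` and `demo_kmpc_pow` are hypothetical stand-ins for the header-level `pow` definition and for `__kmpc_pow`, and the routing into `libdevice` is not modelled at all.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the runtime entry point (`__kmpc_pow` in the
// review). In the scheme under discussion this would live in the device
// runtime library; here it is an ordinary host function.
double demo_kmpc_pow(double base, double exponent) {
    // Forward to the standard math function.
    return std::pow(base, exponent);
}

// Hypothetical stand-in for the header-level definition of `pow`,
// which only forwards to the runtime entry point.
inline double demo_pow(double base, double exponent) {
    return demo_kmpc_pow(base, exponent);
}

// Call site that names the well-known library function directly.
double square_direct(double d) { return std::pow(d, 2.0); }

// Call site that goes through the wrapper chain instead.
double square_wrapped(double d) { return demo_pow(d, 2.0); }

// Small self-check on an exactly representable case.
void check_paths_agree() {
    assert(square_direct(3.0) == 9.0);
    assert(square_wrapped(3.0) == 9.0);
}
```

Both call sites compute the same value, but the second one names `demo_pow` rather than `pow`. Optimizations that recognize library calls by name can therefore only apply if the compiler sees through the wrapper, for example by inlining it; whether that happens when the wrapper lives in a separate library is an open question in this sketch.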
Did you test that something like `pow(d, 2)` is optimized by LLVM to `d * d`? There's a pass doing so (I can't recall the name), and in my previous attempts it didn't work well when the call was hidden behind a different function name instead of the known `pow`.

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60907/new/

https://reviews.llvm.org/D60907