Hahnfeld removed a reviewer: Hahnfeld.
Hahnfeld added a comment.

I feel like there is no progress in the discussion (here and off-list), partly 
because we might still not be talking about the same things. So I'm stepping 
down from this revision to unblock review from somebody else.

Here's my current understanding of the issue(s):

- `math.h` (or transitively included files) on both PowerPC and x86 contain 
inline assembly.
  - On x86, Clang bails out directly because the code uses the `x` input 
constraint, which doesn't exist for NVPTX (-> `invalid input constraint 'x' in 
asm`); a reconstruction of this pattern is sketched in the first example after 
this list.
  - From my understanding, on PowerPC the header passes Sema analysis but is 
rejected during CodeGen because the assembly instructions are invalid on NVPTX?
- This problem can be avoided (for testing purposes; including `math.h` should 
be fixed as well some day!) by explicitly declaring all needed math functions 
(like `extern double exp(double);`); the second example after this list shows 
this workaround.
  - Without additional flags this makes Clang emit Intrinsic Functions 
<https://llvm.org/docs/LangRef.html#intrinsic-functions> like `@llvm.exp.f64` 
for NVPTX.
  - That's because `IsMathErrnoDefault()` returns `false` for the Cuda 
ToolChain. This behaviour can be overridden using `-fmath-errno` (the test 
case `nvptx_device_math_functions.c` uses this flag; I'm not sure why?).
- Both variants at least look to produce correct IR, which is then passed to 
the backend:
  1. For intrinsic functions (with some notable exceptions) the backend 
complains `Cannot select: [...] ExternalSymbol'exp'`.
    - Some exceptions are `sqrt.f32`, `sqrt.f64`, `sin.f32` and `cos.f32`: The 
backend will directly lower them to the corresponding PTX instruction. 
Unfortunately there is none for `exp`...
  2. For "real" function calls (like `call double @exp(double %3)`) `nvlink` 
will throw `Undefined reference` errors.
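
First, to make the inline assembly problem concrete: here is a minimal sketch 
(my own reconstruction, not the actual glibc code) of the kind of wrapper that 
x86 math headers contain. Compiling it for NVPTX triggers the constraint error 
quoted above:

```c
/* Hypothetical reconstruction of an x86 math.h inline wrapper. The "x"
 * constraint requests an SSE register, which NVPTX does not have, so Clang
 * reports: invalid input constraint 'x' in asm. */
static inline double inline_sqrt(double x) {
  double result;
  __asm__("sqrtsd %1, %0" : "=x"(result) : "x"(x));
  return result;
}
```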
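
Second, the declaration workaround, together with what the backend does to the 
resulting intrinsics (the function `f` is mine; the behaviour is as described 
above):

```c
/* Workaround for testing: declare the needed functions instead of
 * including math.h. */
extern double exp(double);
extern double sqrt(double);

double f(double a) {
  /* Without -fmath-errno this becomes a call to @llvm.exp.f64, which the
   * NVPTX backend rejects: "Cannot select: [...] ExternalSymbol'exp'". */
  double e = exp(a);
  /* @llvm.sqrt.f64 is one of the exceptions: the backend lowers it directly
   * to the corresponding PTX instruction, so this part is fine. */
  return e + sqrt(a);
}
```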

This patch takes the following approach:

1. Avoid intrinsics for math builtins by passing `-fno-math-builtin` for device 
compilation.
2. Use the CUDA header to redirect math functions to their libdevice 
equivalents in the frontend, mostly just prefixed by `__nv_` (for example 
`exp(a)` -> `__nv_exp(a)`); a simplified sketch of this follows below.
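
In essence the redirection boils down to this (a simplified sketch, assuming 
the libdevice declarations; the actual header covers many more functions and 
the device-side annotations):

```c
/* Declarations for functions provided by CUDA's libdevice bitcode library
 * (sketch; normally these come from the CUDA headers). */
double __nv_exp(double);
double __nv_pow(double, double);

/* Redirect the standard names to their libdevice equivalents. */
static inline double exp(double a) { return __nv_exp(a); }
static inline double pow(double a, double b) { return __nv_pow(a, b); }
```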

The downside of this approach is that LLVM doesn't recognize these function 
calls and therefore doesn't perform the optimizations that fold libcalls. For 
example, `pow(a, 2)` is transformed into a multiplication but `__nv_pow(a, 2)` 
is not.
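
Concretely (a small example; `__nv_pow` is only declared here so the snippet 
is self-contained):

```c
extern double pow(double, double);
extern double __nv_pow(double, double);

double square(double a) {
  return pow(a, 2);      /* LLVM knows pow() and folds this into a * a */
}

double square_nv(double a) {
  return __nv_pow(a, 2); /* opaque to LLVM: stays a call into libdevice */
}
```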

In https://reviews.llvm.org/D47849#1124638, @Hahnfeld wrote:

> IMO this goes into the right direction, we should use the fast implementation 
> in libdevice.


So yeah, my comment seems to be outdated if these simple optimizations don't 
happen anymore with this patch: I don't want to use a fast `pow(a, 2)`; I 
don't want to call a library function for that at all.

We could of course make LLVM recognize the calls to libdevice and handle them 
the same way. But that means adding more workarounds just so this patch 
doesn't regress on easy cases (in terms of transformations).
Another approach would be to make the NVPTX backend lower the remaining calls 
to math functions to their libdevice equivalents. I came across 
https://reviews.llvm.org/D34708 which seems to go in that direction (but even 
after fixing some build errors it doesn't work out-of-the-box: it complains 
about `Undefined external symbol`s because libdevice is optimized away, as it 
wasn't needed before)...


Repository:
  rC Clang

https://reviews.llvm.org/D47849


