| Issue | 172045 |
| Summary | Improve AMDGPU sqrt and inverse sqrt handling for bfloat |
| Labels | backend:AMDGPU, missed-optimization |
| Assignees | |
| Reporter | arsenm |

The code generated for bfloat sqrt and inverse sqrt on targets without v_sqrt_bf16 and v_rsq_bf16 is quite poor: https://github.com/llvm/llvm-project/pull/172044
The lowering appears to cast to float and then perform the full-precision float expansion. Since bfloat has only 8 bits of significand precision, this could use one of the faster options instead and end up closer to the f16 expansion.
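
To illustrate the "cast to float, compute, round back" shape described above, here is a minimal Python sketch that emulates bfloat by rounding a float32 to its upper 16 bits. It is not the AMDGPU lowering; the helper names (round_to_bf16, bf16_sqrt_via_f32, bf16_rsqrt_via_f32) are hypothetical, and it assumes finite, positive inputs only.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Return the binary32 value with bit pattern b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def round_to_bf16(x: float) -> float:
    """Round x to the nearest bfloat value (ties to even).

    Assumption: x is finite; NaN and infinity are not handled here.
    """
    bits = f32_bits(x)
    # Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return bits_f32((bits + bias) & 0xFFFF0000)


def bf16_sqrt_via_f32(x: float) -> float:
    """sqrt on a bfloat input, computed at higher precision and rounded back."""
    return round_to_bf16(math.sqrt(round_to_bf16(x)))


def bf16_rsqrt_via_f32(x: float) -> float:
    """Inverse sqrt on a bfloat input; assumes x > 0."""
    return round_to_bf16(1.0 / math.sqrt(round_to_bf16(x)))


if __name__ == "__main__":
    for v in (2.0, 4.0, 9.0):
        print(v, bf16_sqrt_via_f32(v), bf16_rsqrt_via_f32(v))
```

Near 1.0 the spacing between adjacent bfloat values is 2^-7, against 2^-23 for float, which is why a less precise float sqrt may well be sufficient before the final rounding step.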