Issue: 172045
Summary: Improve AMDGPU sqrt and inverse sqrt handling for bfloat
Labels: backend:AMDGPU, missed-optimization
Assignees:
Reporter: arsenm
The code generated for targets without v_sqrt_bf16 and v_rsq_bf16 is quite poor; see https://github.com/llvm/llvm-project/pull/172044

The current lowering appears to extend the bf16 value to float and then perform the full-precision float expansion. It could instead use one of the faster expansions, which would bring it closer to the f16 expansion.
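
To illustrate why a faster expansion looks plausible, here is a host-side numerical sketch in Python. It is not the LLVM lowering and does not model any AMDGPU instruction. All helper names are hypothetical, and the "approximate" f32 sqrt is simulated by nudging the correctly rounded f32 result by a few ULPs, which is an assumption made purely for illustration. The sketch checks whether such an error survives the final rounding back to bf16.

```python
import math
import struct


def bf16_to_float(bits):
    """Interpret a 16-bit pattern as bf16 (the top half of an IEEE binary32)."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def f32_bits(x):
    """Round a Python float to binary32 and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_bits_to_bf16(u):
    """Round a finite binary32 bit pattern to bf16 (round to nearest even)."""
    bias = 0x7FFF + ((u >> 16) & 1)
    return ((u + bias) >> 16) & 0xFFFF


def sqrt_bf16_reference(bits):
    """Reference result: sqrt in double, rounded to f32, then to bf16."""
    return f32_bits_to_bf16(f32_bits(math.sqrt(bf16_to_float(bits))))


def count_mismatches(max_ulps=2):
    """Count bf16 results that change when the f32 sqrt is off by a few ULPs.

    Covers every positive normal finite bf16 input (0x0080 .. 0x7F7F).
    The ULP perturbation is a stand-in for an approximate f32 sqrt; it is
    an assumption of this sketch, not a measured hardware error bound.
    """
    mismatches = 0
    for bits in range(0x0080, 0x7F80):
        exact_u = f32_bits(math.sqrt(bf16_to_float(bits)))
        want = f32_bits_to_bf16(exact_u)
        for delta in range(-max_ulps, max_ulps + 1):
            # Adding a small integer to the bit pattern of a positive
            # normal float moves it by that many ULPs.
            if f32_bits_to_bf16(exact_u + delta) != want:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

Under these assumptions the mismatch count should be zero: bf16 keeps only 8 significant bits, so a float sqrt that is off by a couple of f32 ULPs still rounds to the same bf16 value. Whether a particular target instruction meets such a bound, and how denormal inputs are handled, would need to be checked against the hardware documentation.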