| Issue | 122081 |
| Summary | [AArch64] `QNaN` check after `fsqrt` instruction is slow |
| Labels | new issue |
| Assignees | |
| Reporter | kasuga-fj |
Clang appears to be about 100% behind gcc for the following function (with `n = 10000`) on Neoverse V2.
Compilation options: `-O3 -mcpu=neoverse-v2`
```
#include <math.h>

void f(int n, double *arr, double m) {
  for (int i = 0; i < n; i++) {
    arr[i] = sqrt(arr[i] * m);
  }
}
```
godbolt: https://godbolt.org/z/57Yqj15KP
I analyzed the root cause and found that the `fcmp` instruction after `fsqrt` takes a lot of time. The `fcmp` checks whether the result of `fsqrt` is a `QNaN` and, if so, branches to the library function call. The slowdown occurs even when every element of `arr` is positive, so the library-call branch is never taken. Avoiding the check with an option such as `-fno-honor-nans` closed the performance gap between gcc and clang. I think we should insert a comparison before the `fsqrt` instruction, as gcc does.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs