[llvm-bugs] [Bug 122081] [AArch64] `QNaN` check after `fsqrt` instruction is slow

LLVM Bugs via llvm-bugs Wed, 08 Jan 2025 01:32:42 -0800

Issue	122081
Summary	[AArch64] `QNaN` check after `fsqrt` instruction is slow
Labels	new issue
Assignees
Reporter	kasuga-fj

It looks like clang-generated code is about 100% slower than gcc's for the following function (where `n = 10000`) on Neoverse V2.
Compilation options: `-O3 -mcpu=neoverse-v2`


```
#include <math.h>

void f(int n, double *arr, double m) {
    for (int i = 0; i < n; i++) {
        arr[i] = sqrt(arr[i] * m);
    }
}
```

godbolt: https://godbolt.org/z/57Yqj15KP

I tried to analyze the root cause and found that the `fcmp` instruction after `fsqrt` takes a lot of time. The `fcmp` checks whether the result of `fsqrt` is `QNaN` and, if so, branches to the library function call. The slowdown occurs even when all the elements of `arr` are positive, so the branch to the library call is never taken. Avoiding this check by adding an option such as `-fno-honor-nans` closed the performance gap between gcc and clang. I think we should insert the comparison before the `fsqrt` instruction, as gcc does.
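
To make the two shapes concrete, here is a rough C-level sketch of the control flow, not the actual codegen of either compiler. `fsqrt_model` and `libcall_model` are hypothetical stand-ins for the `fsqrt` instruction and the `sqrt` library call. The model uses a toy Newton iteration so the sketch builds without libm, and it is only meant for finite inputs of moderate magnitude. In the check-after shape the compare consumes the `fsqrt` result, so it cannot start until `fsqrt` finishes. In the check-before shape the compare only needs the input. Whether that dependency fully explains the slowdown on Neoverse V2 is an assumption here.

```c
#include <errno.h>

/* Hypothetical stand-in for the fsqrt instruction: NaN for negative or NaN
   input, never touches errno. Toy Newton iteration, not production code. */
static double fsqrt_model(double x) {
    if (x != x || x < 0.0) return 0.0 / 0.0;  /* produce a NaN */
    if (x == 0.0) return x;                   /* keeps the sign of zero */
    double r = x > 1.0 ? x : 1.0;             /* start at or above the root */
    for (int i = 0; i < 64; i++) r = 0.5 * (r + x / r);
    return r;
}

/* Hypothetical stand-in for the sqrt library call: same value as the
   instruction, but also reports a domain error through errno. */
static double libcall_model(double x) {
    if (x < 0.0) errno = EDOM;
    return fsqrt_model(x);
}

/* Check-after shape (what the report describes for clang): the NaN test
   depends on the fsqrt result. */
static double sqrt_check_after(double x) {
    double r = fsqrt_model(x);
    if (r != r) return libcall_model(x);      /* rare slow path */
    return r;
}

/* Check-before shape (what the report attributes to gcc): the test reads
   only the input, so it does not wait for fsqrt. */
static double sqrt_check_before(double x) {
    if (x < 0.0) return libcall_model(x);     /* rare slow path */
    return fsqrt_model(x);
}
```

Both shapes return the same value and set `errno` for negative inputs. They differ only in which value the branch depends on. A NaN input takes the slow path in the first shape and the fast path in the second, which is harmless because no domain error is reported in either case.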

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
