Issue 144681
Summary [XRay] Weird sled behavior with `-O3` and `-fno-inline`
Labels new issue
Assignees
Reporter Thyre
    **Godbolt link:** https://godbolt.org/z/KvoMf5GjY

-----

Given this very short example:

```c++
#include <math.h>

inline int SQRT(int arg) { return sqrtf(static_cast<float>(arg)); }

template<typename T>
T foo( T a )
{
 return SQRT( (T)a );
}

int main( int argc, char** argv )
{
    return foo( argc );
}
```

`clang` generates interesting assembly code when XRay is involved and the flags `-O3 -fno-inline -fxray-instrument -fxray-instruction-threshold=1` are used:

```assembly
main:
        nop     word ptr [rax + rax + 512]
        nop     word ptr [rax + rax + 512]
        jmp     int foo<int>(int)

int foo<int>(int):
        nop     word ptr [rax + rax + 512]
        nop     word ptr [rax + rax + 512]
        jmp     SQRT(int)

SQRT(int):
        nop     word ptr [rax + rax + 512]
        [...]
        ret
        nop     word ptr cs:[rax + rax + 512]
```

Both `main` and `int foo<int>(int)` have proper sleds for XRay instrumentation. However, both the entry and the exit sled are placed before the actual function content (i.e. the `jmp` instruction that tail-calls the next function).

This causes an issue for tools that want to represent a proper tree structure of the functions being called, e.g. performance tools. One would see something like this:

```
- ./a.out
  - main
  - int foo<int>(int)
  - SQRT(int)
```

Instead of:
```
- ./a.out
  - main
    - int foo<int>(int)
      - SQRT(int)
```

In the case of LULESH with our current (in-development) XRay instrumentation adapter in [Score-P](https://www.vi-hps.org/projects/score-p/overview/overview.html), this even caused an inconsistent profile, probably due to similar reasons.

Given that this is a very constructed case, I don't see this as being a huge issue. However, I think this may be a limitation that should be documented somewhere. I can't immediately think of a solution for this, and I think most people will not encounter this issue. Why would someone prevent inlining with `-O3` in the first place? (Well, me, because I wanted to test the overhead when filtering functions.)