| Field | Value |
|---|---|
| Issue | 129523 |
| Summary | `llvm-objdump` gives wrong line numbers for WebAssembly |
| Labels | new issue |
| Assignees | |
| Reporter | stevenwdv |

This issue was previously filed as emscripten-core/emscripten#23717.
`llvm-objdump` gives wrong line info for a simple WebAssembly file.
# Steps to reproduce
- Create a simple `main.cpp`:
```cpp
int main() { return 42; }
```
- Now compile with debug symbols:
```shell
em++ -g main.cpp
```
<details>
<summary>Verbose output</summary>
```
"/home/swdv/emsdk/upstream/bin/clang++" -target wasm64-unknown-emscripten -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/home/swdv/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -g3 -DNO_USE_MYFUN -v -c main.cpp -o /tmp/emscripten_temp_pe2lfvyf/main_0.o
clang version 21.0.0git (https:/github.com/llvm/llvm-project 6dc41a639334b913e762f65410fcd14a722b137f)
Target: wasm64-unknown-emscripten
Thread model: posix
InstalledDir: /home/swdv/emsdk/upstream/bin
(in-process)
"/home/swdv/emsdk/upstream/bin/clang-21" -cc1 -triple wasm64-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name main.cpp -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -fvisibility=hidden -debug-info-kind=constructor -dwarf-version=4 -debugger-tuning=gdb -fdebug-compilation-dir=/home/swdv/Downloads/plainwasmtest -v -fcoverage-compilation-dir=/home/swdv/Downloads/plainwasmtest -resource-dir /home/swdv/emsdk/upstream/lib/clang/21 -D EMSCRIPTEN -D NO_USE_MYFUN -isysroot /home/swdv/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten/c++/v1 -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1 -internal-isystem /home/swdv/emsdk/upstream/lib/clang/21/include -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten -internal-isystem /home/swdv/emsdk/upstream/emscripten/cache/sysroot/include -fdeprecated-macro -ferror-limit 19 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fignore-exceptions -fexceptions -fcolor-diagnostics -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /tmp/emscripten_temp_pe2lfvyf/main_0.o -x c++ main.cpp
clang -cc1 version 21.0.0git based upon LLVM 21.0.0git default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten/c++/v1"
ignoring nonexistent directory "/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/wasm64-emscripten"
#include "..." search starts here:
#include <...> search starts here:
/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl
/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/compat
/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1
/home/swdv/emsdk/upstream/lib/clang/21/include
/home/swdv/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.
/home/swdv/emsdk/upstream/bin/clang --version
/home/swdv/emsdk/upstream/bin/wasm-ld -o hello.wasm /tmp/emscripten_temp_pe2lfvyf/main_0.o -L/home/swdv/emsdk/upstream/emscripten/cache/sysroot/lib/wasm64-emscripten -L/home/swdv/emsdk/upstream/emscripten/src/lib -lGL-getprocaddr -lal -lhtml5 -lstubs-debug -lnoexit -lc-debug -ldlmalloc-debug -lcompiler_rt -lc++-noexcept -lc++abi-debug-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -mwasm64 /tmp/tmp5u5b29eklibemscripten_js_symbols.so --export=emscripten_stack_get_end --export=emscripten_stack_get_free --export=emscripten_stack_get_base --export=emscripten_stack_get_current --export=emscripten_stack_init --export=_emscripten_stack_alloc --export=__wasm_call_ctors --export=_emscripten_stack_restore --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=main --export-if-defined=__main_argc_argv --export-if-defined=fflush --export-table -z stack-size=65536 --no-growable-memory --initial-heap=16777216 --no-entry --stack-first --table-base=1
/home/swdv/emsdk/upstream/bin/llvm-objcopy hello.wasm hello.wasm --remove-section=producers
/home/swdv/emsdk/node/20.18.0_64bit/bin/node /home/swdv/emsdk/upstream/emscripten/src/compiler.mjs /tmp/tmp3fupbzr6.json
/home/swdv/emsdk/node/20.18.0_64bit/bin/node /home/swdv/emsdk/upstream/emscripten/tools/preprocessor.mjs /tmp/emscripten_temp_pe2lfvyf/settings.js shell.html
```
</details>
- Now disassemble the main function (`__original_main`) with line numbers enabled:
```shell
~/emsdk/upstream/bin/llvm-objdump --disassemble-symbols=__original_main --line-numbers a.out.wasm
```
- Observe that both the file and the line numbers are wrong: the output references `fflush.c` instead of our `main.cpp`:
```wasm
a.out.wasm: file format wasm
Disassembly of section CODE:
0000017c <__original_main>:
.local i32, i32, i32, i32, i32, i32, i32
; __original_main():
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:17
180: 23 80 80 80 80 00 global.get 0
186: 21 00 local.set 0
188: 41 10 i32.const 16
18a: 21 01 local.set 1
18c: 20 00 local.get 0
18e: 20 01 local.get 1
190: 6b i32.sub
191: 21 02 local.set 2
193: 41 00 i32.const 0
195: 21 03 local.set 3
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:18
197: 20 02 local.get 2
199: 20 03 local.get 3
19b: 36 02 0c i32.store 12
19e: 41 8d 21 i32.const 4237
1a1: 21 04 local.set 4
1a3: 41 15 i32.const 21
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:15
1a5: 21 05 local.set 5
1a7: 20 04 local.get 4
1a9: 20 05 local.get 5
1ab: 36 02 00 i32.store 0
1ae: 41 2a i32.const 42
; /emsdk/emscripten/system/lib/libc/musl/src/stdio/fflush.c:20
1b0: 21 06 local.set 6
1b2: 20 06 local.get 6
1b4: 0f return
1b5: 0b end
```
# Version of emscripten/emsdk
```
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 4.0.3 (a9651ff57165f5710bb09a5fe52590fd6ddb72df)
clang version 21.0.0git (https:/github.com/llvm/llvm-project 6dc41a639334b913e762f65410fcd14a722b137f)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/swdv/emsdk/upstream/bin
```
# More findings from emscripten-core/emscripten#23717
https://github.com/emscripten-core/emscripten/issues/23717#issuecomment-2675451863:
> [...] `llvm-dwarfdump` gives proper output.
https://github.com/emscripten-core/emscripten/issues/23717#issuecomment-2691456806:
> So this problem has to do with the way LLVM handles symbols for linked wasm files and debug info. Specifically, symbol addresses in DWARF are always encoded as offsets in the code section, whereas for linked files, LLVM uses the offset in the file as the address for a function (this is to match how [engines print](https://webassembly.github.io/spec/web-api/index.html#conventions) code addresses in backtraces). See some [changes](https://github.com/llvm/llvm-project/commits/main/?author=dschuff&since=2024-02-02&until=2024-02-22) (and [llvm/llvm-project#76198](https://github.com/llvm/llvm-project/pull/76198)) I made to implement this about a year ago in LLVM. So if you use e.g. `llvm-objdump` to print symbol addresses, they will match what browser backtraces show, but not match what you see if you use `llvm-dwarfdump` to look at the debug info, and `llvm-symbolizer` will not get the right answer. I think the same mechanism in LLVM that causes the latter problem is what is happening when `llvm-objdump` is looking up line information from the debug info during disassembly (despite the fact that it's correctly finding the right code address when you ask it to disassemble a symbol by name).
>
> So this is an unfortunate mismatch and not everything works right, as you have seen. Emscripten has a tool [emsymbolizer](https://github.com/emscripten-core/emscripten/blob/main/emsymbolizer.py) that knows a bunch of ways emscripten can store name/address information (e.g. DWARF, source maps, name sections) and can symbolize addresses. It papers over this problem using the `--adjust-vma` flag of llvm-symbolizer, but it currently only supports the use case of looking up a name or line from an address one at a time.
>
> We might be able to improve this situation. Adjusting how symbols are represented in LLVM is tricky, since they are used in various places in assembly, linking, etc. Ideally we also wouldn't need a bunch of special hacks in the tools such as llvm-objdump (although I wouldn't necessarily be above some kind of special case if it wasn't too horrible). [...]
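
The address mismatch described in the quoted comment can be made concrete with a short sketch. It is written in Python and is not taken from LLVM or Emscripten: the helper names `read_uleb128`, `code_section_offset`, `file_offset_to_dwarf_address`, and `symbolizer_command` are hypothetical. It assumes that DWARF addresses count from the first byte of the code section's contents (after the section id and size fields), and that the `--adjust-vma` workaround mentioned above amounts to passing that section offset to `llvm-symbolizer`.

```python
CODE_SECTION_ID = 10  # section id of the code section in the wasm binary format


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def code_section_offset(data):
    """Return the file offset at which the code section's contents begin."""
    if data[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(data):
        section_id = data[pos]
        size, body = read_uleb128(data, pos + 1)
        if section_id == CODE_SECTION_ID:
            return body
        pos = body + size  # jump over this section's contents
    raise ValueError("no code section found")


def file_offset_to_dwarf_address(data, file_offset):
    """Convert a file offset (as printed for linked files) to a
    code-section-relative address (assumed DWARF convention)."""
    return file_offset - code_section_offset(data)


def symbolizer_command(wasm_path, data, file_offset):
    """Build (but do not run) an llvm-symbolizer command line that looks up
    a file offset, shifting the object's addresses by the section offset."""
    return [
        "llvm-symbolizer",
        "-e", wasm_path,
        f"--adjust-vma={code_section_offset(data)}",
        hex(file_offset),
    ]
```

For the binary in this report, `data` would be the bytes of `a.out.wasm` and `file_offset` an address from the disassembly, such as `0x197`.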