| Issue |
165563
|
| Summary |
[OMPT][Offload] OpenBLAS + offload causes Segmentation Fault if tool is attached
|
| Labels |
|
| Assignees |
|
| Reporter |
Thyre
|
I originally ran into this issue with the ROCm compilers, but it is reproducible with LLVM trunk as well. So here we go...
Using my daily LLVM build:
```
$ clang --version
clang version 22.0.0git (https://github.com/llvm/llvm-project.git e9804584f75c1ab267431c43a0928a8b0a3814f0)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/apps/software/Clang/trunk/bin
Build config: +assertions
```
and the latest development commit of OpenBLAS ([0c59ae0](https://github.com/OpenMathLib/OpenBLAS/commit/0c59ae0b45a8f30224f045902bc558381d6f8974)), I'm able to trigger a segmentation fault within `libomptarget` when executing an arbitrary application.
There are a few conditions that need to be met first:
1. OpenBLAS needs to be built with OpenMP support
2. An OpenMP Tools Interface (OMPT) tool needs to be attached. I've used [ompt-printf](https://github.com/FZJ-JSC/ompt-printf) here, but we saw the same effect with Score-P and ThreadSanitizer as well.
```console
$ # OpenBLAS was built with: make CC=clang CXX=clang++ F77=flang F90=flang FC=flang USE_OPENMP=1 USE_THREAD=1 PREFIX=$(pwd)/_install
$ cat test.c
int main( void ) {}
$ clang -fopenmp --offload-arch=gfx1101 test.c -lopenblas -L$(pwd)/_install/lib -Wl,-rpath,$(pwd)/_install/lib
$ ldd ./a.out
linux-vdso.so.1 (0x00007fffc022f000)
libopenblas.so.0 => /home/jreuter/Sources/OpenBLAS/_install/lib/libopenblas.so.0 (0x000074b09c000000)
libomp.so => /opt/apps/software/Clang/trunk/lib/x86_64-unknown-linux-gnu/libomp.so (0x000074b09cfd0000)
libomptarget.so.22.0git => /opt/apps/software/Clang/trunk/lib/x86_64-unknown-linux-gnu/libomptarget.so.22.0git (0x000074b097400000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074b097000000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074b09bf19000)
libatomic.so.1 => /lib/x86_64-linux-gnu/libatomic.so.1 (0x000074b09cfb0000)
/lib64/ld-linux-x86-64.so.2 (0x000074b09d0d6000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x000074b09cf94000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x000074b096c00000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000074b09bef9000)
$ env | grep OMP_TOOL_LIBRARIES
OMP_TOOL_LIBRARIES=/home/jreuter/Projects/perftools-misc/ompt-printf/_build/src/libompt-printf.so
$ ./a.out
[-1][ompt_start_tool] Chosen printf mode: 3
[-1][ompt_start_tool] omp_version = 201611 | runtime_version = LLVM OMP version: 5.0.20140926
[1] 4068330 segmentation fault (core dumped) ./a.out
$ unset OMP_TOOL_LIBRARIES
$ clang -fopenmp --offload-arch=gfx1101 test.c -lopenblas -L$(pwd)/_install/lib -Wl,-rpath,$(pwd)/_install/lib -Xarch_host -fsanitize=thread
$ ./a.out
ThreadSanitizer:DEADLYSIGNAL
==4068918==ERROR: ThreadSanitizer: SEGV on unknown address 0x0000000002b8 (pc 0x7ffff1e97ef4 bp 0x000000000001 sp 0x7fffffffce38 T4068918)
==4068918==The signal is caused by a READ memory access.
==4068918==Hint: address points to the zero page.
#0 __pthread_mutex_lock nptl/pthread_mutex_lock.c:80:23 (libc.so.6+0x97ef4) (BuildId: 4f7b0c955c3d81d7cac1501a2498b69d1d82bfe7)
#1 pthread_mutex_lock /opt/apps/sources/LLVM/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1426:13 (a.out+0x6662c)
#2 omp_get_num_devices <null> (libomptarget.so.22.0git+0x253761)
#3 ompt_post_init <null> (libomp.so+0xc3052)
#4 __kmp_do_middle_initialize() kmp_runtime.cpp (libomp.so+0x412b7)
#5 __kmp_middle_initialize <null> (libomp.so+0x4128b)
#6 omp_get_num_places@@VERSION <null> (libomp.so+0xbdf57)
#7 blas_get_cpu_number <null> (libopenblas.so.0+0x316ff6)
#8 gotoblas_init <null> (libopenblas.so.0+0x317b61)
#9 call_init elf/dl-init.c:70:3 (ld-linux-x86-64.so.2+0x647d) (BuildId: acaf96d7b1a6bad57b559d646233d5dc1a23257c)
#10 call_init elf/dl-init.c:33:6 (ld-linux-x86-64.so.2+0x6567) (BuildId: acaf96d7b1a6bad57b559d646233d5dc1a23257c)
#11 _dl_init elf/dl-init.c:117:5 (ld-linux-x86-64.so.2+0x6567)
#12 <null> <null> (ld-linux-x86-64.so.2+0x202c9) (BuildId: acaf96d7b1a6bad57b559d646233d5dc1a23257c)
==4068918==Register values:
rax = 0x0000000000000000 rbx = 0x00000000000002a8 rcx = 0x2000000000000000 rdx = 0x0000000000000000
rdi = 0x00000000000002a8 rsi = 0x00007fffffffcde0 rbp = 0x0000000000000001 rsp = 0x00007fffffffce38
r8 = 0x6000000000000000 r9 = 0x0fffff0000000000 r10 = 0xffffff0000000000 r11 = 0x0000000000000000
r12 = 0x0000000000000000 r13 = 0x00007fffffffd008 r14 = 0x00007ffff6ddb800 r15 = 0x00005555555ba558
ThreadSanitizer can not provide additional info.
SUMMARY: ThreadSanitizer: SEGV nptl/pthread_mutex_lock.c:80:23 in __pthread_mutex_lock
==4068918==ABORTING
```
The stack trace suggests that the very early OpenMP call OpenBLAS makes from its initializer (`omp_get_num_places()` in `blas_get_cpu_number`, reached via `gotoblas_init` during `_dl_start_user`) is what leads to the crash in `omp_get_num_devices()`. The exact position is not visible, since I haven't built LLVM with debug symbols enabled. I'll try to do that next, if my storage space permits it...
---
A workaround is to set `LD_PRELOAD=$LLVM_PATH/lib/x86_64-unknown-linux-gnu/libomptarget.so.22.0git`, which suggests that some `libomptarget` data structures are not yet initialized when the early call happens.
If we set this, the program works as expected:
```console
$ LD_PRELOAD=$LLVM_PATH/lib/x86_64-unknown-linux-gnu/libomptarget.so.22.0git ./a.out
[-1][ompt_start_tool] Chosen printf mode: 3
[-1][ompt_start_tool] omp_version = 201611 | runtime_version = LLVM OMP version: 5.0.20140926
[-1][tool_initialize] lookup = 0x714d203c41e0 | initial_device_num = 0 | tool_data = 0x714d213ff700
[-1][tool_initialize] thread_begin = always
[-1][tool_initialize] thread_end = always
[-1][tool_initialize] parallel_begin = always
[-1][tool_initialize] parallel_end = always
[-1][tool_initialize] task_create = always
[-1][tool_initialize] task_schedule = always
[-1][tool_initialize] implicit_task = always
[-1][tool_initialize] sync_region_wait = always
[-1][tool_initialize] mutex_released = always
[-1][tool_initialize] dependences = always
[-1][tool_initialize] task_dependence = always
[-1][tool_initialize] work = always
[-1][tool_initialize] masked = always
[-1][tool_initialize] sync_region = always
[-1][tool_initialize] lock_init = always
[-1][tool_initialize] lock_destroy = always
[-1][tool_initialize] mutex_acquire = always
[-1][tool_initialize] mutex_acquired = always
[-1][tool_initialize] nest_lock = always
[-1][tool_initialize] flush = always
[-1][tool_initialize] cancel = always
[-1][tool_initialize] reduction = always
[-1][tool_initialize] dispatch = always
[-1][tool_initialize] control_tool = always
[-1][tool_initialize] device_initialize = always
[-1][tool_initialize] device_finalize = always
[-1][tool_initialize] device_load = always
[-1][tool_initialize] device_unload = never
[-1][tool_initialize] target_emi = always
[-1][tool_initialize] target_map_emi = never
[-1][tool_initialize] target_map = never
[-1][tool_initialize] target_data_op_emi = always
[-1][tool_initialize] target_submit_emi = always
[0][callback_thread_begin] thread_type = initial | thread_data = 0x61fba01eca88
[0][callback_implicit_task] endpoint = begin | parallel_data->value = 0 (0x61fba01eb1e0) | task_data->value = 555000001 (0x61fba01ebb00) | actual_parallelism = 1 | index = 1 | flags = initial
[0][callback_device_initialize] device_num = 0 | type = gfx1101 | device = 0x61fba02604f0 | lookup = 0x714d203c45f0 | documentation = (null)
[0][callback_device_initialize] device_num = 0 | set_trace_ompt not found
[0][callback_device_finalize] device_num = 0
[0][callback_implicit_task] endpoint = end | parallel_data->value = 0 (0x61fba01eb1e0) | task_data->value = 555000001 (0x61fba01ebb00) | actual_parallelism = 0 | index = 1 | flags = initial
[0][callback_thread_end] thread_data = 0x61fba01eca88
[0][tool_finalize] tool_data = 0x714d213ff700
```