Issue 165563
Summary [OMPT][Offload] OpenBLAS + offload causes Segmentation Fault if tool is attached
Labels
Assignees
Reporter Thyre
I originally ran into this issue with the ROCm compilers, but it is reproducible with LLVM trunk as well. So here we go...

Using my daily LLVM build:

```
$ clang --version
clang version 22.0.0git (https://github.com/llvm/llvm-project.git e9804584f75c1ab267431c43a0928a8b0a3814f0)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/apps/software/Clang/trunk/bin
Build config: +assertions
```

and the latest development commit of OpenBLAS ([0c59ae0](https://github.com/OpenMathLib/OpenBLAS/commit/0c59ae0b45a8f30224f045902bc558381d6f8974)), I am able to trigger a segmentation fault inside `libomptarget` when executing essentially any application, even an empty one.
There are a few conditions that need to be met first:

1. OpenBLAS needs to be built with OpenMP support.
2. An OMPT (OpenMP Tools Interface) tool needs to be attached. I've used [ompt-printf](https://github.com/FZJ-JSC/ompt-printf) here, but we saw the same effect with Score-P and ThreadSanitizer as well; a minimal tool sketch is shown below.
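
For reference, attaching a tool only requires a shared library that exports `ompt_start_tool` and is named in `OMP_TOOL_LIBRARIES`; the runtime then calls into it during initialization. Below is a minimal sketch of such a tool (my own illustration with made-up names, not the actual ompt-printf source); any tool of roughly this shape should be enough to hit the same code path:

```c
// minimal_tool.c -- minimal OMPT tool sketch (illustration only, not ompt-printf).
// Build: clang -fopenmp -shared -fPIC -o libminimal-tool.so minimal_tool.c
// Run:   OMP_TOOL_LIBRARIES=$PWD/libminimal-tool.so ./a.out
#include <stdio.h>
#include <omp-tools.h>

// Called by the runtime once it has finished initializing; returning non-zero
// keeps the tool attached.
static int tool_initialize(ompt_function_lookup_t lookup, int initial_device_num,
                           ompt_data_t *tool_data) {
  printf("[tool_initialize] initial_device_num = %d\n", initial_device_num);
  return 1;
}

static void tool_finalize(ompt_data_t *tool_data) {
  printf("[tool_finalize]\n");
}

// Entry point the runtime looks up in every library listed in OMP_TOOL_LIBRARIES.
ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version,
                                          const char *runtime_version) {
  static ompt_start_tool_result_t result = {tool_initialize, tool_finalize,
                                            {.value = 0}};
  printf("[ompt_start_tool] omp_version = %u | runtime_version = %s\n",
         omp_version, runtime_version);
  return &result;
}
```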

```console
$ # OpenBLAS was built with: make CC=clang CXX=clang++ F77=flang F90=flang FC=flang USE_OPENMP=1 USE_THREAD=1 PREFIX=$(pwd)/_install
$ cat test.c
int main( void ) {}
$ clang -fopenmp --offload-arch=gfx1101 test.c -lopenblas -L$(pwd)/_install/lib -Wl,-rpath,$(pwd)/_install/lib
$ ldd ./a.out
        linux-vdso.so.1 (0x00007fffc022f000)
        libopenblas.so.0 => /home/jreuter/Sources/OpenBLAS/_install/lib/libopenblas.so.0 (0x000074b09c000000)
        libomp.so => /opt/apps/software/Clang/trunk/lib/x86_64-unknown-linux-gnu/libomp.so (0x000074b09cfd0000)
        libomptarget.so.22.0git => /opt/apps/software/Clang/trunk/lib/x86_64-unknown-linux-gnu/libomptarget.so.22.0git (0x000074b097400000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000074b097000000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000074b09bf19000)
        libatomic.so.1 => /lib/x86_64-linux-gnu/libatomic.so.1 (0x000074b09cfb0000)
        /lib64/ld-linux-x86-64.so.2 (0x000074b09d0d6000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x000074b09cf94000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x000074b096c00000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000074b09bef9000)
$ env | grep OMP_TOOL_LIBRARIES
OMP_TOOL_LIBRARIES=/home/jreuter/Projects/perftools-misc/ompt-printf/_build/src/libompt-printf.so
$ ./a.out
[-1][ompt_start_tool] Chosen printf mode: 3
[-1][ompt_start_tool] omp_version = 201611 | runtime_version = LLVM OMP version: 5.0.20140926
[1] 4068330 segmentation fault (core dumped)  ./a.out
$ unset OMP_TOOL_LIBRARIES
$ clang -fopenmp --offload-arch=gfx1101 test.c -lopenblas -L$(pwd)/_install/lib -Wl,-rpath,$(pwd)/_install/lib -Xarch_host -fsanitize=thread
$ ./a.out
ThreadSanitizer:DEADLYSIGNAL
==4068918==ERROR: ThreadSanitizer: SEGV on unknown address 0x0000000002b8 (pc 0x7ffff1e97ef4 bp 0x000000000001 sp 0x7fffffffce38 T4068918)
==4068918==The signal is caused by a READ memory access.
==4068918==Hint: address points to the zero page.
    #0 __pthread_mutex_lock nptl/pthread_mutex_lock.c:80:23 (libc.so.6+0x97ef4) (BuildId: 4f7b0c955c3d81d7cac1501a2498b69d1d82bfe7)
    #1 pthread_mutex_lock /opt/apps/sources/LLVM/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1426:13 (a.out+0x6662c)
    #2 omp_get_num_devices <null> (libomptarget.so.22.0git+0x253761)
    #3 ompt_post_init <null> (libomp.so+0xc3052)
    #4 __kmp_do_middle_initialize() kmp_runtime.cpp (libomp.so+0x412b7)
    #5 __kmp_middle_initialize <null> (libomp.so+0x4128b)
    #6 omp_get_num_places@@VERSION <null> (libomp.so+0xbdf57)
    #7 blas_get_cpu_number <null> (libopenblas.so.0+0x316ff6)
    #8 gotoblas_init <null> (libopenblas.so.0+0x317b61)
    #9 call_init elf/dl-init.c:70:3 (ld-linux-x86-64.so.2+0x647d) (BuildId: acaf96d7b1a6bad57b559d646233d5dc1a23257c)
    #10 call_init elf/dl-init.c:33:6 (ld-linux-x86-64.so.2+0x6567) (BuildId: acaf96d7b1a6bad57b559d646233d5dc1a23257c)
    #11 _dl_init elf/dl-init.c:117:5 (ld-linux-x86-64.so.2+0x6567)
    #12 <null> <null> (ld-linux-x86-64.so.2+0x202c9) (BuildId: acaf96d7b1a6bad57b559d646233d5dc1a23257c)

==4068918==Register values:
rax = 0x0000000000000000  rbx = 0x00000000000002a8  rcx = 0x2000000000000000  rdx = 0x0000000000000000
rdi = 0x00000000000002a8  rsi = 0x00007fffffffcde0  rbp = 0x0000000000000001  rsp = 0x00007fffffffce38
 r8 = 0x6000000000000000   r9 = 0x0fffff0000000000  r10 = 0xffffff0000000000  r11 = 0x0000000000000000
r12 = 0x0000000000000000  r13 = 0x00007fffffffd008  r14 = 0x00007ffff6ddb800  r15 = 0x00005555555ba558
ThreadSanitizer can not provide additional info.
SUMMARY: ThreadSanitizer: SEGV nptl/pthread_mutex_lock.c:80:23 in __pthread_mutex_lock
==4068918==ABORTING
```

The stack trace suggests that the very early OpenMP call made by OpenBLAS's constructor (`gotoblas_init` → `blas_get_cpu_number`, running during `_dl_init`/`_dl_start_user`) ends up crashing inside `omp_get_num_devices()`, though the exact location is not visible, since I haven't built LLVM with debug symbols enabled. I'll try to do that next, if my storage space permits it...
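
To illustrate the shape of the failing path, the situation can be reduced to a shared-library constructor making an OpenMP runtime call. The sketch below is a hypothetical simplification of what `gotoblas_init`/`blas_get_cpu_number` do, not the actual OpenBLAS code: the constructor runs during `_dl_init`, forces `__kmp_middle_initialize`, and with a tool attached `ompt_post_init` then reaches `omp_get_num_devices()` before `libomptarget` is ready.

```c
// early_omp.c -- hypothetical reduction of OpenBLAS's constructor-time OpenMP call
// (simplified illustration, not the actual gotoblas_init/blas_get_cpu_number code).
// Build as a shared library and link it into an offload-enabled program:
//   clang -fopenmp -shared -fPIC -o libearly.so early_omp.c
#include <stdio.h>
#include <omp.h>

// Runs from _dl_init, before main() and possibly before libomptarget has been
// initialized. Any OpenMP API call here forces __kmp_middle_initialize; with
// OMP_TOOL_LIBRARIES set, that goes through ompt_post_init and from there into
// omp_get_num_devices(), matching the ThreadSanitizer backtrace above.
__attribute__((constructor))
static void early_omp_call(void) {
  printf("places seen from constructor: %d\n", omp_get_num_places());
}
```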

---

A workaround is to set `LD_PRELOAD=$LLVM_PATH/lib/x86_64-unknown-linux-gnu/libomptarget.so.22.0git`, which suggests that some data structures in `libomptarget` have not yet been initialized at the point of this early call.

If we set this, the program works as expected:

```console
$ LD_PRELOAD=$LLVM_PATH/lib/x86_64-unknown-linux-gnu/libomptarget.so.22.0git ./a.out
[-1][ompt_start_tool] Chosen printf mode: 3
[-1][ompt_start_tool] omp_version = 201611 | runtime_version = LLVM OMP version: 5.0.20140926
[-1][tool_initialize] lookup = 0x714d203c41e0 | initial_device_num = 0 | tool_data = 0x714d213ff700
[-1][tool_initialize] thread_begin = always
[-1][tool_initialize]         thread_end = always
[-1][tool_initialize]     parallel_begin = always
[-1][tool_initialize]       parallel_end = always
[-1][tool_initialize]        task_create = always
[-1][tool_initialize]      task_schedule = always
[-1][tool_initialize]      implicit_task = always
[-1][tool_initialize]   sync_region_wait = always
[-1][tool_initialize]     mutex_released = always
[-1][tool_initialize]        dependences = always
[-1][tool_initialize]    task_dependence = always
[-1][tool_initialize]               work = always
[-1][tool_initialize]             masked = always
[-1][tool_initialize]        sync_region = always
[-1][tool_initialize]          lock_init = always
[-1][tool_initialize]       lock_destroy = always
[-1][tool_initialize]      mutex_acquire = always
[-1][tool_initialize]     mutex_acquired = always
[-1][tool_initialize]          nest_lock = always
[-1][tool_initialize]              flush = always
[-1][tool_initialize]             cancel = always
[-1][tool_initialize]          reduction = always
[-1][tool_initialize]           dispatch = always
[-1][tool_initialize]       control_tool = always
[-1][tool_initialize]  device_initialize = always
[-1][tool_initialize]    device_finalize = always
[-1][tool_initialize]        device_load = always
[-1][tool_initialize]      device_unload = never
[-1][tool_initialize]         target_emi = always
[-1][tool_initialize]     target_map_emi = never
[-1][tool_initialize]         target_map = never
[-1][tool_initialize] target_data_op_emi = always
[-1][tool_initialize]  target_submit_emi = always
[0][callback_thread_begin] thread_type = initial | thread_data = 0x61fba01eca88
[0][callback_implicit_task] endpoint = begin | parallel_data->value = 0 (0x61fba01eb1e0) | task_data->value = 555000001 (0x61fba01ebb00) | actual_parallelism = 1 | index = 1 | flags = initial
[0][callback_device_initialize] device_num = 0 | type = gfx1101 | device = 0x61fba02604f0 | lookup = 0x714d203c45f0 | documentation = (null)
[0][callback_device_initialize] device_num = 0 | set_trace_ompt not found
[0][callback_device_finalize] device_num = 0
[0][callback_implicit_task] endpoint = end | parallel_data->value = 0 (0x61fba01eb1e0) | task_data->value = 555000001 (0x61fba01ebb00) | actual_parallelism = 0 | index = 1 | flags = initial
[0][callback_thread_end] thread_data = 0x61fba01eca88
[0][tool_finalize] tool_data = 0x714d213ff700
```