Dear Team,

I hope this message finds you well.

We are hitting a recurring deadlock in the function
rte_rwlock_write_lock in DPDK 22.11.3 LTS.

It appears to be related to the known issue tracked in
https://bugs.dpdk.org/show_bug.cgi?id=1277, which was resolved
in version 23.11.

I would like to propose backporting this fix to the 22.11 branch,
since it is a long-term support (LTS) release.

The deadlock occurs during initialization of the secondary program,
which hangs inside rte_eal_init and cannot function.

Here is the secondary program's call stack during initialization:

```
#0  0x00000000013dd604 in rte_mcfg_mem_read_lock ()
#1  0x00000000013def02 in rte_memseg_list_walk ()
#2  0x00000000013fbc85 in eal_memalloc_init ()
#3  0x00000000013df73b in rte_eal_memory_init ()
#4  0x0000000000889cf5 in rte_eal_init.cold ()
#5  0x000000000088d094 in main () at ../app/status_server/main.cc:96
#6  0x00007ffff678e555 in __libc_start_main () from /lib64/libc.so.6
#7  0x00000000009ca80d in _start () at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/shared_ptr_base.h:1169
```


The main program's threads are in the following state during the deadlock:

```
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fdec00 (LWP 20071))]
#0  0x00007ffff6b1d85d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6b1d85d in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff6b1d6f4 in sleep () from /lib64/libc.so.6
#2  0x00000000006e1f24 in lcore_main (pInfo=<synthetic pointer>) at ../app/main/main.c:682
#3  main () at ../app/main/main.c:1174
#4  0x00007ffff6a7a555 in __libc_start_main () from /lib64/libc.so.6
#5  0x000000000081f57d in _start ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff3c50700 (LWP 20166))]
#0  0x00007ffff6e349dd in accept () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff6e349dd in accept () from /lib64/libpthread.so.0
#1  0x0000000001172b23 in socket_listener ()
#2  0x00007ffff6e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff6b568dd in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff4451700 (LWP 20157))]
#0  0x00007ffff6e34bad in recvmsg () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff6e34bad in recvmsg () from /lib64/libpthread.so.0
#1  0x000000000115fce7 in mp_handle ()
#2  0x00007ffff6e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff6b568dd in clone () from /lib64/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 0x7ffff4c52700 (LWP 20156))]
#0  0x00007ffff6b56eb3 in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6b56eb3 in epoll_wait () from /lib64/libc.so.6
#1  0x0000000001169be4 in eal_intr_thread_main ()
#2  0x00007ffff6e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff6b568dd in clone () from /lib64/libc.so.6
```

Your assistance in resolving this matter or providing guidance on a
workaround would be greatly appreciated.

Thank you for your attention to this issue.
