memory_hotplug_lock deadlock during initialization in Multi-process Mode on DPDK Version 22.11.3 LTS
Dear Team,

I hope this message finds you well.

We have encountered a recurring deadlock in the function rte_rwlock_write_lock in DPDK version 22.11.3 LTS.

It appears to be related to a known issue tracked in https://bugs.dpdk.org/show_bug.cgi?id=1277 and subsequently resolved in version 23.11.

I kindly propose backporting this fix to the 22.11 branch, considering its status as a long-term support (LTS) version.

This deadlock prevents the secondary program from completing initialization, so it cannot function at all.

Here is the call stack of the secondary program during startup:

```
#0 0x013dd604 in rte_mcfg_mem_read_lock ()
#1 0x013def02 in rte_memseg_list_walk ()
#2 0x013fbc85 in eal_memalloc_init ()
#3 0x013df73b in rte_eal_memory_init ()
#4 0x00889cf5 in rte_eal_init.cold ()
#5 0x0088d094 in main () at ../app/status_server/main.cc:96
#6 0x7678e555 in __libc_start_main () from /lib64/libc.so.6
#7 0x009ca80d in _start () at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/shared_ptr_base.h:1169
```

The state of the main program's threads during the deadlock is as follows:

```
(gdb) thread 1
[Switching to thread 1 (Thread 0x77fdec00 (LWP 20071))]
#0 0x76b1d85d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0 0x76b1d85d in nanosleep () from /lib64/libc.so.6
#1 0x76b1d6f4 in sleep () from /lib64/libc.so.6
#2 0x006e1f24 in lcore_main (pInfo=) at ../app/main/main.c:682
#3 main () at ../app/main/main.c:1174
#4 0x76a7a555 in __libc_start_main () from /lib64/libc.so.6
#5 0x0081f57d in _start ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x73c50700 (LWP 20166))]
#0 0x76e349dd in accept () from /lib64/libpthread.so.0
(gdb) bt
#0 0x76e349dd in accept () from /lib64/libpthread.so.0
#1 0x01172b23 in socket_listener ()
#2 0x76e2dea5 in start_thread () from /lib64/libpthread.so.0
#3 0x76b568dd in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x74451700 (LWP 20157))]
#0 0x76e34bad in recvmsg () from /lib64/libpthread.so.0
(gdb) bt
#0 0x76e34bad in recvmsg () from /lib64/libpthread.so.0
#1 0x0115fce7 in mp_handle ()
#2 0x76e2dea5 in start_thread () from /lib64/libpthread.so.0
#3 0x76b568dd in clone () from /lib64/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 0x74c52700 (LWP 20156))]
#0 0x76b56eb3 in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0 0x76b56eb3 in epoll_wait () from /lib64/libc.so.6
#1 0x01169be4 in eal_intr_thread_main ()
#2 0x76e2dea5 in start_thread () from /lib64/libpthread.so.0
#3 0x76b568dd in clone () from /lib64/libc.so.6
```

Your assistance in resolving this matter, or guidance on a workaround, would be greatly appreciated.

Thank you for your attention to this issue.
Re: memory_hotplug_lock deadlock during initialization in Multi-process Mode on DPDK Version 22.11.3 LTS
Hi,

Testing on 22.11.4-rc3 confirms that this issue has been resolved.

Thank you very much.

On Wed, Dec 27, 2023 at 18:14, David Marchand wrote:
>
> Hello,
>
> Cc: 22.11 stable maintainer for info
>
> On Wed, Dec 27, 2023 at 4:14 AM Linzhe Lee wrote:
> >
> > Dear Team,
> >
> > I hope this message finds you well.
> >
> > We have encountered a recurring deadlock issue within the function
> > rte_rwlock_write_lock in the DPDK version 22.11.3 LTS.
> >
> > It appears to be related to a known matter addressed in
> > https://bugs.dpdk.org/show_bug.cgi?id=1277 and subsequently resolved
> > in version 23.11.
> >
> > I kindly propose the backporting of this fix to the 22.11 branch,
> > considering its status as a long-term support (LTS) version.
>
> As far as I can see, this fix is part of the 22.11.4-rc1 tag.
>
> A 22.11.4-rc3 tag was recently released.
> https://git.dpdk.org/dpdk-stable/tag/?h=v22.11.4-rc3
> Could you have a try with it?
>
>
> Thanks.
>
> --
> David Marchand
>
[dpdk-dev] [PATCH] mbuf: fix atomic refcnt update synchronization
Thanks for the reply, Stephen.

I'm on x86-64; my CPU is `Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz`.

When an mbuf is allocated in program1 and passed to program2 via a ring to be freed, program1 sometimes hits an assertion while allocating an mbuf (`RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);`).

But when I check it with gdb, the refcnt field of the mbuf is already zero, so I believe the problem comes from a cache line issue or an incorrect optimization.

With this patch applied, the problem seems to be solved. I'm submitting it for your comments.

2016-09-03 0:12 GMT+08:00 Stephen Hemminger:
> On Fri, 2 Sep 2016 13:25:06 +0800
> lilinzhe wrote:
>
>> From: ???
>>
>> change atomic ref update to always call atomic_add
>>
>> when mbuf is allocated by cpu1 and freed by cpu2. cpu1 cache may not be
>> updated by such a set operation.
>> causes refcnt reads incorrect values.
>
> What architecture are you dealing with? On X86 memory is cache coherent.
>
> Doing atomic operation all the time on each mbuf free would significantly
> slow down performance.
>
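To make the two strategies in this exchange concrete, here is a minimal sketch in C11 atomics rather than DPDK's own primitives. It is not the patch itself. `struct sketch_mbuf`, `refcnt_update_fastpath`, `refcnt_update_always_atomic`, and `sketch_demo` are hypothetical names, and the fast path is only an assumption drawn from the patch description ("always call atomic_add"), which implies the existing code sometimes skips the atomic add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf; only the reference counter is modeled. */
struct sketch_mbuf {
    _Atomic uint16_t refcnt;
};

/* Fast-path variant (assumed shape of the pre-patch behavior): when the
 * counter reads 1, write the new value with a plain load and store and
 * skip the atomic read-modify-write. */
static uint16_t refcnt_update_fastpath(struct sketch_mbuf *m, int16_t value)
{
    if (atomic_load_explicit(&m->refcnt, memory_order_relaxed) == 1) {
        uint16_t v = (uint16_t)(1 + value);
        atomic_store_explicit(&m->refcnt, v, memory_order_relaxed);
        return v;
    }
    /* fetch_add returns the old value; add the delta to get the new one. */
    return (uint16_t)(atomic_fetch_add(&m->refcnt, (uint16_t)value) + value);
}

/* Always-atomic variant (what the patch description proposes): every
 * update is a single atomic read-modify-write. */
static uint16_t refcnt_update_always_atomic(struct sketch_mbuf *m, int16_t value)
{
    return (uint16_t)(atomic_fetch_add(&m->refcnt, (uint16_t)value) + value);
}

/* Single-threaded usage: both variants yield the same counter values. */
static void sketch_demo(void)
{
    struct sketch_mbuf m;
    atomic_init(&m.refcnt, 1);
    assert(refcnt_update_fastpath(&m, -1) == 0);
    atomic_store(&m.refcnt, 2);
    assert(refcnt_update_always_atomic(&m, -1) == 1);
}
```

Run single-threaded, the two variants give the same results, so this sketch does not reproduce the reported assertion. It only shows where the variants differ: the fast path is a separate load and store, while the always-atomic path pays for a read-modify-write on every update, which is the cost Stephen raises.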
[dpdk-dev] [PATCH] mbuf: fix atomic refcnt update synchronization
Yes, Stephen. My config file is here: http://pastebin.com/N0RKGArh

2016-09-03 0:51 GMT+08:00 Stephen Hemminger:
> On Sat, 3 Sep 2016 00:31:50 +0800
> Linzhe Lee wrote:
>
>> Thanks for reply, Stephen.
>>
>> I'm in x86-64, my cpu is `Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz`.
>>
>> When allocation mbuf in program1, and transfer it to program2 for free
>> via ring, the program1 might meet assert in allocate mbuf sometimes.
>> (`RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);`)
>>
>> but when I using gdb to check it, the refcnt field of mbuf is already
>> zero. so I believe the problem came from the cache line problem or
>> incorrect optimization.
>>
>> When apply this patch, the problem seems solved. I'm submitting it for
>> your comments.
>
> Are you sure you have REFCNT_ATOMIC set?