memory_hotplug_lock deadlock during initialization in Multi-process Mode on DPDK Version 22.11.3 LTS

2023-12-26 Thread Linzhe Lee
Dear Team,

I hope this message finds you well.

We have encountered a recurring deadlock in the function
rte_rwlock_write_lock in DPDK version 22.11.3 LTS.

It appears to be related to a known issue, reported in
https://bugs.dpdk.org/show_bug.cgi?id=1277, which was resolved
in version 23.11.

I kindly propose the backporting of this fix to the 22.11 branch,
considering its status as a long-term support (LTS) version.

This deadlock prevents the secondary program from completing
initialization, leaving it unable to function.

Here is a snippet of the secondary program's initialization call stack:

```
#0  0x013dd604 in rte_mcfg_mem_read_lock ()
#1  0x013def02 in rte_memseg_list_walk ()
#2  0x013fbc85 in eal_memalloc_init ()
#3  0x013df73b in rte_eal_memory_init ()
#4  0x00889cf5 in rte_eal_init.cold ()
#5  0x0088d094 in main () at ../app/status_server/main.cc:96
#6  0x7678e555 in __libc_start_main () from /lib64/libc.so.6
#7  0x009ca80d in _start () at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/shared_ptr_base.h:1169
```


The main program's situation during this deadlock is as follows:

```
(gdb) thread 1
[Switching to thread 1 (Thread 0x77fdec00 (LWP 20071))]
#0  0x76b1d85d in nanosleep () from /lib64/libc.so.6
(gdb) bt
#0  0x76b1d85d in nanosleep () from /lib64/libc.so.6
#1  0x76b1d6f4 in sleep () from /lib64/libc.so.6
#2  0x006e1f24 in lcore_main (pInfo=) at ../app/main/main.c:682
#3  main () at ../app/main/main.c:1174
#4  0x76a7a555 in __libc_start_main () from /lib64/libc.so.6
#5  0x0081f57d in _start ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x73c50700 (LWP 20166))]
#0  0x76e349dd in accept () from /lib64/libpthread.so.0
(gdb) bt
#0  0x76e349dd in accept () from /lib64/libpthread.so.0
#1  0x01172b23 in socket_listener ()
#2  0x76e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x76b568dd in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x74451700 (LWP 20157))]
#0  0x76e34bad in recvmsg () from /lib64/libpthread.so.0
(gdb) bt
#0  0x76e34bad in recvmsg () from /lib64/libpthread.so.0
#1  0x0115fce7 in mp_handle ()
#2  0x76e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x76b568dd in clone () from /lib64/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread 0x74c52700 (LWP 20156))]
#0  0x76b56eb3 in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x76b56eb3 in epoll_wait () from /lib64/libc.so.6
#1  0x01169be4 in eal_intr_thread_main ()
#2  0x76e2dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x76b568dd in clone () from /lib64/libc.so.6
```

Your assistance in resolving this matter or providing guidance on a
workaround would be greatly appreciated.

Thank you for your attention to this issue.


Re: memory_hotplug_lock deadlock during initialization in Multi-process Mode on DPDK Version 22.11.3 LTS

2023-12-27 Thread Linzhe Lee
Hi,

Testing on 22.11.4-rc3 confirms that this issue has been resolved.

Thank you very much.

David Marchand wrote on Wed, Dec 27, 2023 at 18:14:
>
> Hello,
>
> Cc: 22.11 stable maintainer for info
>
> On Wed, Dec 27, 2023 at 4:14 AM Linzhe Lee
>  wrote:
> >
> > Dear Team,
> >
> > I hope this message finds you well.
> >
> > We have encountered a recurring deadlock in the function
> > rte_rwlock_write_lock in DPDK version 22.11.3 LTS.
> >
> > It appears to be related to a known issue, reported in
> > https://bugs.dpdk.org/show_bug.cgi?id=1277, which was resolved
> > in version 23.11.
> >
> > I kindly propose the backporting of this fix to the 22.11 branch,
> > considering its status as a long-term support (LTS) version.
>
> As far as I can see, this fix is part of the 22.11.4-rc1 tag.
>
> A 22.11.4-rc3 tag was recently released.
> https://git.dpdk.org/dpdk-stable/tag/?h=v22.11.4-rc3
> Could you give it a try?
>
>
> Thanks.
>
> --
> David Marchand
>


[dpdk-dev] [PATCH] mbuf: fix atomic refcnt update synchronization

2016-09-03 Thread Linzhe Lee
Thanks for the reply, Stephen.

I'm on x86-64; my CPU is `Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz`.

When an mbuf is allocated in program1 and passed to program2 through a
ring to be freed, program1 sometimes hits an assertion while allocating
an mbuf (`RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);`).

But when I check it with gdb, the mbuf's refcnt field is already zero,
so I believe the problem comes from a cache-line issue or an incorrect
optimization.

With this patch applied, the problem seems to be solved. I'm submitting
it for your comments.


2016-09-03 0:12 GMT+08:00 Stephen Hemminger :
> On Fri,  2 Sep 2016 13:25:06 +0800
> lilinzhe  wrote:
>
>> From: ??? 
>>
>> change atomic ref update to always call atomic_add
>>
>> When an mbuf is allocated by cpu1 and freed by cpu2, cpu1's cache may not be
>> updated by such a set operation,
>> causing refcnt reads to return incorrect values.
>
> What architecture are you dealing with? On X86 memory is cache coherent.
>
> Doing atomic operation all the time on each mbuf free would significantly
> slow down performance.
>


[dpdk-dev] [PATCH] mbuf: fix atomic refcnt update synchronization

2016-09-03 Thread Linzhe Lee
Yes, Stephen.

My config file is here: http://pastebin.com/N0RKGArh

2016-09-03 0:51 GMT+08:00 Stephen Hemminger :
> On Sat, 3 Sep 2016 00:31:50 +0800
> Linzhe Lee  wrote:
>
>> Thanks for the reply, Stephen.
>>
>> I'm on x86-64; my CPU is `Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz`.
>>
>> When an mbuf is allocated in program1 and passed to program2 through a
>> ring to be freed, program1 sometimes hits an assertion while allocating
>> an mbuf (`RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);`).
>>
>> But when I check it with gdb, the mbuf's refcnt field is already zero,
>> so I believe the problem comes from a cache-line issue or an incorrect
>> optimization.
>>
>> With this patch applied, the problem seems to be solved. I'm submitting
>> it for your comments.
>
> Are you sure you have REFCNT_ATOMIC set?