The issue arose from a change to the DPDK read-write lock implementation. That change added a new flag, RTE_RWLOCK_WAIT, which prevents new read locks from being taken while a write lock is queued. As a side effect, a recursive read lock, where the same execution thread acquires the lock twice, can trigger a sequence of events that ends in a deadlock:

1. Process 1 takes the first read lock.
2. Process 2 attempts to take a write lock, triggering RTE_RWLOCK_WAIT due to
   the presence of a read lock. Process 2 enters a wait loop until the read
   lock is released.
3. Process 1 tries to take a second read lock. Since a write lock is waiting
   (RTE_RWLOCK_WAIT is set), it also enters a wait loop until the write lock
   is acquired and then released.

Both processes end up blocked and unable to proceed, which is a deadlock.

Following these changes, the RW-lock no longer supports recursion: a thread
must not take a read lock if it already holds one. The problem shows up
during initialization: rte_eal_init() acquires memory_hotplug_lock, and later
call sequences reach rte_memseg_list_walk(), which acquires it again without
it having been released. This creates a risk of deadlock whenever a
concurrent write lock is requested on the same memory_hotplug_lock.

To address this, we replaced rte_memseg_list_walk() with
rte_memseg_list_walk_thread_unsafe() on this path. The series also adds a
lock annotation to rte_memseg_list_walk() so that similar bugs are caught at
compile time.

Artemy Kovalyov (2):
  eal: fix memory initialization deadlock
  eal: annotate rte_memseg_list_walk()

 lib/eal/common/eal_common_dynmem.c     | 5 ++++-
 lib/eal/common/eal_memalloc.h          | 3 ++-
 lib/eal/common/eal_private.h           | 3 ++-
 lib/eal/include/generic/rte_rwlock.h   | 4 ++++
 lib/eal/include/rte_lock_annotations.h | 5 +++++
 lib/eal/include/rte_memory.h           | 4 +++-
 lib/eal/linux/eal_memalloc.c           | 7 +++++--
 7 files changed, 25 insertions(+), 6 deletions(-)

--
1.8.3.1
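
For illustration only, the interleaving described in this cover letter can be replayed in a single thread against a toy writer-preferring lock. This is a minimal sketch, not the DPDK implementation: the toy_* names, the TOY_* constants and their bit layout are assumptions made up for this example, and each "try" function stands for one iteration of the wait loop a real lock would spin in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: these names and this bit layout are assumptions made
 * for illustration, not definitions copied from rte_rwlock.h. */
#define TOY_WAIT  0x1 /* a writer is queued */
#define TOY_WRITE 0x2 /* a writer holds the lock */
#define TOY_READ  0x4 /* added once per reader */

typedef struct { atomic_int cnt; } toy_rwlock_t;

/* One attempt at a read lock; a real lock would retry until this succeeds. */
static bool toy_read_try(toy_rwlock_t *l)
{
    int x = atomic_load(&l->cnt);

    if (x & (TOY_WAIT | TOY_WRITE))
        return false; /* writer active or queued: new readers are held back */
    return atomic_compare_exchange_strong(&l->cnt, &x, x + TOY_READ);
}

static void toy_read_unlock(toy_rwlock_t *l)
{
    int prev = atomic_fetch_sub(&l->cnt, TOY_READ);

    assert(prev >= TOY_READ); /* must have held a read lock */
}

/* One attempt at a write lock; on failure the waiting flag is left set. */
static bool toy_write_try(toy_rwlock_t *l)
{
    int x = atomic_load(&l->cnt);

    /* Below TOY_WRITE means no readers and no writer (only WAIT may be set). */
    if (x < TOY_WRITE &&
        atomic_compare_exchange_strong(&l->cnt, &x, TOY_WRITE))
        return true;
    atomic_fetch_or(&l->cnt, TOY_WAIT);
    return false;
}

/* Replays steps 1-3: returns true if neither side can make progress. */
bool toy_recursive_read_deadlocks(void)
{
    toy_rwlock_t l;

    atomic_init(&l.cnt, 0);
    bool first  = toy_read_try(&l);  /* process 1: first read lock */
    bool writer = toy_write_try(&l); /* process 2: blocked, sets TOY_WAIT */
    bool second = toy_read_try(&l);  /* process 1: recursive read, blocked */
    bool retry  = toy_write_try(&l); /* process 2: still blocked by reader */

    return first && !writer && !second && !retry;
}

/* Non-recursive case: once the reader unlocks, the queued writer gets in. */
bool toy_release_unblocks_writer(void)
{
    toy_rwlock_t l;

    atomic_init(&l.cnt, 0);
    bool first  = toy_read_try(&l);
    bool writer = toy_write_try(&l); /* blocked, sets TOY_WAIT */
    toy_read_unlock(&l);
    bool retry  = toy_write_try(&l); /* succeeds now */

    return first && !writer && retry;
}
```

In this model both functions return true: the recursive read never gets past the waiting flag while the writer never sees the reader count drop, whereas a reader that releases before reacquiring lets the writer proceed.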