On Thu, 29 Jan 2026 10:34:34 +0200
Shani Peretz <[email protected]> wrote:
> During cleanup, a race condition existed:
>
> Main Thread:                   Event Dispatch Thread:
> 1. Remove fds from fdset       while (1) {
> 2. Close file descriptors          epoll_wait() [gets interrupted]
> 3. rte_eal_cleanup()               [continues loop]
> 4. Unmap hugepages                 Accesses fdset... CRASH
>                                }
>
> There was no explicit cleanup of the fdset structure.
> The fdset structure is allocated with rte_zmalloc() and the memory would
> only be reclaimed at application shutdown when rte_eal_cleanup() is called,
> which invokes rte_eal_memory_detach() to unmap all the hugepage memory.
> Meanwhile, the event dispatch thread could still be running and accessing
> the fdset.
>
> The code had a `destroy` flag that the event dispatch thread checked,
> but it was never set during cleanup, and the code never waited for
> the thread to actually exit before freeing memory.
>
> To fix this, the commit implements fdset_destroy() that sets the destroy
> flag with mutex protection, waits for thread termination, and cleans up
> all resources including the fdset memory.
>
> Update socket.c to call fdset_destroy() when the last vhost-user socket
> is unregistered.
>
> Fixes: 0e38b42bf61c ("vhost: manage FD with epoll")
> Cc: [email protected]
>
> Signed-off-by: Shani Peretz <[email protected]>
It is preferable not to use POSIX mutexes in DPDK code.
Can this be done with regular locks or, better yet, stdatomic instead?
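
For illustration only, here is a minimal sketch of what the stdatomic
variant could look like. It is not the vhost code: struct demo_fdset and
the demo_* functions are hypothetical names, nanosleep() stands in for
epoll_wait() with a timeout, and plain pthreads and calloc()/free() are
used only to keep the example self-contained (in-tree code would more
likely use the rte_thread and rte_zmalloc()/rte_free() APIs). The point is
the ordering: publish the destroy flag atomically, join the thread, and
only then free the memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the vhost fdset; not the DPDK structure. */
struct demo_fdset {
	pthread_t tid;          /* event dispatch thread */
	atomic_bool destroy;    /* written once by the owner, polled by the thread */
	atomic_uint iterations; /* demo-only loop counter */
};

static void *demo_dispatch(void *arg)
{
	struct demo_fdset *s = arg;
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000 };

	/* The acquire load pairs with the release store in demo_fdset_destroy(). */
	while (!atomic_load_explicit(&s->destroy, memory_order_acquire)) {
		nanosleep(&tick, NULL); /* stands in for epoll_wait() with a timeout */
		atomic_fetch_add_explicit(&s->iterations, 1, memory_order_relaxed);
	}
	return NULL;
}

static struct demo_fdset *demo_fdset_create(void)
{
	struct demo_fdset *s = calloc(1, sizeof(*s));

	if (s == NULL)
		return NULL;
	atomic_init(&s->destroy, false);
	atomic_init(&s->iterations, 0);
	if (pthread_create(&s->tid, NULL, demo_dispatch, s) != 0) {
		free(s);
		return NULL;
	}
	return s;
}

static int demo_fdset_destroy(struct demo_fdset *s)
{
	/* 1. Tell the dispatch thread to stop; no mutex is needed for one flag. */
	atomic_store_explicit(&s->destroy, true, memory_order_release);
	/* 2. Wait until the thread has actually left its loop. */
	if (pthread_join(s->tid, NULL) != 0)
		return -1;
	/* 3. Only now is it safe to release the memory the thread was using. */
	free(s);
	return 0;
}
```

Because the flag is the only state shared between the two threads in this
sketch, an atomic store/load pair is enough. The join is what guarantees
the thread no longer touches the structure when it is freed.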