> -----Original Message-----
> From: Saurabh Singhal <[email protected]>
> Sent: Thursday, September 7, 2023 11:15 AM
> To: Thomas Monjalon <[email protected]>; Wu, Jingjing
> <[email protected]>; Xing, Beilei <[email protected]>
> Cc: [email protected]; Singhal, Saurabh <[email protected]>
> Subject: [PATCH v4] net/iavf: unregister intr handler before FD close
>
> Unregister VFIO interrupt handler before the interrupt fd gets closed in case
> iavf_dev_init() returns an error.
>
> dpdk creates a standalone thread named eal-intr-thread for processing
> interrupts for the PCI devices. The interrupt handler callbacks are registered
> by the VF driver(iavf, in this case).
>
> When we do a PCI probe of the network interfaces, we register an interrupt
> handler, open a vfio-device fd using ioctl, and an eventfd in dpdk. These
> interrupt sources are registered in a global linked list that the
> eal-intr-thread
> keeps iterating over for handling the interrupts. In our internal testing, we
> see
> eal-intr-thread crash in these two ways:
>
> Error adding fd 660 epoll_ctl, Operation not permitted
>
> or
>
> Error adding fd 660 epoll_ctl, Bad file descriptor
>
> epoll_ctl() returns EPERM if the target fd does not support poll.
> It returns EBADF when the epoll fd itself is closed or the target fd is
> closed.
>
> When the first type of crash happens, we see that the fd 660 is
> anon_inode:[vfio-device] which does not support poll.
>
> When the second type of crash happens, we could see from the fd map of
> the crashing process that the fd 660 was already closed.
>
> This means the said fd has been closed and in certain cases may have been
> reassigned to a different device by the operating system but the eal-intr-
> thread does not know about it.
>
> We observed that these crashes were always accompanied by an error in
> iavf_dev_init() after rte_intr_callback_register() and
> iavf_enable_irq0() have already happened. In the error path, the
> intr_handle_fd was being closed but the interrupt handler wasn't being
> unregistered.
>
> The fix is to unregister the interrupt handle in the
> iavf_dev_init() error path.
>
> Ensure proper cleanup if iavf_security_init() or
> iavf_security_ctx_create() fail. Earlier, we were leaking memory by simply
> returning from iavf_dev_init().
Fixes: 22b123a36d07 ("net/avf: initialize PMD")
Fixes: 6bc987ecb860 ("net/iavf: support IPsec inline crypto")
Cc: [email protected]
>
> Signed-off-by: Saurabh Singhal <[email protected]>
Acked-by: Qi Zhang <[email protected]>
Applied to dpdk-next-net-intel.
Thanks
Qi