On Thu, 13 Feb 2025 16:30:59 -0800 Jacob Keller wrote: > Analyzing the places where we take crit_lock in the driver there are two > sources: > > a) several of the work queue tasks including adminq_task, watchdog_task, > reset_task, and the finish_config task. > > b) various callbacks which ultimately stem back to .ndo operations or > ethtool operations. > > The latter cannot be triggered until after the netdevice registration is > completed successfully. > > The iAVF driver uses alloc_ordered_workqueue, which is an unbound workqueue > that has a max limit of 1, and thus guarantees that only a single work item > on the queue is executing at any given time, so none of the other work > threads could be executing due to the ordered workqueue guarantees. > > The iavf_finish_config() function also does not do anything else after > register_netdevice, unless it fails. It seems unlikely that the driver > private crit_lock is protecting anything that register_netdevice() itself > touches. > > Thus, to fix this ABBA lock violation, lets simply release the > adapter->crit_lock as well as netdev_lock prior to calling > register_netdevice(). We do still keep holding the RTNL lock as required by > the function. If we do fail to register the netdevice, then we re-acquire > the adapter critical lock to finish the transition back to > __IAVF_INIT_CONFIG_ADAPTER. > > This ensures every call where both netdev_lock and the adapter->crit_lock > are acquired under the same ordering.
Thanks a lot for figuring this out, much appreciated! Reviewed-by: Jakub Kicinski <k...@kernel.org>