On Thu, 13 Feb 2025 16:30:59 -0800 Jacob Keller wrote:
> Analyzing the places where we take crit_lock in the driver there are two
> sources:
> 
> a) several of the work queue tasks including adminq_task, watchdog_task,
> reset_task, and the finish_config task.
> 
> b) various callbacks which ultimately stem back to .ndo operations or
> ethtool operations.
> 
> The latter cannot be triggered until after the netdevice registration is
> completed successfully.
> 
> The iAVF driver uses alloc_ordered_workqueue, which is an unbound workqueue
> that has a max limit of 1, and thus guarantees that only a single work item
> on the queue is executing at any given time, so none of the other work
> threads could be executing due to the ordered workqueue guarantees.
> 
> The iavf_finish_config() function also does not do anything else after
> register_netdevice, unless it fails. It seems unlikely that the driver
> private crit_lock is protecting anything that register_netdevice() itself
> touches.
> 
> Thus, to fix this ABBA lock violation, lets simply release the
> adapter->crit_lock as well as netdev_lock prior to calling
> register_netdevice(). We do still keep holding the RTNL lock as required by
> the function. If we do fail to register the netdevice, then we re-acquire
> the adapter critical lock to finish the transition back to
> __IAVF_INIT_CONFIG_ADAPTER.
> 
> This ensures every call where both netdev_lock and the adapter->crit_lock
> are acquired under the same ordering.

Thanks a lot for figuring this out, much appreciated!

Reviewed-by: Jakub Kicinski <k...@kernel.org>

Reply via email to