On 2021-03-15 10:38, Antoine Tenart wrote:
Quoting Saeed Mahameed (2021-03-12 21:54:18)
On Fri, 2021-03-12 at 16:04 +0100, Antoine Tenart wrote:
netif_set_xps_queue must be called with the rtnl lock taken, and this
is
now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
lock conditionally, fix this by taking the rtnl lock all the time.
Signed-off-by: Antoine Tenart <aten...@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ec2fcb2a2977..96cba86b9f0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5557,7 +5557,6 @@ static void mlx5e_update_features(struct
net_device *netdev)
int mlx5e_attach_netdev(struct mlx5e_priv *priv)
{
- const bool take_rtnl = priv->netdev->reg_state ==
NETREG_REGISTERED;
const struct mlx5e_profile *profile = priv->profile;
int max_nch;
int err;
@@ -5578,15 +5577,11 @@ int mlx5e_attach_netdev(struct mlx5e_priv
*priv)
* 2. Set our default XPS cpumask.
* 3. Build the RQT.
*
- * rtnl_lock is required by netif_set_real_num_*_queues in case
the
- * netdev has been registered by this point (if this function
was called
- * in the reload or resume flow).
+ * rtnl_lock is required by netif_set_xps_queue.
*/
There is a reason why it is conditional:
we had a bug in the past of double locking here:
[ 4255.283960] echo/644 is trying to acquire lock:
[ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at:
mlx5e_attach_netdev0xd4/0×3d0 [mlx5_core]
[ 4255.287264]
[ 4255.287264] but task is already holding lock:
[ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at:
ipoib_vlan_add0×7c/0×2d0 [ib_ipoib]
ipoib_vlan_add is called under rtnl and will eventually call
mlx5e_attach_netdev, we don't have much control over this in mlx5
driver since the rdma stack provides a per-prepared netdev to attach to
our hw. maybe it is time we had a nested rtnl lock ..
Thanks for the explanation. So as you said, we can't based the locking
decision only on the driver own state / information...
What about `take_rtnl = !rtnl_is_locked();`?
It won't work, because the lock may be taken by some other unrelated
thread. By doing `if (!rtnl_is_locked()) rtnl_lock()` we defeat the
purpose of the lock, because we will proceed to the critical section
even if we should wait until some other thread releases the lock.
Thanks!
Antoine