On 2021-03-15 10:38, Antoine Tenart wrote:
Quoting Saeed Mahameed (2021-03-12 21:54:18)
On Fri, 2021-03-12 at 16:04 +0100, Antoine Tenart wrote:
netif_set_xps_queue must be called with the rtnl lock taken, and this
is
now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
lock conditionally, fix this by taking the rtnl lock all the time.

Signed-off-by: Antoine Tenart <aten...@kernel.org>
---
  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++--------
  1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ec2fcb2a2977..96cba86b9f0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5557,7 +5557,6 @@ static void mlx5e_update_features(struct
net_device *netdev)
 int mlx5e_attach_netdev(struct mlx5e_priv *priv)
  {
-       const bool take_rtnl = priv->netdev->reg_state ==
NETREG_REGISTERED;
         const struct mlx5e_profile *profile = priv->profile;
         int max_nch;
         int err;
@@ -5578,15 +5577,11 @@ int mlx5e_attach_netdev(struct mlx5e_priv
*priv)
          * 2. Set our default XPS cpumask.
          * 3. Build the RQT.
          *
-        * rtnl_lock is required by netif_set_real_num_*_queues in case
the
-        * netdev has been registered by this point (if this function
was called
-        * in the reload or resume flow).
+        * rtnl_lock is required by netif_set_xps_queue.
          */

There is a reason why it is conditional:
we had a bug in the past of double locking here:

[ 4255.283960] echo/644 is trying to acquire lock:

  [ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at:
mlx5e_attach_netdev0xd4/0×3d0 [mlx5_core]

  [ 4255.287264]

  [ 4255.287264] but task is already holding lock:

  [ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at:
ipoib_vlan_add0×7c/0×2d0 [ib_ipoib]

ipoib_vlan_add is called under rtnl and will eventually call
mlx5e_attach_netdev, we don't have much control over this in mlx5
driver since the rdma stack provides a per-prepared netdev to attach to
our hw. maybe it is time we had a nested rtnl lock ..

Thanks for the explanation. So as you said, we can't based the locking
decision only on the driver own state / information...

What about `take_rtnl = !rtnl_is_locked();`?

It won't work, because the lock may be taken by some other unrelated thread. By doing `if (!rtnl_is_locked()) rtnl_lock()` we defeat the purpose of the lock, because we will proceed to the critical section even if we should wait until some other thread releases the lock.

Thanks!
Antoine


Reply via email to