Quoting Saeed Mahameed (2021-03-12 21:54:18)
> On Fri, 2021-03-12 at 16:04 +0100, Antoine Tenart wrote:
> > netif_set_xps_queue must be called with the rtnl lock taken, and this
> > is now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
> > lock conditionally, fix this by taking the rtnl lock all the time.
>
> There is a reason why it is conditional:
> we had a bug in the past of double locking here:
>
> [ 4255.283960] echo/644 is trying to acquire lock:
>
> [ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at: mlx5e_attach_netdev+0xd4/0x3d0 [mlx5_core]
>
> [ 4255.287264]
>
> [ 4255.287264] but task is already holding lock:
>
> [ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at: ipoib_vlan_add+0x7c/0x2d0 [ib_ipoib]
>
> ipoib_vlan_add is called under rtnl and will eventually call
> mlx5e_attach_netdev, we don't have much control over this in mlx5
> driver since the rdma stack provides a per-prepared netdev to attach to
> our hw. maybe it is time we had a nested rtnl lock ..

Not sure we want to add a nested rtnl lock because of xps. I'd like to
see other options first. One could be a locking mechanism for xps that
does not rely on rtnl, if that's possible.

As for this series, patches 6, 15 (this one) and 16 are not linked to
and do not rely on the other patches. They're improvements or fixes for
already existing behaviours. The series already gained enough new
patches since v1 and I don't want to maintain it out-of-tree for too
long, so I'll resend it without patches 6, 15 and 16; and then we'll be
able to focus on the xps locking relationship with rtnl.

Antoine