Hi Jakub,

On Mon, 2 Sep 2024 18:55:43 -0700
Jakub Kicinski <k...@kernel.org> wrote:

> On Thu, 29 Aug 2024 18:15:30 +0200 Maxime Chevallier wrote:
> > @@ -582,15 +591,12 @@ static void fs_timeout_work(struct work_struct *work)
> >  
> >  	dev->stats.tx_errors++;
> >  
> > -	spin_lock_irqsave(&fep->lock, flags);
> > -
> > -	if (dev->flags & IFF_UP) {
> > -		phy_stop(dev->phydev);
> > -		(*fep->ops->stop)(dev);
> > -		(*fep->ops->restart)(dev);
> > -	}
> > +	rtnl_lock();
> 
> so we take rtnl_lock here..
> 
> > +	phylink_stop(fep->phylink);
> > +	phylink_start(fep->phylink);
> > +	rtnl_unlock();
> >  
> > -	phy_start(dev->phydev);
> > +	spin_lock_irqsave(&fep->lock, flags);
> >  	wake = fep->tx_free >= MAX_SKB_FRAGS &&
> >  	       !(CBDR_SC(fep->cur_tx) & BD_ENET_TX_READY);
> >  	spin_unlock_irqrestore(&fep->lock, flags);
> 
> > @@ -717,19 +686,18 @@ static int fs_enet_close(struct net_device *dev)
> >  	unsigned long flags;
> >  
> >  	netif_stop_queue(dev);
> > -	netif_carrier_off(dev);
> >  	napi_disable(&fep->napi);
> >  	cancel_work_sync(&fep->timeout_work);
> 
> ..and cancel_work_sync() under rtnl_lock here?
> 
> IDK if removing the "dev->flags & IFF_UP" check counts as
> meaningfully making it worse, but we're going in the wrong direction.
> The _sync() has to go, and the timeout work needs to check if device
> has been closed under rtnl_lock ?

Argh, that's true, I didn't consider that call path at all. Sorry about
that; I'll rework this to fix the deadlock waiting to happen.

Thanks,

Maxime
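For illustration only, here is a minimal userspace model of the locking pattern Jakub suggests: the timeout work takes the lock and then re-checks whether the device is still up, so the close path, which already holds the lock, never has to wait synchronously for the work. This is not kernel code and not the eventual fs_enet patch. It assumes a pthread mutex stands in for rtnl_lock(), a plain bool stands in for netif_running(), and pthread_join() stands in for cancel_work_sync(); all `model_*` names and `run_close_race_model()` are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for rtnl_lock()/rtnl_unlock(). */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for netif_running(); only read or written
 * with model_rtnl held once the worker exists. */
static bool model_dev_running;

/* How many times the modelled timeout work restarted the link. */
static int model_restarts;

/* Model of fs_timeout_work(): take the lock first, then re-check
 * whether the device was closed meanwhile before restarting it. */
static void *model_timeout_work(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&model_rtnl);
	if (model_dev_running) {
		/* phylink_stop() + phylink_start() would go here. */
		model_restarts++;
	}
	pthread_mutex_unlock(&model_rtnl);
	return NULL;
}

/* Model of the close path racing with a pending timeout work item.
 * Returns the number of restarts the work performed, or -1 on error. */
int run_close_race_model(void)
{
	pthread_t work;

	model_dev_running = true;
	model_restarts = 0;

	/* The close path runs with the lock held, as ndo_stop runs under rtnl. */
	pthread_mutex_lock(&model_rtnl);

	/* The timeout work fires now and blocks on the lock we hold. */
	if (pthread_create(&work, NULL, model_timeout_work, NULL) != 0) {
		pthread_mutex_unlock(&model_rtnl);
		return -1;
	}

	/* Mark the device closed. Calling pthread_join() here, while still
	 * holding the lock, would deadlock the same way cancel_work_sync()
	 * does under rtnl_lock. */
	model_dev_running = false;
	pthread_mutex_unlock(&model_rtnl);

	/* Waiting for the work is only safe after the lock is dropped. */
	pthread_join(work, NULL);
	return model_restarts;
}
```

In this model run_close_race_model() always returns 0: the work can only take the lock after the close path has marked the device down, so it skips the restart. The design point is that the "is the device still up?" decision moves into the work itself, under the same lock the close path holds, which is what lets the close path drop the synchronous wait.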