Hi Jakub,

On Mon, 2 Sep 2024 18:55:43 -0700
Jakub Kicinski <k...@kernel.org> wrote:

> On Thu, 29 Aug 2024 18:15:30 +0200 Maxime Chevallier wrote:
> > @@ -582,15 +591,12 @@ static void fs_timeout_work(struct work_struct *work)
> >  
> >     dev->stats.tx_errors++;
> >  
> > -   spin_lock_irqsave(&fep->lock, flags);
> > -
> > -   if (dev->flags & IFF_UP) {
> > -           phy_stop(dev->phydev);
> > -           (*fep->ops->stop)(dev);
> > -           (*fep->ops->restart)(dev);
> > -   }
> > +   rtnl_lock();  
> 
> so we take rtnl_lock here..
> 
> > +   phylink_stop(fep->phylink);
> > +   phylink_start(fep->phylink);
> > +   rtnl_unlock();
> >  
> > -   phy_start(dev->phydev);
> > +   spin_lock_irqsave(&fep->lock, flags);
> >     wake = fep->tx_free >= MAX_SKB_FRAGS &&
> >            !(CBDR_SC(fep->cur_tx) & BD_ENET_TX_READY);
> >     spin_unlock_irqrestore(&fep->lock, flags);  
> 
> > @@ -717,19 +686,18 @@ static int fs_enet_close(struct net_device *dev)
> >     unsigned long flags;
> >  
> >     netif_stop_queue(dev);
> > -   netif_carrier_off(dev);
> >     napi_disable(&fep->napi);
> >     cancel_work_sync(&fep->timeout_work);  
> 
> ..and cancel_work_sync() under rtnl_lock here?
> 
> IDK if removing the "dev->flags & IFF_UP" check counts as
> meaningfully making it worse, but we're going in the wrong direction.
> The _sync() has to go, and the timeout work needs to check if device
> has been closed under rtnl_lock ?

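Argh, that's true, I didn't consider that call path at all... Sorry
about that. fs_enet_close() runs under rtnl_lock, so cancel_work_sync()
there can end up waiting on a timeout work that is itself blocked in
rtnl_lock(). I'll rework this to address that deadlock waiting to
happen.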
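
To check my understanding of the suggestion, here is a small userspace
model of the pattern, not driver code. Every name in it is made up for
illustration: a pthread mutex stands in for rtnl_lock, a "running" flag
stands in for the device being up, and a counter stands in for the
phylink_stop()/phylink_start() cycle. The actual rework may well look
different.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for rtnl_lock in this userspace model. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the bits of device state that matter here. */
struct model_dev {
	bool running;	/* models "device is still open" */
	int restarts;	/* counts modelled stop/start cycles */
};

/*
 * Models the timeout work: it takes the lock itself, then re-checks
 * whether the device was closed in the meantime before touching it.
 */
static void *model_timeout_work(void *arg)
{
	struct model_dev *dev = arg;

	pthread_mutex_lock(&model_rtnl);
	if (dev->running)
		dev->restarts++;	/* models the stop + start cycle */
	pthread_mutex_unlock(&model_rtnl);

	return NULL;
}

/*
 * Models the close path. The caller already holds model_rtnl, so this
 * must not wait for the work to finish: the work needs the same lock.
 * It only marks the device as closed and returns.
 */
static void model_close_locked(struct model_dev *dev)
{
	dev->running = false;
}
```

With this shape, a work item that fires after (or races with) close
simply sees the device as closed once it gets the lock and does nothing,
so the close path never has to wait for it while holding the lock.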

Thanks,

Maxime
