On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:
> On Mon, 23 Nov 2020 10:52:52 -0800
> Jakub Kicinski <k...@kernel.org> wrote:
> 
> > On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:  
> > > On Mon, 23 Nov 2020 13:08:55 +0200
> > > Leon Romanovsky <l...@kernel.org> wrote:
> > > 
> > > >  [   10.028024] Chain exists of:
> > > >  [   10.028025]   console_owner --> target_list_lock --> _xmit_ETHER#2  
> > > 
> > > Note, the problem is that we have a location that grabs the xmit_lock
> > > while holding target_list_lock (and possibly console_owner).
> > 
> > Well, it try_locks the xmit_lock. Does lockdep understand try-locks?
> > 
> > (not that I condone the shenanigans that are going on here)  
> 
> Does it?
> 
>       virtnet_poll_tx() {
>               __netif_tx_lock() {
>                       spin_lock(&txq->_xmit_lock);

Umpf. Right. I was looking at virtnet_poll_cleantx()

> That looks like we can have:
> 
> 
>       CPU0            CPU1
>       ----            ----
>    lock(xmit_lock)
> 
>                   lock(console)
>                   lock(target_list_lock)
>                   __netif_tx_lock()
>                       lock(xmit_lock);
> 
>                       [BLOCKED]
> 
>    <interrupt>
>    lock(console)
> 
>    [BLOCKED]
> 
> 
> 
>  DEADLOCK.
> 
> 
> So where is the trylock here?
> 
> Perhaps you need the trylock in virtnet_poll_tx()?

That could work. Best if we used a normal lock if !!budget, and a trylock
when budget is 0. But maybe that's too hairy.
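A rough userspace sketch of that shape (the helper name is hypothetical and a pthread mutex stands in for the tx queue lock; in the driver this would be __netif_tx_lock() vs. __netif_tx_trylock()):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t xmit_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: block for the lock when we have budget
 * (ordinary NAPI poll), but only trylock when budget == 0 --
 * the caller may be netpoll, already holding console/target_list. */
static bool tx_poll_lock(int budget)
{
	if (budget) {
		pthread_mutex_lock(&xmit_lock);
		return true;
	}
	/* budget == 0: must not block; tell the caller to retry later */
	return pthread_mutex_trylock(&xmit_lock) == 0;
}

static void tx_poll_unlock(void)
{
	pthread_mutex_unlock(&xmit_lock);
}
```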

I'm assuming all this trickiness comes from virtqueue_get_buf() needing
locking vs the TX path? It's pretty unusual for the completion path to
need locking vs the xmit path.
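If that's right, the constraint looks roughly like the toy model below: when the xmit side and the completion side both mutate the same ring indices, one lock has to cover both. This is a userspace sketch, not the virtio code, and the names are illustrative:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a tx ring where the xmit path (add) and the completion
 * path (reclaim) both touch head/tail, so the caller must serialize
 * them with one lock -- analogous to virtqueue_add_outbuf() vs.
 * virtqueue_get_buf() needing external serialization per virtqueue. */
#define RING_SIZE 16

struct txring {
	pthread_mutex_t lock;
	unsigned int head, tail;	/* mutated by BOTH paths */
	void *buf[RING_SIZE];
};

static bool tx_add(struct txring *r, void *b)	/* xmit path */
{
	bool ok = false;

	pthread_mutex_lock(&r->lock);
	if (r->head - r->tail < RING_SIZE) {
		r->buf[r->head++ % RING_SIZE] = b;
		ok = true;
	}
	pthread_mutex_unlock(&r->lock);
	return ok;
}

static void *tx_reclaim(struct txring *r)	/* completion path */
{
	void *b = NULL;

	pthread_mutex_lock(&r->lock);
	if (r->tail != r->head)
		b = r->buf[r->tail++ % RING_SIZE];
	pthread_mutex_unlock(&r->lock);
	return b;
}
```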
