On Mon, 23 Nov 2020 14:09:34 -0500 Steven Rostedt wrote:
> On Mon, 23 Nov 2020 10:52:52 -0800
> Jakub Kicinski <k...@kernel.org> wrote:
> 
> > On Mon, 23 Nov 2020 09:31:28 -0500 Steven Rostedt wrote:
> > > On Mon, 23 Nov 2020 13:08:55 +0200
> > > Leon Romanovsky <l...@kernel.org> wrote:
> > >
> > > > [ 10.028024] Chain exists of:
> > > > [ 10.028025]  console_owner --> target_list_lock --> _xmit_ETHER#2
> > >
> > > Note, the problem is that we have a location that grabs the xmit_lock
> > > while holding target_list_lock (and possibly console_owner).
> > 
> > Well, it try_locks the xmit_lock. Does lockdep understand try-locks?
> > 
> > (not that I condone the shenanigans that are going on here)
> 
> Does it?
> 
> 	virtnet_poll_tx() {
> 		__netif_tx_lock() {
> 			spin_lock(&txq->_xmit_lock);
Umpf. Right. I was looking at virtnet_poll_cleantx()

> That looks like we can have:
> 
> 	CPU0				CPU1
> 	----				----
>   lock(xmit_lock)
> 
> 				lock(console)
> 				lock(target_list_lock)
> 				__netif_tx_lock()
> 				  lock(xmit_lock);
> 
> 				  [BLOCKED]
> 
>   <interrupt>
>   lock(console)
> 
>   [BLOCKED]
> 
> 
> 	DEADLOCK.
> 
> So where is the trylock here?
> 
> Perhaps you need the trylock in virtnet_poll_tx()?

That could work. Best if we used normal lock if !!budget, and trylock
when budget is 0. But maybe that's too hairy.

I'm assuming all this trickiness comes from virtqueue_get_buf() needing
locking vs the TX path? It's pretty unusual for the completion path to
need locking vs the xmit path.
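
Roughly what I have in mind, as a completely untested sketch of the idea
rather than a patch (I'm going from memory on the virtio_net internals,
so free_old_xmit_skbs(), vq2txq() and virtqueue_napi_complete() may not
match the current code exactly; __netif_tx_trylock() is the existing
core helper):

static int virtnet_poll_tx(struct napi_struct *napi, int budget)
{
	struct send_queue *sq = container_of(napi, struct send_queue, napi);
	struct virtnet_info *vi = sq->vq->vdev->priv;
	struct netdev_queue *txq;

	txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));

	if (!budget) {
		/* budget == 0 means we may be running from netpoll, i.e.
		 * possibly with target_list_lock held and the console
		 * owned, so don't spin on the xmit_lock; leave NAPI
		 * scheduled and let the next regular poll clean up.
		 */
		if (!__netif_tx_trylock(txq))
			return 0;
	} else {
		/* Normal NAPI poll, safe to take the lock as today. */
		__netif_tx_lock(txq, raw_smp_processor_id());
	}

	free_old_xmit_skbs(sq, true);
	__netif_tx_unlock(txq);

	virtqueue_napi_complete(napi, sq->vq, 0);

	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
		netif_tx_wake_queue(txq);

	return 0;
}

If the trylock fails nothing is lost, the completions just get picked up
on the next softirq poll, and with a real budget the behaviour is
unchanged.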