On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote: > e1000_down calls netif_queue_set_napi, which assumes that RTNL is held. > > There are a few paths for e1000_down to be called in e1000 where RTNL is > not currently being held: > - e1000_shutdown (pci shutdown) > - e1000_suspend (power management) > - e1000_reinit_locked (via e1000_reset_task delayed work) > > Hold RTNL in two places to fix this issue: > - e1000_reset_task > - __e1000_shutdown (which is called from both e1000_shutdown and > e1000_suspend).
It looks like there's one other spot I missed: e1000_io_error_detected (pci error handler) which should also hold rtnl_lock: + if (netif_running(netdev)) { + rtnl_lock(); e1000_down(adapter); + rtnl_unlock(); + } I can send that update in the v2, but I'll wait to see if Intel has suggestions on the below. > The other paths which call e1000_down seemingly hold RTNL and are OK: > - e1000_close (ndo_stop) > - e1000_change_mtu (ndo_change_mtu) > > I'm submitting this is as an RFC because: > - the e1000_reinit_locked issue appears very similar to commit > 21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which > fixes a similar issue in e1000e > > however > > - adding rtnl to e1000_reinit_locked seemingly conflicts with an > earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in > e1000_reset_task"). > > Hopefully Intel can weigh in and shed some light on the correct way to > go. > > Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs") > Reported-by: Dmitry Antipov <dmanti...@yandex.ru> > Closes: > https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090...@yandex.ru/ > Signed-off-by: Joe Damato <jdam...@fastly.com>