A rare [1] race condition is observed between the igb_watchdog_task and
shutdown on a dual-core i.MX6 based system with two I210 controllers.

Using printk, the igb_watchdog_task is hung in igb_read_phy_reg because
__igb_shutdown has already called __igb_close.

The fix is to delete the timers and cancel the work after setting
__IGB_DOWN. This approach mirrors igb_up.

reboot             kworker

__igb_shutdown
  rtnl_lock
  __igb_close
  :                igb_watchdog_task
  :                :
  :                igb_read_phy_reg (hung)
  rtnl_unlock

[1] Note that this is easier to reproduce with 'initcall_debug' logging
and additional printk logging in igb_main.

Signed-off-by: Ian Ray <[email protected]>
---
Changes in v2:
- Change strategy to avoid taking RTNL.
- Link to v1: https://lore.kernel.org/all/[email protected]/
---
 drivers/net/ethernet/intel/igb/igb_main.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9e9a5900e6e5..a65ae7925ae8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2175,10 +2175,14 @@ void igb_down(struct igb_adapter *adapter)
 	u32 tctl, rctl;
 	int i;
 
-	/* signal that we're down so the interrupt handler does not
-	 * reschedule our watchdog timer
+	/* The watchdog timer may be rescheduled, so explicitly
+	 * disable watchdog from being rescheduled.
 	 */
 	set_bit(__IGB_DOWN, &adapter->state);
+	timer_delete_sync(&adapter->watchdog_timer);
+	timer_delete_sync(&adapter->phy_info_timer);
+
+	cancel_work_sync(&adapter->watchdog_task);
 
 	/* disable receives in the hardware */
 	rctl = rd32(E1000_RCTL);
@@ -2210,9 +2214,6 @@ void igb_down(struct igb_adapter *adapter)
 		}
 	}
 
-	timer_delete_sync(&adapter->watchdog_timer);
-	timer_delete_sync(&adapter->phy_info_timer);
-
 	/* record the stats before reset*/
 	spin_lock(&adapter->stats64_lock);
 	igb_update_stats(adapter);
-- 
2.49.0
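
For illustration only (not part of the patch): the ordering argument can
be sketched as a toy, single-threaded user-space model. Every name below
(toy_adapter, toy_timer_fire, toy_work_run, toy_down_fixed,
toy_down_racy, toy_scenario) is hypothetical and not driver code; plain
booleans stand in for __IGB_DOWN, the watchdog timer and the watchdog
work. The model assumes a work item is already queued when teardown
starts, and it cannot represent the "wait for a running instance" part
of cancel_work_sync().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; all names are hypothetical stand-ins, not igb code. */
struct toy_adapter {
	bool down;          /* stands in for the __IGB_DOWN state bit */
	bool timer_armed;   /* stands in for watchdog_timer */
	bool work_pending;  /* stands in for watchdog_task */
	bool hw_usable;     /* false once the "hardware" is shut off */
	int  bad_phy_reads; /* PHY reads attempted after shutoff */
};

/* Timer callback: consume the timer and queue the watchdog work. */
static void toy_timer_fire(struct toy_adapter *a)
{
	if (!a->timer_armed)
		return;
	a->timer_armed = false;
	a->work_pending = true;
}

/* Work item: read the PHY, then re-arm the timer unless going down. */
static void toy_work_run(struct toy_adapter *a)
{
	if (!a->work_pending)
		return;
	a->work_pending = false;
	if (!a->hw_usable)
		a->bad_phy_reads++;	/* the real driver hangs at this point */
	if (!a->down)
		a->timer_armed = true;
}

/* Teardown in the order the patch proposes. */
static void toy_down_fixed(struct toy_adapter *a)
{
	a->down = true;			/* 1. forbid re-arming */
	a->timer_armed = false;		/* 2. models timer_delete_sync() */
	a->work_pending = false;	/* 3. models cancel_work_sync() */
	assert(!a->timer_armed && !a->work_pending);
	a->hw_usable = false;		/* 4. only now shut off the hardware */
}

/* Teardown that never cancels the work: queued work survives. */
static void toy_down_racy(struct toy_adapter *a)
{
	a->down = true;
	a->hw_usable = false;
	a->timer_armed = false;
}

/* Count late PHY reads when work is already queued at teardown. */
int toy_scenario(bool fixed)
{
	struct toy_adapter a = { .timer_armed = true, .hw_usable = true };

	toy_timer_fire(&a);		/* work is now pending */
	if (fixed)
		toy_down_fixed(&a);
	else
		toy_down_racy(&a);
	toy_work_run(&a);		/* worker runs after teardown */
	return a.bad_phy_reads;
}
```

In this model toy_scenario(false) counts one PHY read after the
hardware is shut off, while toy_scenario(true) counts none, because the
queued work is cancelled before the hardware is touched.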
