On Thu, 15 Feb 2007 23:40:32 -0800 Ben Greear <[EMAIL PROTECTED]> wrote:

> Jarek Poplawski wrote:
> > On 14-02-2007 22:27, Stephen Hemminger wrote:
> >
> >> Ben found this but the problem seems pretty widespread.
> >>
> >> The following places are subject to deadlock between flush_scheduled_work
> >> and the RTNL mutex. What can happen is that a work queue routine (like
> >> bridge port_carrier_check) is waiting forever for RTNL, and the driver
> >> routine has called flush_scheduled_work with RTNL held and is waiting
> >> for the work queue to clear.
> >>
> >> Several other places have comments like: "can't call flush_scheduled_work
> >> here or it will deadlock". Most of the problem places are in device close
> >> routines. My recommendation would be to add a check for device
> >> netif_running in whatever work routine is used, and move the
> >> flush_scheduled_work to the remove routine.
> >>
> >> 8139too.c:      rtl8139_close --> rtl8139_stop_thread
> >> r8169.c:        rtl8169_down
> >> cassini.c:      cas_change_mtu
> >> iseries_veth.c: veth_stop_connection
> >> s2io.c:         s2io_close
> >> sis190.c:       sis190_down
> >>
> >
> > There is probably more than this...
> >
>
> Maybe there should be something like an ASSERT_NOT_RTNL() in the
> flush_scheduled_work() method? If it's performance critical, #ifdef it
> out if we're not debugging locks?
>

You can't safely add a check like that. What if another CPU had acquired
RTNL for an unrelated reason?

--
Stephen Hemminger <[EMAIL PROTECTED]>
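
For illustration, the recommendation quoted above (check netif_running in the work routine, flush from the remove routine rather than from close) can be sketched as a small user-space model. This is a single-threaded simulation, not kernel code: every name ending in _model is hypothetical, RTNL is reduced to a boolean, and the point where a real kernel would hang is recorded in a flag instead.

```c
/* Single-threaded user-space model of the close/flush ordering problem.
 * Nothing here is a kernel API; all *_model names are made up for the sketch. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

struct dev_model {
    bool running;     /* stands in for netif_running(dev) */
    int  hw_touches;  /* how often the work routine touched the "hardware" */
};

typedef void (*work_fn)(struct dev_model *);

static bool rtnl_held;          /* models the RTNL mutex being held */
static bool deadlock_detected;  /* set where a real kernel would hang */
static work_fn pending[MAX_WORK];
static struct dev_model *pending_dev[MAX_WORK];
static size_t npending;

static void schedule_work_model(work_fn fn, struct dev_model *dev)
{
    assert(npending < MAX_WORK);
    pending[npending] = fn;
    pending_dev[npending] = dev;
    npending++;
}

/* Runs every pending item, the way a flush waits for them to finish. */
static void flush_scheduled_work_model(void)
{
    for (size_t i = 0; i < npending; i++)
        pending[i](pending_dev[i]);
    npending = 0;
}

/* Work routine: needs RTNL, and checks the running flag before acting. */
static void carrier_check_model(struct dev_model *dev)
{
    if (rtnl_held) {            /* would block forever on the flusher's RTNL */
        deadlock_detected = true;
        return;
    }
    rtnl_held = true;           /* models rtnl_lock() */
    if (dev->running)           /* the recommended netif_running check */
        dev->hw_touches++;
    rtnl_held = false;          /* models rtnl_unlock() */
}

/* Problematic close: runs with RTNL held and flushes the queue. */
static void close_with_flush_model(struct dev_model *dev)
{
    dev->running = false;
    flush_scheduled_work_model();
}

/* Recommended close: runs with RTNL held, only marks the device down. */
static void close_no_flush_model(struct dev_model *dev)
{
    dev->running = false;
}

/* Recommended remove: runs without RTNL, so flushing is safe here. */
static void remove_model(struct dev_model *dev)
{
    (void)dev;
    flush_scheduled_work_model();
}

/* Returns true if the chosen close path hits the modelled deadlock. */
static bool run_scenario_model(bool flush_in_close, struct dev_model *dev)
{
    deadlock_detected = false;
    npending = 0;
    dev->running = true;
    dev->hw_touches = 0;

    schedule_work_model(carrier_check_model, dev);

    rtnl_held = true;           /* close is entered with RTNL held */
    if (flush_in_close)
        close_with_flush_model(dev);
    else
        close_no_flush_model(dev);
    rtnl_held = false;

    remove_model(dev);          /* later, without RTNL */
    return deadlock_detected;
}
```

Calling run_scenario_model(true, &dev) reports the deadlock, because the flush runs the work routine while RTNL is still held. Calling run_scenario_model(false, &dev) does not: the work item runs only at remove time, sees that the device is no longer running, and leaves the hardware alone. The model also hints at why a plain "is RTNL locked?" assertion is unreliable: a boolean like rtnl_held says that someone holds the lock, not that the caller of the flush does.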