On Tue, 05 Dec 2006 13:37:37 -0800 Roland Dreier <[EMAIL PROTECTED]> wrote:
> > a) Ban the calling of flush_scheduled_work() from under rtnl_lock(). > > Sounds hard. > > Unfortunate if this is happening a lot. It seems like the most > sensible fix -- flush_scheduled_work() is in effect calling into > an unknown and changeable in the future set of functions (since it > waits for them to finish), and it seems error-prone to hold a lock > across such a call. yes, I agree. It's really bad to be calling flush_scheduled_work() with any locks held at all. Fragile, hard-to-maintain, source of once-in-a-blue-moon failures, etc. I guess lockdep will help. But running flush_scheduled_work() from within dev_close() is a very sensible thing to do, and dev_close is called under rtnl_lock(). davem is -> thattaway ;) > > This will almost work, as long as it's done in workqueue.c with > > appropriate locking. The bug occurs when some other CPU is running > > phy_change() right now - we'll end up freeing data which that CPU is > > presently playing with. > > > > But perhaps we can take care of this within workqueue.c. We need a > > cancel function which will cancel the work and, if its callback is > > presently executing it will block until that execution has completed. > > I may be misunderstanding you, but this seems to deadlock in exactly > the same way: if someone calls this cancel routine holding rtnl_lock, > and the work function that will also take rtnl_lock has just started, > it will get stuck when the work function tries to take rtnl_lock. Ah. The point is that the phy code doesn't want to flush _all_ pending callbacks. It only wants to flush its own one. And its own one doesn't take rtnl_lock(). IOW, the phy code has no interest in running some random other subsystem's callback - it just wants to run its own. Hence no deadlock. Maybe the lesson here is that flush_scheduled_work() is a bad function. It should really be flush_this_work(struct work_struct *w). That is in fact what approximately 100% of the flush_scheduled_work() callers actually want to do. hmm. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/