Patrick McHardy wrote: > Andrew Morton wrote: > >>>http://bugzilla.kernel.org/show_bug.cgi?id=8736 >>> >>>Here is another scenario I bumped onto - qdisc_watchdog_cancel() and >>>qdisc_restart() deadlock. >>> >>>[...] >>>DEADLOCK! > > > > Good catch. > > Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2, > that should fix it.
Ranko, did you get a chance to test this? I've attached the patch since it doesn't revert cleanly ..
[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization As noticed by Ranko Zivojnovic <[EMAIL PROTECTED]>, calling qdisc_run from the timer handler can result in deadlock: > CPU#0 > > qdisc_watchdog() fires and gets dev->queue_lock > qdisc_run()...qdisc_restart()... > -> releases dev->queue_lock and enters dev_hard_start_xmit() > > CPU#1 > > tc del qdisc dev ... > qdisc_graft()...dev_graft_qdisc()...dev_deactivate()... > -> grabs dev->queue_lock ... > > qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()... > -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still > holding dev->queue_lock > > CPU#0 > > dev_hard_start_xmit() returns ... > -> wants to get dev->queue_lock(!) > > DEADLOCK! The entire optimization is a bit questionable IMO, it moves potentially large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ, which kind of defeats the separation of them. Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index d92ea26..4fd0bec 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -278,11 +278,7 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer) wd->qdisc->flags &= ~TCQ_F_THROTTLED; smp_wmb(); - if (spin_trylock(&dev->queue_lock)) { - qdisc_run(dev); - spin_unlock(&dev->queue_lock); - } else - netif_schedule(dev); + netif_schedule(dev); return HRTIMER_NORESTART; }