Patrick McHardy wrote:
> Andrew Morton wrote:
> 
>>>http://bugzilla.kernel.org/show_bug.cgi?id=8736
>>>
>>>Here is another scenario I bumped onto - qdisc_watchdog_cancel() and
>>>qdisc_restart() deadlock.
>>>
>>>[...]
>>>DEADLOCK!
> 
> Good catch.
> 
> Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2,
> that should fix it.


Ranko, did you get a chance to test this? I've attached the patch
since the commit doesn't revert cleanly.

[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization

As noticed by Ranko Zivojnovic <[EMAIL PROTECTED]>, calling qdisc_run
from the timer handler can result in deadlock:

> CPU#0
>
> qdisc_watchdog() fires and gets dev->queue_lock
> qdisc_run()...qdisc_restart()...
> -> releases dev->queue_lock and enters dev_hard_start_xmit()
>
> CPU#1
>
> tc del qdisc dev ...
> qdisc_graft()...dev_graft_qdisc()...dev_deactivate()...
> -> grabs dev->queue_lock ...
>
> qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()...
> -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still
>                       holding dev->queue_lock
>
> CPU#0
>
> dev_hard_start_xmit() returns ...
> -> wants to get dev->queue_lock(!)
>
> DEADLOCK!

The entire optimization is a bit questionable IMO: it moves potentially
large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ,
which kind of defeats the separation between them.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index d92ea26..4fd0bec 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -278,11 +278,7 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
 
        wd->qdisc->flags &= ~TCQ_F_THROTTLED;
        smp_wmb();
-       if (spin_trylock(&dev->queue_lock)) {
-               qdisc_run(dev);
-               spin_unlock(&dev->queue_lock);
-       } else
-               netif_schedule(dev);
+       netif_schedule(dev);
 
        return HRTIMER_NORESTART;
 }
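
For anyone who wants to see the interleaving from the report spelled out
step by step, here is a rough single-threaded model of it. This is not
kernel code and takes no real locks: struct model, replay() and the CPU
constants are made up for illustration, and the comments only name the
kernel functions each step stands in for. It assumes the trylock in the
old handler succeeds, as in the reported scenario.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code. */
enum { CPU_NONE = -1, CPU0 = 0, CPU1 = 1 };

struct model {
	int  lock_owner;         /* who holds the modelled dev->queue_lock */
	bool handler_running;    /* modelled qdisc_watchdog() still executing */
	bool cpu0_wants_lock;    /* CPU0 blocked re-taking the lock */
	bool cpu1_waits_handler; /* CPU1 blocked in the modelled hrtimer_cancel() */
};

/*
 * Replay the interleaving from the report.
 * handler_transmits == true  models the handler calling qdisc_run() itself,
 * handler_transmits == false models the handler only calling netif_schedule().
 * Returns true if both CPUs end up waiting on each other.
 */
static bool replay(bool handler_transmits)
{
	struct model m = { CPU_NONE, false, false, false };

	/* CPU0: the watchdog timer fires. */
	m.handler_running = true;
	if (handler_transmits) {
		/* Lock taken, then dropped around the driver transmit. */
		m.lock_owner = CPU0;
		m.lock_owner = CPU_NONE;
	} else {
		/* Handler just schedules the TX softirq and returns. */
		m.handler_running = false;
	}

	/* CPU1: "tc del qdisc" takes the lock, then cancels the timer. */
	m.lock_owner = CPU1;
	m.cpu1_waits_handler = m.handler_running;

	/* CPU0: transmit returns and the handler wants the lock back. */
	if (m.handler_running)
		m.cpu0_wants_lock = (m.lock_owner == CPU1);

	return m.cpu0_wants_lock && m.cpu1_waits_handler;
}
```

replay(true) reports the wait cycle from the trace above, while
replay(false) does not, since the handler has already returned by the
time the timer is cancelled.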
