On 2020/9/2 17:20, Eric Dumazet wrote: > > > On 9/2/20 1:14 AM, Yunsheng Lin wrote: >> On 2020/9/2 15:32, Eric Dumazet wrote: >>> >>> >>> On 9/1/20 11:34 PM, Yunsheng Lin wrote: >>> >>>> >>>> I am not familiar with TCQ_F_CAN_BYPASS. >>>> From my understanding, the problem is that there is no order between >>>> qdisc enqueuing and qdisc reset. >>> >>> Thw qdisc_reset() should be done after rcu grace period, when there is >>> guarantee no enqueue is in progress. >>> >>> qdisc_destroy() already has a qdisc_reset() call, I am not sure why >>> qdisc_deactivate() is also calling qdisc_reset() >> >> That is a good point. >> Do we allow skb left in qdisc when the qdisc is deactivated state? >> And qdisc_destroy() is not always called after qdisc_deactivate() is called. >> If we allow skb left in qdisc when the qdisc is deactivated state, then it is >> huge change of semantics for qdisc_deactivate(), and I am not sure how many >> cases will affected by this change. > > All I am saying is that the qdisc_reset() in qdisc_deactivate() seems > at the wrong place. > > This certainly can be deferred _after_ the rcu grace period we already have > in dev_deactivate_many() > > Something like this nicer patch.
Thanks for clarifying and the nicer patch. Reusing the existing synchronize_net() is a good idea. Some minor comment inline. > > net/sched/sch_generic.c | 22 ++++------------------ > 1 file changed, 4 insertions(+), 18 deletions(-) > > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index > 265a61d011dfaa7ec0f8fb8aaede920784f690c9..0eaa99e4f8de643724c0942ee1a2e9516eed65c0 > 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -1131,24 +1131,10 @@ EXPORT_SYMBOL(dev_activate); > > static void qdisc_deactivate(struct Qdisc *qdisc) > { > - bool nolock = qdisc->flags & TCQ_F_NOLOCK; > - > if (qdisc->flags & TCQ_F_BUILTIN) > return; > - if (test_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state)) > - return; > - > - if (nolock) > - spin_lock_bh(&qdisc->seqlock); > - spin_lock_bh(qdisc_lock(qdisc)); > > set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state); > - > - qdisc_reset(qdisc); > - > - spin_unlock_bh(qdisc_lock(qdisc)); > - if (nolock) > - spin_unlock_bh(&qdisc->seqlock); > } > > static void dev_deactivate_queue(struct net_device *dev, > @@ -1184,6 +1170,9 @@ static bool some_qdisc_is_busy(struct net_device *dev) > val = (qdisc_is_running(q) || > test_bit(__QDISC_STATE_SCHED, &q->state)); > > + if (!val) > + qdisc_reset(q); It seems semantics for some_qdisc_is_busy() is changed, which does not only do the checking, but also do the reseting? Also, qdisc_reset() could be called multi times for the same qdisc if some_qdisc_is_busy() return true multi times? > + > spin_unlock_bh(root_lock); > > if (val) > @@ -1213,10 +1202,7 @@ void dev_deactivate_many(struct list_head *head) > dev_watchdog_down(dev); > } > > - /* Wait for outstanding qdisc-less dev_queue_xmit calls. > - * This is avoided if all devices are in dismantle phase : > - * Caller will call synchronize_net() for us > - */ > + /* Wait for outstanding dev_queue_xmit calls. */ > synchronize_net(); > > /* Wait for outstanding qdisc_run calls. */ > > > . >