Cong Wang wrote:
> On Thu, Sep 3, 2020 at 1:40 AM Paolo Abeni <pab...@redhat.com> wrote:
> >
> > On Wed, 2020-09-02 at 22:01 -0700, Cong Wang wrote:
> > > Can you test the attached one-line fix? I think we are overthinking,
> > > probably all we need here is a busy wait.
> >
> > I think that will solve, but I also think that will kill NOLOCK
> > performances due to really increased contention.
>
> Yeah, we somehow end up with more locks (seqlock, skb array lock)
> for lockless qdisc. What an irony... ;)
I went back to the original nolock implementation code to try and figure
out how this was working in the first place. After the initial patch
series we had this in __dev_xmit_skb():

	if (q->flags & TCQ_F_NOLOCK) {
		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
			__qdisc_drop(skb, &to_free);
			rc = NET_XMIT_DROP;
		} else {
			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
			__qdisc_run(q);
		}

		if (unlikely(to_free))
			kfree_skb_list(to_free);
		return rc;
	}

One important piece here is we used __qdisc_run(q) instead of what we
have there now, qdisc_run(q). Here is the latest code:

	if (q->flags & TCQ_F_NOLOCK) {
		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
		qdisc_run(q);
		...

__qdisc_run() is always going to enter the qdisc_restart() loop and
dequeue packets; there is no check here to see if another CPU is
running or not. Compare that to qdisc_run():

	static inline void qdisc_run(struct Qdisc *q)
	{
		if (qdisc_run_begin(q)) {
			__qdisc_run(q);
			qdisc_run_end(q);
		}
	}

Here we have all the racing around qdisc_is_running() that seems
unsolvable.

Seems we flipped __qdisc_run to qdisc_run in 32f7b44d0f566 ("sched:
manipulate __QDISC_STATE_RUNNING in qdisc_run_* helpers"). It's not
clear to me from that patch why it was even done there. Maybe this
would unlock us:

diff --git a/net/core/dev.c b/net/core/dev.c
index 7df6c9617321..9b09429103f1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3749,7 +3749,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 
 	if (q->flags & TCQ_F_NOLOCK) {
 		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
-		qdisc_run(q);
+		__qdisc_run(q);
 
 		if (unlikely(to_free))
 			kfree_skb_list(to_free);

Per the other thread we also need the state deactivated check added
back.

> >
> > At this point I fear we could consider reverting the NOLOCK stuff.
> > I personally would hate doing so, but it looks like NOLOCK benefits are
> > outweighed by its issues.
>
> I agree, NOLOCK brings more pains than gains.
> There are many race
> conditions hidden in generic qdisc layer, another one is enqueue vs.
> reset which is being discussed in another thread.

Sure. Seems they crept in over time. I had some plans to write a
lockless HTB implementation. But with fq+EDT with BPF it seems that it
is no longer needed; we have a more generic/better solution. So I
dropped it.

Also most folks should really be using fq, fq_codel, etc. by default
anyways. Using pfifo_fast alone is not ideal IMO.

Thanks,
John