On Sat, Mar 24, 2018 at 10:25 PM, John Fastabend <john.fastab...@gmail.com> wrote: > After the qdisc lock was dropped in pfifo_fast we allow multiple > enqueue threads and dequeue threads to run in parallel. On the > enqueue side the skb bit ooo_okay is used to ensure all related > skbs are enqueued in-order. On the dequeue side though there is > no similar logic. What we observe is with fewer queues than CPUs > it is possible to re-order packets when two instances of > __qdisc_run() are running in parallel. Each thread will dequeue > a skb and then whichever thread calls the ndo op first will > be sent on the wire. This doesn't typically happen because > qdisc_run() is usually triggered by the same core that did the > enqueue. However, drivers will trigger __netif_schedule() > when queues are transitioning from stopped to awake using the > netif_tx_wake_* APIs. When this happens netif_schedule() calls > qdisc_run() on the same CPU that did the netif_tx_wake_* which > is usually done in the interrupt completion context. This CPU > is selected with the irq affinity which is unrelated to the > enqueue operations.
Interesting. Why this is unique to pfifo_fast? For me it could happen to other qdisc's too, when we release the qdisc root lock in sch_direct_xmit(), another CPU could dequeue from the same qdisc and transmit the skb in parallel too? ... > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index 7e3fbe9..39c144b 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -373,24 +373,33 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc > *q, > */ > static inline bool qdisc_restart(struct Qdisc *q, int *packets) > { > + bool more, validate, nolock = q->flags & TCQ_F_NOLOCK; > spinlock_t *root_lock = NULL; > struct netdev_queue *txq; > struct net_device *dev; > struct sk_buff *skb; > - bool validate; > > /* Dequeue packet */ > + if (nolock && test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) > + return false; > + Nit: you probably want to move the comment below this if check, or simply remove it since it is useless...