On 03/24/2018 01:13 PM, John Fastabend wrote:
> After the qdisc lock was dropped in pfifo_fast we allow multiple
> enqueue threads and dequeue threads to run in parallel. On the
> enqueue side the skb bit ooo_okay is used to ensure all related
> skbs are enqueued in-order. On the dequeue side though there is
> no similar logic. What we observe is with fewer queues than CPUs
> it is possible to re-order packets when two instances of
> __qdisc_run() are running in parallel. Each thread will dequeue
> a skb and then whichever thread calls the ndo op first will
> be sent on the wire. This doesn't typically happen because
> qdisc_run() is usually triggered by the same core that did the
> enqueue. However, drivers will trigger __netif_schedule()
> when queues are transitioning from stopped to awake using the
> netif_tx_wake_* APIs. When this happens netif_schedule() calls
> qdisc_run() on the same CPU that did the netif_tx_wake_* which
> is usually done in the interrupt completion context. This CPU
> is selected with the irq affinity which is unrelated to the
> enqueue operations.
> 
> To resolve this we add a RUNNING bit to the qdisc to ensure
> only a single dequeue per qdisc is running. Enqueue and dequeue
> operations can still run in parallel and also on multi queue
> NICs we can still have a dequeue in-flight per qdisc, which
> is typically per CPU.
> 
> Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
> Reported-by: Jakob Unterwurzacher <[email protected]>
> Signed-off-by: John Fastabend <[email protected]>
> ---
>  include/net/sch_generic.h |    1 +
>  net/sched/sch_generic.c   |   13 ++++++++++---
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 2092d33..8da3267 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -30,6 +30,7 @@ struct qdisc_rate_table {
>  enum qdisc_state_t {
>       __QDISC_STATE_SCHED,
>       __QDISC_STATE_DEACTIVATED,
> +     __QDISC_STATE_RUNNING,
>  };
>  
>  struct qdisc_size_table {
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 7e3fbe9..29a1b47 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -377,12 +377,17 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
>       struct netdev_queue *txq;
>       struct net_device *dev;
>       struct sk_buff *skb;
> -     bool validate;
> +     bool more, validate;
>  
>       /* Dequeue packet */
> +     if (test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
> +             return false;
> +
>       skb = dequeue_skb(q, &validate, packets);
> -     if (unlikely(!skb))
> +     if (unlikely(!skb)) {
> +             clear_bit(__QDISC_STATE_RUNNING, &q->state);
>               return false;
> +     }
>  
>       if (!(q->flags & TCQ_F_NOLOCK))
>               root_lock = qdisc_lock(q);
> @@ -390,7 +395,9 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
>       dev = qdisc_dev(q);
>       txq = skb_get_tx_queue(dev, skb);
>  
> -     return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
> +     more = sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
> +     clear_bit(__QDISC_STATE_RUNNING, &q->state);
> +     return more;
>  }
>  
>  void __qdisc_run(struct Qdisc *q)
> 


This adds a pair of atomic operations (test_and_set_bit() and clear_bit())
to the fast path, only for the sake of pfifo_fast.

The qdisc_restart() name is misleading: this function is called from
__qdisc_run(), so it sits on the regular transmit path.
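
To make the cost concrete, here is a rough userspace sketch of the pattern
the patch introduces. It is not kernel code: C11 atomics stand in for the
kernel's test_and_set_bit()/clear_bit(), and struct fake_qdisc,
fake_restart() and the other fake_* names are hypothetical and exist only
for illustration. The point is that every call pays for two atomic
read-modify-write operations, whether or not a second dequeuer exists.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct Qdisc: a state word and a backlog count. */
struct fake_qdisc {
	atomic_ulong state;
	int backlog;		/* packets waiting to be dequeued */
};

/* Bit index; RUNNING is the third entry of the enum in the quoted patch. */
#define FAKE_STATE_RUNNING 2UL

/* Sets the bit and returns its previous value (first atomic RMW). */
static bool fake_test_and_set_bit(unsigned long nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return atomic_fetch_or(addr, mask) & mask;
}

/* Clears the bit (second atomic RMW). */
static void fake_clear_bit(unsigned long nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/*
 * Follows the control flow of the patched qdisc_restart(): take the
 * RUNNING bit, dequeue one packet, release the bit. Returns true if
 * more packets remain, false if the queue is empty or busy.
 */
static bool fake_restart(struct fake_qdisc *q, int *sent)
{
	bool more;

	if (fake_test_and_set_bit(FAKE_STATE_RUNNING, &q->state))
		return false;	/* another dequeuer owns this qdisc */

	assert(q->backlog >= 0);
	if (q->backlog == 0) {
		fake_clear_bit(FAKE_STATE_RUNNING, &q->state);
		return false;
	}

	q->backlog--;		/* stands in for dequeue + transmit */
	(*sent)++;
	more = q->backlog > 0;

	fake_clear_bit(FAKE_STATE_RUNNING, &q->state);
	return more;
}
```

A caller would loop on fake_restart() the way __qdisc_run() loops on
qdisc_restart(), so the two atomics are paid once per packet.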

