On Thu, Sep 29, 2016 at 11:15 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-09-29 at 07:51 -0700, Eric Dumazet wrote:
>> On Thu, 2016-09-29 at 10:08 -0400, Tom Herbert wrote:
>>
>> > It addresses the issue that Rick Jones pointed out was happening with
>> > XPS. When packets are sent for a flow that has no socket and XPS is
>> > enabled then each packet uses the XPS queue based on the running CPU.
>> > Since the thread sending on a flow can be rescheduled on different
>> > CPUs this is creating ooo packets. In this case the ooo is being
>> > caused by interaction with XPS.
>> >
>>
>> Nope, your patch does not address the problem properly.
>>
>> I am not sure I want to spend more time explaining the issue.
>>
>> Lets talk about this in Tokyo next week.
>>
>
> Just as a reminder, sorry to bother you, stating some obvious facts for
> both of us. We have public exchanges, so we also need to re-explain how
> things work.
>
> Queue selection on xmit happens before we hit the qdisc and its delays.
>
> So when you access txq->dql.num_completed_ops and
> txq->dql.num_enqueue_ops you can observe values that do not change for a
> while.
>
> Say a thread runs on a VM, and sends 2 packets P1, P2 on the same flow
> (skb_get_hash() returns the same value for these 2 packets)
>
> P1 is sent on behalf of CPU 1, we pickup queue txq1, and queue the
> packet on its qdisc . Transmit does not happen because of some
> constraints like rate limiting or scheduling constraints.
>
> P2 is sent on behalf of CPU 2, we pickup queue txq2, notice that prior
> packet chose txq1. We check txq1->dql and decide it is fine to use txq2,
> since the dql params of txq1 were not changed yet.
>
> ( txq->dql.num_completed_ops == ent.queue_ptr )
>
> Note that in RFS case, we have the guarantee that we observe 'live
> queues' since they are the per cpu backlog.
>
> So input_queue_head_incr() and input_queue_tail_incr_save() are
> correctly doing the OOO prevention, because a queued packet immediately
> changes the state.
>
> So really your patch works if you have no qdisc, or a non congested
> qdisc. (Think if P1 is dropped by a full pfifo or pfifo_fast : We really
> want to avoid steering P2, P3, ..., PN on this full pfifo while maybe
> other txq are idle). Strange attractors are back (check commit
> 9b462d02d6dd6 )
>

Understood.
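
To make sure I have the failure mode right, here is a minimal user-space
sketch of the P1/P2 scenario described above. It is not kernel code. Only
the two *_ops counter names and queue_ptr come from the discussion; the
toy_* structs and helpers, and the live_head/live_tail pair standing in for
the RFS per-cpu backlog counters, are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_TXQ 2

/* Toy tx queue. The two *_ops names come from the mail above; the
 * live_head/live_tail pair is hypothetical scaffolding that mimics the
 * "live queue" counters of the RFS per-cpu backlog. */
struct toy_txq {
    unsigned int num_enqueue_ops;   /* moves only when the driver is handed a packet */
    unsigned int num_completed_ops; /* moves only on tx completion */
    unsigned int live_tail;         /* moves as soon as a packet is queued */
    unsigned int live_head;         /* moves when a packet leaves the queue */
};

/* Toy per-flow entry. */
struct toy_flow_ent {
    int queue_index;
    unsigned int queue_ptr;   /* snapshot of num_enqueue_ops at selection time */
    unsigned int last_qtail;  /* snapshot of live_tail taken at enqueue */
};

/* Queue selection runs before the qdisc, so it can only snapshot the
 * driver-level counter. */
void toy_select_queue(struct toy_flow_ent *ent, const struct toy_txq *txqs, int idx)
{
    assert(idx >= 0 && idx < TOY_NUM_TXQ);
    ent->queue_index = idx;
    ent->queue_ptr = txqs[idx].num_enqueue_ops;
}

/* The qdisc parks the packet (rate limiting, scheduling): the driver-level
 * counters do not move, only the live tail does. */
void toy_qdisc_enqueue(struct toy_flow_ent *ent, struct toy_txq *txqs)
{
    ent->last_qtail = ++txqs[ent->queue_index].live_tail;
}

/* The check quoted in the mail. It passes while the packet is still parked
 * in the qdisc, because nothing has changed since the snapshot. */
bool toy_dql_says_idle(const struct toy_flow_ent *ent, const struct toy_txq *txqs)
{
    return txqs[ent->queue_index].num_completed_ops == ent->queue_ptr;
}

/* RFS-style check: the head must have caught up with the tail value saved
 * when the packet was queued. */
bool toy_live_says_idle(const struct toy_flow_ent *ent, const struct toy_txq *txqs)
{
    return (int)(txqs[ent->queue_index].live_head - ent->last_qtail) >= 0;
}
```

Replaying the scenario: call toy_select_queue() for P1 on queue 0, then
toy_qdisc_enqueue(). When P2 shows up on another CPU, toy_dql_says_idle()
still returns true although P1 has not left the qdisc, while
toy_live_says_idle() returns false and would keep P2 on the old queue.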
> You could avoid (ab)using BQL with a different method, grabbing
> skb->destructor for the packets that are socketless
>
> The hash table would simply track the sum of skb->truesize to allow flow
> migration. This would be self contained and not intrusive.
>

Okay, will look at that.
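
As a first reading of that suggestion, here is a rough user-space sketch of
the accounting, assuming one byte counter per flow entry. It is not a patch
and not kernel code: toy_skb, toy_flow and every toy_* helper are
hypothetical stand-ins, and only the idea of a destructor callback that
gives back truesize when the packet is freed is taken from the suggestion
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-flow hash table entry. */
struct toy_flow {
    int queue_index;              /* tx queue the flow is currently pinned to */
    unsigned int inflight_bytes;  /* sum of truesize of packets not yet freed */
};

/* Hypothetical stand-in for the few sk_buff fields this idea needs. */
struct toy_skb {
    unsigned int truesize;
    struct toy_flow *flow;
    void (*destructor)(struct toy_skb *skb);
};

/* Runs when the packet is freed, wherever that happens: after transmit or
 * after a drop in a full qdisc. Gives the bytes back to the flow. */
void toy_flow_destructor(struct toy_skb *skb)
{
    assert(skb->flow->inflight_bytes >= skb->truesize);
    skb->flow->inflight_bytes -= skb->truesize;
    skb->flow = NULL;
}

/* Charge a socketless packet to its flow and hook the destructor. */
void toy_flow_charge(struct toy_skb *skb, struct toy_flow *flow)
{
    skb->flow = flow;
    skb->destructor = toy_flow_destructor;
    flow->inflight_bytes += skb->truesize;
}

/* Stand-in for the point where the stack releases the packet. */
void toy_skb_free(struct toy_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
}

/* The flow may move to the queue preferred by the running CPU only when
 * none of its packets are still outstanding. */
int toy_pick_queue(struct toy_flow *flow, int cpu_queue)
{
    if (flow->inflight_bytes == 0)
        flow->queue_index = cpu_queue;
    return flow->queue_index;
}
```

With this, a packet parked in a qdisc keeps inflight_bytes non-zero, so the
flow stays on its old queue, and a packet dropped by a full pfifo releases
its bytes immediately, so the flow is free to move to an idle queue.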