On Thu, Sep 29, 2016 at 11:15 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-09-29 at 07:51 -0700, Eric Dumazet wrote:
>> On Thu, 2016-09-29 at 10:08 -0400, Tom Herbert wrote:
>>
>> > It addresses  the issue that Rick Jones pointed out was happening with
>> > XPS. When packets are sent for a flow that has no socket and XPS is
>> > enabled then each packet uses the XPS queue based on the running CPU.
>> > Since the thread sending on a flow can be rescheduled on different
>> > CPUs this is creating ooo packets. In this case the ooo is being
>> > caused by interaction with XPS.
>> >
>>
>> Nope, your patch does not address the problem properly.
>>
>> I am not sure I want to spend more time explaining the issue.
>>
>> Lets talk about this in Tokyo next week.
>>
>
> Just as a reminder, sorry to bother you, stating some obvious facts for
> both of us. We have public exchanges, so we also need to re-explain how
> things work.
>
> Queue selection on xmit happens before we hit the qdisc and its delays.
>
> So when you access txq->dql.num_completed_ops and
> txq->dql.num_enqueue_ops you can observe values that do not change for a
> while.
>
> Say a thread runs on a VM, and sends 2 packets P1, P2 on the same flow
> (skb_get_hash() returns the same value for these 2 packets)
>
> P1 is sent on behalf of CPU 1, we pickup queue txq1, and queue the
> packet on its qdisc . Transmit does not happen because of some
> constraints like rate limiting or scheduling constraints.
>
> P2 is sent on behalf of CPU 2, we pickup queue txq2, notice that prior
> packet chose txq1. We check txq1->dql and decide it is fine to use txq2,
> since the dql params of txq1 were not changed yet.
>
> ( txq->dql.num_completed_ops == ent.queue_ptr )
>
> Note that in RFS case, we have the guarantee that we observe 'live
> queues' since they are the per cpu backlog.
>
> So input_queue_head_incr() and input_queue_tail_incr_save() are
> correctly doing the OOO prevention, because a queued packet immediately
> changes the state.
>
> So really your patch works if you have no qdisc, or a non congested
> qdisc. (Think if P1 is dropped by a full pfifo or pfifo_fast : We really
> want to avoid steering P2, P3, ..., PN on this full pfifo while maybe
> other txq are idle). Strange attractors are back (check commit
> 9b462d02d6dd6 )
>
Understood.

> You could avoid (ab)using BQL with a different method, grabbing
> skb->destructor for the packets that are socketless
>
> The hash table would simply track the sum of skb->truesize to allow flow
> migration. This would be self contained and not intrusive.
>
Okay, will look that.

>
>
>

Reply via email to