I have figured it out. Two issues.
1) skb->xmit_more is hardly ever set under virtualization because the
qdisc is usually bypassed because of TCQ_F_CAN_BYPASS. Once
TCQ_F_CAN_BYPASS is set a virtual NIC driver is not likely to see
skb->xmit_more (this answers my "how does this work at all" question).
2) If that flag is turned off (I patched sch_generic to turn it off in
pfifo_fast while testing), DQL keeps xmit_more from being set. If the
driver is not DQL enabled xmit_more is never ever set. If the driver is
DQL enabled the queue is adjusted to ensure xmit_more stops happening
within 10-15 xmit cycles.
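
The two issues can be illustrated with a toy model. This is only a sketch
and not the kernel code: every toy_* name is made up, can_bypass stands in
for TCQ_F_CAN_BYPASS, and bulk_limit stands in for how many packets one
dequeue run may hand to the driver (1 models a driver without DQL). It also
assumes the whole burst is queued before the queue is drained, which is a
simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_stats {
    int sent;       /* packets handed to the driver */
    int with_more;  /* packets the driver saw with xmit_more set */
};

/* Stand-in for the driver's xmit hook. */
static void toy_driver_xmit(struct toy_stats *st, bool xmit_more)
{
    st->sent++;
    if (xmit_more)
        st->with_more++;
}

/*
 * Send a burst of packets through the toy transmit path.
 * can_bypass - stands in for TCQ_F_CAN_BYPASS on the qdisc
 * bulk_limit - packets one dequeue run may hand over as a batch;
 *              1 models a driver without DQL (no bulk dequeue)
 */
static struct toy_stats toy_send_burst(bool can_bypass, int burst, int bulk_limit)
{
    struct toy_stats st = {0, 0};
    int qlen = 0;

    assert(burst >= 0 && bulk_limit >= 1);

    for (int i = 0; i < burst; i++) {
        if (can_bypass && qlen == 0)
            toy_driver_xmit(&st, false); /* qdisc skipped, nothing to batch */
        else
            qlen++;                      /* packet waits in the qdisc */
    }

    /* Drain in batches; only the last packet of a batch lacks xmit_more. */
    while (qlen > 0) {
        int n = qlen < bulk_limit ? qlen : bulk_limit;

        for (int i = 0; i < n; i++)
            toy_driver_xmit(&st, i < n - 1);
        qlen -= n;
    }
    return st;
}
```

In this model a burst of 16 packets gives with_more == 0 when bypass is on
and also when bypass is off with bulk_limit == 1. With bypass off and
bulk_limit == 8 it gives with_more == 14. In the real path the DQL limit
plays the role of bulk_limit, and per issue 2 it is adjusted until batching
stops.
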
That is plain *wrong* for virtual NICs - virtio, emulated NICs, etc.
There, the BIG cost is telling the hypervisor that it needs to "kick"
the packets. The cost of putting them into the vNIC buffers is
negligible. You want xmit_more to happen - it makes between 50% and 300%
(depending on vNIC design) difference. If there is no xmit_more the vNIC
will immediately "kick" the hypervisor and try to signal that the
packet needs to move straight away (as for example in virtio_net).
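
A back-of-the-envelope version of the cost argument above, as a sketch: the
function names are hypothetical and the cost units are arbitrary, not
measurements. It assumes each packet pays a fixed enqueue cost and each
batch pays one kick.

```c
/* Toy cost model; all names and numbers here are hypothetical. */

/* Average cost per packet when one kick covers `batch` packets. */
static double toy_cost_per_packet(double enqueue_cost, double kick_cost, int batch)
{
    return enqueue_cost + kick_cost / batch;
}

/* Gain from batching relative to kicking on every packet. */
static double toy_batch_speedup(double enqueue_cost, double kick_cost, int batch)
{
    return toy_cost_per_packet(enqueue_cost, kick_cost, 1) /
           toy_cost_per_packet(enqueue_cost, kick_cost, batch);
}
```

With a made-up enqueue cost of 1 unit and a kick cost of 8 units, a batch
of 4 brings the per-packet cost from 9 units down to 3. The more the kick
dominates, the more xmit_more matters.
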
In addition to that, the perceived line rate is proportional to this
cost, so I am not sure that the current dql math holds. In fact, I think
it does not - it is trying to adjust something which influences the
perceived line rate.
So - how do we turn off BOTH bypass and DQL adjustment while under
virtualization and set them to be "always qdisc" + "always xmit_more
allowed"?
A.
P.S. Cc-ing virtio maintainer
A.
On 08/05/17 08:15, Anton Ivanov wrote:
Hi all,
I was revising some of my old work for UML to prepare it for
submission and I noticed that skb->xmit_more does not seem to be set
any more.
I traced the issue as far as net/sched/sch_generic.c:
try_bulk_dequeue_skb() is never invoked (the drivers I am working on
are dql enabled so that is not the problem).
More interestingly, if I put a breakpoint and debug output into
dequeue_skb() around line 147, right before the bulk: label, the skb
there is always NULL. ???
Similarly, debug in pfifo_fast_dequeue shows only NULLs being
dequeued. Again - ???
First and foremost, I apologize for the silly question, but how can
this work at all? I see the skbs showing up at the driver level, why
are NULLs being returned at qdisc dequeue and where do the skbs at the
driver level come from?
Second, where should I look to fix it?
A.
--
Anton R. Ivanov
Cambridge Greys Limited, England company No 10273661
http://www.cambridgegreys.com/