On 05. 11. 24 17:50, Morten Brørup wrote:
From: Stephen Hemminger [mailto:step...@networkplumber.org]
Sent: Tuesday, 5 November 2024 16.55
On Tue, 5 Nov 2024 09:49:39 +0100
Morten Brørup <m...@smartsharesystems.com> wrote:
I suspect AF_PACKET provides an intermediate step which can buffer more
or spread out the work.
Agree. It's a Linux scheduling issue.
With DPDK polling, there is no interrupt in the kernel scheduler.
If the CPU core running the DPDK polling thread is running some other
thread when the packets arrive on the hardware, the DPDK polling thread
is NOT scheduled immediately, but has to wait for the kernel scheduler
to switch to this thread instead of the other thread.
Quite a lot of time can pass before this happens - the kernel
scheduler does not know that the DPDK polling thread has urgent work
pending.
And the number of RX descriptors needs to be big enough to absorb all
packets arriving during the scheduling delay.
It is not well described how to *guarantee* that nothing but the DPDK
polling thread runs on a dedicated CPU core.
That's why any non-trivial DPDK application needs to run on isolated
CPUs.
Exactly.
And it is non-trivial and not well described how to do this.
Especially in virtual environments.
E.g. I ran some scheduling latency tests earlier today, and frequently observed
500-1000 us scheduling latency under VMware vSphere ESXi. This requires a large
number of RX descriptors to absorb without packet loss. (Disclaimer: The
virtual machine configuration had not been optimized. Tweaking the knobs
offered by the hypervisor might improve this.)
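
To put a rough number on "large", here is a back-of-the-envelope sketch of how many RX descriptors are needed to absorb every packet that arrives during one scheduling delay. The 10 Gbit/s link speed and 64-byte frames are assumptions chosen as a worst case and are not figures from this thread; only the delays (50, 500 and 1000 us) come from the discussion. The function name is made up for illustration.

```python
import math

def min_rx_descriptors(link_bps, frame_bytes, delay_us):
    """Descriptors needed to hold every packet arriving during one scheduling delay."""
    # On the wire each Ethernet frame costs 20 extra bytes:
    # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap.
    wire_bits = (frame_bytes + 20) * 8
    packets_per_second = link_bps / wire_bits
    return math.ceil(packets_per_second * delay_us / 1_000_000)

if __name__ == "__main__":
    # Assumed worst case: 10 Gbit/s of minimum-size (64-byte) frames on one queue.
    for delay_us in (50, 500, 1000):
        print(delay_us, "us ->", min_rx_descriptors(10_000_000_000, 64, delay_us))
    # 50 us -> 745, 500 us -> 7441, 1000 us -> 14881
```

This is an upper bound for a single queue at line rate. Real traffic with larger frames, or traffic spread over several RX queues, needs proportionally fewer descriptors per queue, and the maximum ring size is device-specific.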
The exact same firmware (same kernel, rootfs, libraries, applications etc.) running
directly on our purpose-built hardware has scheduling latency very close to the kernel's
default "timerslack" (50 us).
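
For anyone wanting to get a feel for such numbers on their own system, below is a minimal sketch of one way to estimate wake-up latency: request a short sleep and record how late the thread actually wakes up. This is not necessarily the method used for the measurements above, and the function name and the 100 us sleep length are arbitrary choices. The overshoot includes the timer slack, the scheduling delay and a few microseconds of interpreter overhead, so treat it as a rough proxy only. A busy-polling thread does not sleep, so what it actually experiences is preemption, which this does not measure directly.

```python
import time

def sleep_overshoot_us(requested_us=100, samples=1000):
    """Return, per sample, how many microseconds late a short sleep woke up."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(requested_us / 1_000_000)
        elapsed_us = (time.perf_counter_ns() - start) / 1000
        overshoots.append(elapsed_us - requested_us)
    return overshoots

if __name__ == "__main__":
    results = sorted(sleep_overshoot_us())
    print(f"median {results[len(results) // 2]:.0f} us, max {results[-1]:.0f} us")
```

Running it once on bare metal and once inside the virtual machine, pinned to the same kind of core, gives a quick comparison. The maximum matters more than the median, since a single long delay is enough to overflow the RX ring.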
Thanks for the feedback. I am currently not 100% sure whether I ran my earlier
experiments with isolcpus, or whether it had a massive impact.
Here is a decent guide on latency tuning that I found the other day,
though it does not specifically cover virtual environments.
https://rigtorp.se/low-latency-guide/
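
As a quick sanity check on whether an experiment really ran on isolated cores, something like the sketch below could help. It reads the kernel's list of CPUs isolated at boot from /sys/devices/system/cpu/isolated and compares it with the CPU affinity of the current process. The helper names are made up for illustration, and the parser assumes the plain "2-5,8" list format. Note that this only checks isolcpus-style isolation; it says nothing about interrupts, kernel threads or what a hypervisor does underneath.

```python
import os

def parse_cpu_list(text):
    """Expand a kernel CPU list such as "2-5,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file means no isolated CPUs
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus

def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """CPUs isolated at boot; empty set if none or if the file is missing."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return set()

if __name__ == "__main__":
    isolated = isolated_cpus()
    allowed = os.sched_getaffinity(0)  # Linux-only call
    print("isolated CPUs:", sorted(isolated))
    print("this process may run on:", sorted(allowed))
    print("restricted to isolated CPUs:", bool(allowed) and allowed <= isolated)
```

Started under the same taskset or core list as the DPDK application, the last line shows whether the cores that application would use are actually in the isolated set.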