> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Friday, 1 November 2024 01.35
> 
> On Thu, 31 Oct 2024 11:27:25 +0100
> Morten Brørup <m...@smartsharesystems.com> wrote:
> 
> > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > Sent: Wednesday, 30 October 2024 22.57
> > >
> > > The current tap device is slow both due to architectural choices and
> > > the overhead of Linux system calls.
> >
> > Yes; but isn't it only being used for (low volume) management traffic?
> > Is the TAP PMD performance an issue for anyone? What is their use case?
> 
> In embedded systems, if you want to use DPDK for dataplane, you still need
> to have a control plane path to the kernel. And most of the hardware used
> does not support a bifurcated driver. Either that or have two NIC's.
Yes, our application does that (not using a bifurcated driver); it can be
configured for in-band management or a dedicated management port.

> > Or is the key issue that the TAP PMD makes system calls in the fast
> > path, so you are looking to implement a new TAP PMD that doesn't make
> > any system calls in the fast path?
> 
> Even the control path performance matters. Think of a router with lots of
> BGP connections, or doing updates.

BGP is an excellent example where performance matters.
(Our applications currently don't support BGP, so I'm used to a relatively
low volume of management traffic, and didn't think about BGP.)

> > > I am exploring how to fix that, but some of the choices require some
> > > tradeoffs. Which leads to some open questions:
> > >
> > > 1. DPDK tap also supports tunnel (TUN) mode where there is no Ethernet
> > >    header, only L3. Does anyone actually use this? It is different from
> > >    what every other PMD expects.
> >
> > If used for high volume (data plane) traffic, I would assume standard
> > PMD behavior (i.e. incl. Ethernet headers) would suffice.

The traffic to/from the TAP port is likely to be exchanged with a physical
port, so the packets will have an Ethernet header at this point.

> > > 2. The fastest way to use the kernel TAP device would be to use
> > >    io_uring. But this was added in the 5.1 kernel (2019). Rather than
> > >    having a conditional or dual mode in the DPDK tap device, perhaps
> > >    there should just be a new PMD tap_uring?
> >
> > If the features differ significantly, I'm in favor of a new PMD.
> > And it would be an opportunity to get rid of useless cruft, which I
> > think you are already asking about here. :-)
> 
> Yes, and the TAP device was written to support a niche use case (all
> the flow stuff).
> Also, the TAP device has lots of extra code; at some point, doing
> bit-by-bit cleanup gets annoying.
> 
> > Furthermore, a "clean sheet" implementation - adding all the
> > experience accumulated since the old TAP PMD - could serve as a
> > showcase for "best practices" for software PMDs.
> >
> > > 3. The current TAP device provides hooks for several rte_flow types by
> > >    playing games with the kernel qdisc. Does anyone really use this?
> > >    Propose just not doing this in the new tap_uring.
> > >
> > > 4. What other features of the TAP device beyond basic send/receive
> > >    make sense? It looks like the new device could support better
> > >    statistics.
> >
> > IMHO, statistics about missed packets are relevant. If the ingress
> > (kernel->DPDK) queue is full, and the kernel has to drop packets, this
> > drop counter should be exposed to the application through the PMD.
> 
> It may require some kernel side additions to extract that, but not out
> of scope.
> 
> > I don't know if the existing TAP PMD supports it, but associating a
> > port/queue with a "network namespace" or VRF in the kernel could also
> > be relevant.
> 
> All network devices can be put in a network namespace; VRF in Linux is
> separate from netns, it has to do with which routing table is associated
> with the net device.
> 
> > > 5. What about Rx interrupt support?
> >
> > RX interrupt support seems closely related to power management.
> > It could be used to reduce jitter/latency (and burstiness) when
> > someone on the network communicates with an in-band management
> > interface.
> 
> Not sure if io_uring has a wakeup mechanism, but probably epoll() is
> possible.

Yes, it seems to be:
https://unixism.net/loti/tutorial/register_eventfd.html
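If I understand that tutorial correctly, io_uring can signal a registered
eventfd whenever a completion is posted, and the eventfd can then be
monitored with epoll(). A rough, untested sketch of what the Rx interrupt
path could look like, assuming liburing (function names are only for
illustration, and error handling is omitted):

#include <liburing.h>
#include <sys/eventfd.h>
#include <sys/epoll.h>
#include <stdint.h>
#include <unistd.h>

/* One-time setup, e.g. when the application enables Rx interrupts: */
static int rxq_register_wakeup(struct io_uring *ring)
{
	int efd = eventfd(0, EFD_NONBLOCK);	/* signalled on every CQE */

	io_uring_register_eventfd(ring, efd);
	return efd;
}

/* What the application-side wait could do: */
static void rxq_wait(int efd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
	int epfd = epoll_create1(0);
	uint64_t cnt;

	epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
	epoll_wait(epfd, &ev, 1, -1);	/* sleep until a CQE is posted */
	read(efd, &cnt, sizeof(cnt));	/* drain the eventfd counter */
	close(epfd);
	/* ...then reap the completions in the normal rx_burst path */
}

So Rx interrupt support via a registered eventfd looks feasible.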
> > > Probably the hardest part of using io_uring is figuring out how to
> > > collect completions. The simplest way would be to handle all
> > > completions, Rx and Tx, in the rx_burst function.
> >
> > Please don't mix RX and TX, unless explicitly requested by the
> > application through the recently introduced "mbuf recycle" feature.
> 
> The issue is that Rx and Tx share a single fd, and io_uring completion is
> per fd.
> The implementation of io_uring came from the storage side, so initially
> it was for fixing the broken Linux AIO support.
> 
> Some other devices only have a single interrupt or ring shared between
> Rx and Tx, so this is not unique. Virtio, netvsc, and some NICs.
> 
> The problem is that if Tx completes descriptors, then there needs to be
> locking to prevent the Rx thread and Tx thread overlapping. And a spin
> lock is a performance buzz kill.

Brainstorming a bit here...

What if the new TAP io_uring PMD is designed to use two io_urings per port,
one for RX and another one for TX on the same TAP interface?

This requires that a TAP interface can be referenced via two file
descriptors (one fd for the RX io_uring and another fd for the TX io_uring),
e.g. by using dup() to create the additional file descriptor. I don't know
if this is possible, and if it works with io_uring.
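Something along these lines is what I have in mind - completely untested,
and whether a dup()'ed TAP fd plays nicely with two io_urings is exactly
the part I don't know (names and error handling are only illustrative):

#include <liburing.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

struct tap_uring_port {
	int fd_rx, fd_tx;
	struct io_uring ring_rx;	/* read SQEs/CQEs, reaped by rx_burst */
	struct io_uring ring_tx;	/* write SQEs/CQEs, reaped by tx_burst */
};

static int tap_uring_open(struct tap_uring_port *p, const char *name)
{
	struct ifreq ifr;

	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;	/* Ethernet frames, no proto info */
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

	p->fd_rx = open("/dev/net/tun", O_RDWR);
	if (p->fd_rx < 0 || ioctl(p->fd_rx, TUNSETIFF, &ifr) < 0)
		return -1;

	/* Second descriptor referring to the same TAP interface, so each
	 * direction gets its own fd and thus its own io_uring. */
	p->fd_tx = dup(p->fd_rx);

	io_uring_queue_init(256, &p->ring_rx, 0);
	io_uring_queue_init(256, &p->ring_tx, 0);
	return 0;
}

rx_burst would then only submit read requests on ring_rx and reap its CQEs,
and tx_burst would only touch ring_tx, so neither path needs a lock to
protect the other's completion queue.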