On 11/25/25 02:34, Jason Wang wrote:
> On Mon, Nov 24, 2025 at 5:20 PM Simon Schippers
> <[email protected]> wrote:
>>
>> On 11/24/25 02:04, Jason Wang wrote:
>>> On Fri, Nov 21, 2025 at 5:23 PM Simon Schippers
>>> <[email protected]> wrote:
>>>>
>>>> On 11/21/25 07:19, Jason Wang wrote:
>>>>> On Thu, Nov 20, 2025 at 11:30 PM Simon Schippers
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> This patch series deals with tun/tap and vhost-net which drop incoming
>>>>>> SKBs whenever their internal ptr_ring buffer is full. Instead, with this
>>>>>> patch series, the associated netdev queue is stopped before this happens.
>>>>>> This allows the connected qdisc to function correctly as reported by [1]
>>>>>> and improves application-layer performance, see our paper [2]. Meanwhile
>>>>>> the theoretical performance differs only slightly:
>>>>>>
>>>>>> +--------------------------------+-----------+----------+
>>>>>> | pktgen benchmarks to Debian VM | Stock     | Patched  |
>>>>>> | i5 6300HQ, 20M packets         |           |          |
>>>>>> +-----------------+--------------+-----------+----------+
>>>>>> | TAP             | Transmitted  | 195 Kpps  | 183 Kpps |
>>>>>> |                 +--------------+-----------+----------+
>>>>>> |                 | Lost         | 1615 Kpps | 0 pps    |
>>>>>> +-----------------+--------------+-----------+----------+
>>>>>> | TAP+vhost_net   | Transmitted  | 589 Kpps  | 588 Kpps |
>>>>>> |                 +--------------+-----------+----------+
>>>>>> |                 | Lost         | 1164 Kpps | 0 pps    |
>>>>>> +-----------------+--------------+-----------+----------+
>>>>>
>>>>
>>>> Hi Jason,
>>>>
>>>> thank you for your reply!
>>>>
>>>>> PPS drops somehow for TAP, any reason for that?
>>>>
>>>> I have no explicit explanation for that except general overheads coming
>>>> with this implementation.
>>>
>>> It would be better to fix that.
>>>
>>>>
>>>>>
>>>>> Btw, I had some questions:
>>>>>
>>>>> 1) most of the patches in this series would introduce non-trivial
>>>>> impact on the performance, we probably need to benchmark each or split
>>>>> the series. What's more we need to run TCP benchmark
>>>>> (throughput/latency) as well as pktgen see the real impact
>>>>
>>>> What could be done, IMO, is to activate tun_ring_consume() /
>>>> tap_ring_consume() before enabling tun_ring_produce(). Then we could see
>>>> if this alone drops performance.
>>>>
>>>> For TCP benchmarks, you mean userspace performance like iperf3 between a
>>>> host and a guest system?
>>>
>>> Yes,
>>>
>>>>
>>>>>
>>>>> 2) I see this:
>>>>>
>>>>>         if (unlikely(tun_ring_produce(&tfile->tx_ring, queue, skb))) {
>>>>>                 drop_reason = SKB_DROP_REASON_FULL_RING;
>>>>>                 goto drop;
>>>>>         }
>>>>>
>>>>> So there could still be packet drop? Or is this related to the XDP path?
>>>>
>>>> Yes, there can be packet drops after a ptr_ring resize or a ptr_ring
>>>> unconsume. Since those two happen so rarely, I figured we should just
>>>> drop in this case.
>>>>
>>>>>
>>>>> 3) The LLTX change would have performance implications, but the
>>>>> benmark doesn't cover the case where multiple transmission is done in
>>>>> parallel
>>>>
>>>> Do you mean multiple applications that produce traffic and potentially
>>>> run on different CPUs?
>>>
>>> Yes.
>>>
>>>>
>>>>>
>>>>> 4) After the LLTX change, it seems we've lost the synchronization with
>>>>> the XDP_TX and XDP_REDIRECT path?
>>>>
>>>> I must admit I did not take a look at XDP and cannot really judge if/how
>>>> lltx has an impact on XDP. But from my point of view, __netif_tx_lock()
>>>> instead of __netif_tx_acquire(), is executed before the tun_net_xmit()
>>>> call and I do not see the impact for XDP, which calls its own methods.
>>>
>>> Without LLTX tun_net_xmit is protected by tx lock but it is not the
>>> case of tun_xdp_xmit. This is because, unlike other devices, tun
>>> doesn't have a dedicated TX queue for XDP, so the queue is shared by
>>> both XDP and skb. So XDP xmit path needs to be protected with tx lock
>>> as well, and since we don't have queue discipline for XDP, it means we
>>> could still drop packets when XDP is enabled. I'm not sure this would
>>> defeat the whole idea or not.
>>
>> Good point.
>>
>>>
>>>>>
>>>>> 5) The series introduces various ptr_ring helpers with lots of
>>>>> ordering stuff which is complicated, I wonder if we first have a
>>>>> simple patch to implement the zero packet loss
>>>>
>>>> I personally don't see how a simpler patch is possible without using
>>>> discouraged practices like returning NETDEV_TX_BUSY in tun_net_xmit or
>>>> spin locking between producer and consumer. But I am open for
>>>> suggestions :)
>>>
>>> I see NETDEV_TX_BUSY is used by veth:
>>>
>>> static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
>>> {
>>>         if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb)))
>>>                 return NETDEV_TX_BUSY; /* signal qdisc layer */
>>>
>>>         return NET_RX_SUCCESS; /* same as NETDEV_TX_OK */
>>> }
>>>
>>> Maybe it would be simpler to start from that (probably with a new
>>> tun->flags?).
>>>
>>> Thanks
>>
>> Do you mean that this patchset could be implemented using the same
>> approach that was used for veth in [1]?
>> This could then also fix the XDP path.
>
> I think so.

Okay, I will do so and submit a v7 when net-next opens again for 6.19.

>
>>
>> But is returning NETDEV_TX_BUSY fine in our case?
>
> If it helps to avoid packet drop. But I'm not sure if qdisc is a must
> in your case.

I will try to avoid returning it: when no qdisc is connected, I will just
drop, like veth does (see the rough sketch at the bottom of this mail).

>
>>
>> Do you mean a flag that enables or disables the no-drop behavior?
>
> Yes, via a new flags that could be set via TUNSETIFF.
>
> Thanks

I am not a fan of that, since I cannot imagine a use case where dropping
packets is desirable, and veth does not introduce a flag either. Of course,
if the no-drop behavior turns out to cause a major performance regression,
a flag makes sense. I will benchmark it, and we will see.

Thank you!

>
>>
>> Thanks!
>>
>> [1] Link:
>> https://lore.kernel.org/netdev/174559288731.827981.8748257839971869213.stgit@firesoul/T/#u
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> This patch series includes tun/tap, and vhost-net because they share
>>>>>> logic. Adjusting only one of them would break the others. Therefore, the
>>>>>> patch series is structured as follows:
>>>>>> 1+2: new ptr_ring helpers for 3
>>>>>> 3: tun/tap: tun/tap: add synchronized ring produce/consume with queue
>>>>>> management
>>>>>> 4+5+6: tun/tap: ptr_ring wrappers and other helpers to be called by
>>>>>> vhost-net
>>>>>> 7: tun/tap & vhost-net: only now use the previous implemented functions to
>>>>>> not break git bisect
>>>>>> 8: tun/tap: drop get ring exports (not used anymore)
>>>>>>
>>>>>> Possible future work:
>>>>>> - Introduction of Byte Queue Limits as suggested by Stephen Hemminger
>>>>>
>>>>> This seems to be not easy. The tx completion depends on the userspace
>>>>> behaviour.
>>>>
>>>> I agree, but I really would like to reduce the buffer bloat caused by the
>>>> default 500 TUN / 1000 TAP packet queue without losing performance.
>>>>
>>>>>
>>>>>> - Adaption of the netdev queue flow control for ipvtap & macvtap
>>>>>>
>>>>>> [1] Link:
>>>>>> https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
>>>>>> [2] Link:
>>>>>> https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
>>>>>
>>>>> Thanks
>>>>
>>>> Thanks! :)
>>>
>>
>
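
P.S.: To make the direction a bit more concrete, here is a minimal, untested
sketch of how the veth-style behavior could look in the tun_net_xmit() hot
path. This is not the v7 patch: tun_queue_skb() and tun_txq_has_qdisc() are
placeholder names, and how to detect an attached qdisc still needs to be
worked out.

static netdev_tx_t tun_queue_skb(struct tun_file *tfile,
                                 struct netdev_queue *txq,
                                 struct sk_buff *skb)
{
        if (likely(!ptr_ring_produce(&tfile->tx_ring, skb)))
                return NETDEV_TX_OK;

        /* Ring is full. With a qdisc attached, push back to the qdisc
         * layer instead of dropping; the packet is requeued there and
         * retried once the consumer (userspace/vhost) has made room.
         */
        if (tun_txq_has_qdisc(txq))     /* placeholder for the actual check */
                return NETDEV_TX_BUSY;

        /* No qdisc to absorb the back-pressure: drop, as veth does. */
        kfree_skb_reason(skb, SKB_DROP_REASON_FULL_RING);
        return NETDEV_TX_OK;
}

Whether the same scheme can also cover tun_xdp_xmit, which shares the ring
but has no qdisc in front of it, is the open question from point 4) above.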

