On Mon, Nov 24, 2025 at 5:20 PM Simon Schippers <[email protected]> wrote:
>
> On 11/24/25 02:04, Jason Wang wrote:
> > On Fri, Nov 21, 2025 at 5:23 PM Simon Schippers
> > <[email protected]> wrote:
> >>
> >> On 11/21/25 07:19, Jason Wang wrote:
> >>> On Thu, Nov 20, 2025 at 11:30 PM Simon Schippers
> >>> <[email protected]> wrote:
> >>>>
> >>>> This patch series deals with tun/tap and vhost-net, which drop incoming
> >>>> SKBs whenever their internal ptr_ring buffer is full. Instead, with this
> >>>> patch series, the associated netdev queue is stopped before this happens.
> >>>> This allows the connected qdisc to function correctly, as reported by [1],
> >>>> and improves application-layer performance, see our paper [2]. Meanwhile,
> >>>> the theoretical performance differs only slightly:
> >>>>
> >>>> +--------------------------------+-----------+----------+
> >>>> | pktgen benchmarks to Debian VM | Stock     | Patched  |
> >>>> | i5 6300HQ, 20M packets         |           |          |
> >>>> +-----------------+--------------+-----------+----------+
> >>>> | TAP             | Transmitted  | 195 Kpps  | 183 Kpps |
> >>>> |                 +--------------+-----------+----------+
> >>>> |                 | Lost         | 1615 Kpps | 0 pps    |
> >>>> +-----------------+--------------+-----------+----------+
> >>>> | TAP+vhost_net   | Transmitted  | 589 Kpps  | 588 Kpps |
> >>>> |                 +--------------+-----------+----------+
> >>>> |                 | Lost         | 1164 Kpps | 0 pps    |
> >>>> +-----------------+--------------+-----------+----------+
> >>
> >> Hi Jason,
> >>
> >> thank you for your reply!
> >>
> >>> PPS drops somehow for TAP, any reason for that?
> >>
> >> I have no explicit explanation for that except the general overhead
> >> coming with this implementation.
> >
> > It would be better to fix that.
> >
> >>> Btw, I had some questions:
> >>>
> >>> 1) Most of the patches in this series would introduce a non-trivial
> >>> impact on performance, so we probably need to benchmark each or split
> >>> the series. What's more, we need to run TCP benchmarks
> >>> (throughput/latency) as well as pktgen to see the real impact.
> >>
> >> What could be done, IMO, is to activate tun_ring_consume() /
> >> tap_ring_consume() before enabling tun_ring_produce(). Then we could see
> >> whether this alone reduces performance.
> >>
> >> For TCP benchmarks, you mean userspace performance like iperf3 between a
> >> host and a guest system?
> >
> > Yes.
> >
> >>> 2) I see this:
> >>>
> >>> if (unlikely(tun_ring_produce(&tfile->tx_ring, queue, skb))) {
> >>>         drop_reason = SKB_DROP_REASON_FULL_RING;
> >>>         goto drop;
> >>> }
> >>>
> >>> So there could still be packet drops? Or is this related to the XDP path?
> >>
> >> Yes, there can be packet drops after a ptr_ring resize or a ptr_ring
> >> unconsume. Since those two happen so rarely, I figured we should just
> >> drop in this case.
> >>
> >>> 3) The LLTX change would have performance implications, but the
> >>> benchmark doesn't cover the case where multiple transmissions are done
> >>> in parallel.
> >>
> >> Do you mean multiple applications that produce traffic and potentially
> >> run on different CPUs?
> >
> > Yes.
> >
> >>> 4) After the LLTX change, it seems we've lost the synchronization with
> >>> the XDP_TX and XDP_REDIRECT paths?
> >>
> >> I must admit I did not take a look at XDP and cannot really judge if/how
> >> LLTX has an impact on XDP. But from my point of view, __netif_tx_lock()
> >> instead of __netif_tx_acquire() is executed before the tun_net_xmit()
> >> call, and I do not see the impact for XDP, which calls its own methods.
> >
> > Without LLTX, tun_net_xmit() is protected by the tx lock, but that is
> > not the case for tun_xdp_xmit(). This is because, unlike other devices,
> > tun doesn't have a dedicated TX queue for XDP, so the queue is shared
> > by both XDP and skbs. So the XDP xmit path needs to be protected by the
> > tx lock as well, and since we don't have a queue discipline for XDP, it
> > means we could still drop packets when XDP is enabled. I'm not sure
> > whether this would defeat the whole idea or not.
>
> Good point.
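To make that concrete, the XDP path would need something along these
lines (just a sketch to illustrate the locking, not a tested patch:
tun_xdp_xmit() is the existing handler, while the wrapper name and the
fixed queue index are made up here):

static int tun_xdp_xmit_locked(struct net_device *dev, int n,
			       struct xdp_frame **frames, u32 flags)
{
	/* Queue 0 for simplicity; real code would pick the same txq
	 * that the skb path uses for this flow.
	 */
	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
	int ret;

	/* Serialize with tun_net_xmit(): without LLTX the core takes
	 * this lock on the skb path, so taking it here too keeps the
	 * XDP and skb producers from racing on the shared ptr_ring.
	 */
	__netif_tx_lock(txq, smp_processor_id());
	ret = tun_xdp_xmit(dev, n, frames, flags);
	__netif_tx_unlock(txq);

	return ret;
}

That would serialize the producers, but as said above it still gives XDP
no qdisc, so drops remain possible on that path.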
> >>> 5) The series introduces various ptr_ring helpers with lots of
> >>> ordering stuff, which is complicated. I wonder if we could first
> >>> have a simple patch to implement the zero packet loss.
> >>
> >> I personally don't see how a simpler patch is possible without using
> >> discouraged practices like returning NETDEV_TX_BUSY in tun_net_xmit or
> >> spin locking between producer and consumer. But I am open to
> >> suggestions :)
> >
> > I see NETDEV_TX_BUSY is used by veth:
> >
> > static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
> > {
> >         if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb)))
> >                 return NETDEV_TX_BUSY; /* signal qdisc layer */
> >
> >         return NET_RX_SUCCESS; /* same as NETDEV_TX_OK */
> > }
> >
> > Maybe it would be simpler to start from that (probably with a new
> > tun->flags?).
> >
> > Thanks
>
> Do you mean that this patchset could be implemented using the same
> approach that was used for veth in [1]?
> This could then also fix the XDP path.

I think so.
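For the skb path that would boil down to something like the following
(an untested sketch; the names match the existing tun code, but where
exactly the check lands is up to the series):

/* In tun_net_xmit(), instead of dropping when the ring is full: */
if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb)))
	return NETDEV_TX_BUSY;	/* let the qdisc requeue and retry */

return NETDEV_TX_OK;

Note this only helps when a qdisc is attached: with noqueue, the core
frees the skb on NETDEV_TX_BUSY, so it degrades to today's drop
behaviour.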
> But is returning NETDEV_TX_BUSY fine in our case?

If it helps to avoid packet drops. But I'm not sure if a qdisc is a must
in your case.

> Do you mean a flag that enables or disables the no-drop behavior?

Yes, via a new flag that could be set via TUNSETIFF.
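E.g. something like this (IFF_NODROP is a placeholder, name and value
chosen only for illustration; a real patch would pick a free IFF_* bit):

/* Hypothetical uapi flag, for illustration only. */
#define IFF_NODROP	0x0080

/* In tun_net_xmit(), on a full ring: */
if (unlikely(tun_ring_produce(&tfile->tx_ring, queue, skb))) {
	if (tun->flags & IFF_NODROP)
		return NETDEV_TX_BUSY;	/* no-drop behaviour */
	drop_reason = SKB_DROP_REASON_FULL_RING;
	goto drop;
}

Userspace would then opt in by setting the bit in ifr_flags when issuing
TUNSETIFF, and existing users would keep the current drop semantics.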
Thanks

> Thanks!
>
> [1] Link:
> https://lore.kernel.org/netdev/174559288731.827981.8748257839971869213.stgit@firesoul/T/#u
>
> >>>> This patch series includes tun/tap and vhost-net because they share
> >>>> logic. Adjusting only one of them would break the others. Therefore,
> >>>> the patch series is structured as follows:
> >>>> 1+2: new ptr_ring helpers for 3
> >>>> 3: tun/tap: add synchronized ring produce/consume with queue
> >>>>    management
> >>>> 4+5+6: tun/tap: ptr_ring wrappers and other helpers to be called by
> >>>>    vhost-net
> >>>> 7: tun/tap & vhost-net: only now use the previously implemented
> >>>>    functions, to not break git bisect
> >>>> 8: tun/tap: drop the get-ring exports (not used anymore)
> >>>>
> >>>> Possible future work:
> >>>> - Introduction of Byte Queue Limits as suggested by Stephen Hemminger
> >>>
> >>> This seems to be not easy. The tx completion depends on the userspace
> >>> behaviour.
> >>
> >> I agree, but I really would like to reduce the bufferbloat caused by
> >> the default 500-packet TUN / 1000-packet TAP queue without losing
> >> performance.
> >>
> >>>> - Adaptation of the netdev queue flow control for ipvtap & macvtap
> >>>>
> >>>> [1] Link:
> >>>> https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
> >>>> [2] Link:
> >>>> https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
> >>>
> >>> Thanks
> >>
> >> Thanks! :)
