> On Apr 6, 2025, at 7:14 PM, Jason Wang <jasow...@redhat.com> wrote:
>
> On Fri, Apr 4, 2025 at 10:24 PM Jon Kohler <j...@nutanix.com> wrote:
>>
>> Commit 098eadce3c62 ("vhost_net: disable zerocopy by default") disabled
>> the module parameter for the handle_tx_zerocopy path back in 2019,
>> noting that many downstream distributions (e.g., RHEL7 and later) had
>> already done the same.
>>
>> Both upstream and downstream disablement suggest this path is rarely
>> used.
>>
>> Testing the module parameter shows that while the path allows packet
>> forwarding, the zerocopy functionality itself is broken. On outbound
>> traffic (guest TX -> external), zerocopy SKBs are orphaned by either
>> skb_orphan_frags_rx() (used with the tun driver via tun_net_xmit())
>
> This is by design to avoid DOS.

I understand that, but it makes ZC non-functional in general, as ZC fails
and immediately increments the error counters.

>
>> or
>> skb_orphan_frags() elsewhere in the stack,
>
> Basically zerocopy is expected to work for guest -> remote case, so
> could we still hit skb_orphan_frags() in this case?

Yes, you’d hit that in tun_net_xmit(). If you punch a hole in that *and*
in the zc error counter (such that failed ZC doesn’t disable ZC in vhost),
you get ZC from vhost; however, the network interrupt handler under
net_tx_action eventually incurs the memcpy under dev_queue_xmit_nit().

This is no more performant, and in fact is actually worse, since the time
spent waiting on that memcpy to resolve is longer.

>
>> as vhost_net does not set
>> SKBFL_DONT_ORPHAN.
>>
>> Orphaning enforces a memcpy and triggers the completion callback, which
>> increments the failed TX counter, effectively disabling zerocopy again.
>>
>> Even after addressing these issues to prevent SKB orphaning and error
>> counter increments, performance remains poor. By default, only 64
>> messages can be zerocopied, which is immediately exhausted by workloads
>> like iperf, resulting in most messages being memcpy'd anyhow.
>>
>> Additionally, memcpy'd messages do not benefit from the XDP batching
>> optimizations present in the handle_tx_copy path.
>>
>> Given these limitations and the lack of any tangible benefits, remove
>> zerocopy entirely to simplify the code base.
>>
>> Signed-off-by: Jon Kohler <j...@nutanix.com>
>
> Any chance we can fix those issues? Actually, we had a plan to make
> use of vhost-net and its tx zerocopy (or even implement the rx
> zerocopy) in pasta.

Happy to take direction and ideas here, but I don’t see a clear way to
fix these issues without dealing with the assertions that
skb_orphan_frags_rx() calls out.
Said another way, I’d be interested in hearing whether there is a config
where ZC works in the current vhost-net implementation, as I was driving
myself crazy trying to reverse engineer one. Happy to collaborate if there
is something we could do here.

>
> Eugenio may explain more here.
>
> Thanks
>
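
For readers following the thread, here is a minimal userspace sketch of the
error-counter feedback discussed above (orphaned zerocopy SKBs complete as
failures, which in turn throttles further zerocopy attempts). It is a
simplified model, not the kernel code: the struct zc_model type and the
zc_* functions are hypothetical, the field names and the 1/64 ratio and
1024-packet window are recalled from drivers/vhost/net.c and should be
checked against the tree, and the completion callback is assumed to fire
synchronously, which it does not in the real driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-device counters; not the kernel struct. */
struct zc_model {
	unsigned tx_packets;   /* packets sent in the current window */
	unsigned tx_zcopy_err; /* failed zerocopy completions in the window */
};

/* Attempt zerocopy only while failures stay at or below 1/64 of packets
 * (ratio assumed from memory of vhost_net_tx_select_zcopy()). */
static bool zc_select(const struct zc_model *m)
{
	return m->tx_packets / 64 >= m->tx_zcopy_err;
}

/* Account one transmitted packet; both counters reset every 1024 packets
 * (window size assumed from memory of vhost_net_tx_packet()). */
static void zc_account_packet(struct zc_model *m)
{
	assert(m->tx_packets < 1024); /* invariant: counter is below the window */
	if (++m->tx_packets < 1024)
		return;
	m->tx_packets = 0;
	m->tx_zcopy_err = 0;
}

/* Send n packets and return how many were attempted as zerocopy.
 * orphan_all models every zerocopy SKB being orphaned (copied), so each
 * attempt is immediately reported as a failed completion. */
unsigned zc_simulate(unsigned n, bool orphan_all)
{
	struct zc_model m = { 0, 0 };
	unsigned attempts = 0;

	for (unsigned i = 0; i < n; i++) {
		if (zc_select(&m)) {
			attempts++;
			if (orphan_all)
				m.tx_zcopy_err++; /* synchronous failure: an assumption */
		}
		zc_account_packet(&m);
	}
	return attempts;
}
```

Under these assumptions, zc_simulate(1024, true) attempts zerocopy for only
16 of 1024 packets, while zc_simulate(1024, false) attempts it for all 1024,
which is the "effectively disabling zerocopy again" behavior in miniature.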