On 10/10/2016 06:22 AM, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 07:17:06AM +0300, Michael S. Tsirkin wrote:
>> On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
>>> On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
>>>>>> And the same is done is done in DPDK:
>>>>>>
>>>>>> static inline int __attribute__((always_inline))
>>>>>> copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>>>>>>                   uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
>>>>>>                   struct rte_mempool *mbuf_pool)
>>>>>> {
>>>>>> ...
>>>>>>         /*
>>>>>>          * A virtio driver normally uses at least 2 desc buffers
>>>>>>          * for Tx: the first for storing the header, and others
>>>>>>          * for storing the data.
>>>>>>          */
>>>>>>         if (likely((desc->len == dev->vhost_hlen) &&
>>>>>>                    (desc->flags & VRING_DESC_F_NEXT) != 0)) {
>>>>>>                 desc = &descs[desc->next];
>>>>>>                 if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>>>>>>                         return -1;
>>>>>>
>>>>>>                 desc_addr = gpa_to_vva(dev, desc->addr);
>>>>>>                 if (unlikely(!desc_addr))
>>>>>>                         return -1;
>>>>>>
>>>>>>                 rte_prefetch0((void *)(uintptr_t)desc_addr);
>>>>>>
>>>>>>                 desc_offset = 0;
>>>>>>                 desc_avail = desc->len;
>>>>>>                 nr_desc += 1;
>>>>>>
>>>>>>                 PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
>>>>>>         } else {
>>>>>>                 desc_avail = desc->len - dev->vhost_hlen;
>>>>>>                 desc_offset = dev->vhost_hlen;
>>>>>>         }
>>>>>
>>>>> Actually, the header is parsed in DPDK vhost implementation.
>>>>> But as Virtio PMD provides a zero'ed header, we could just parse
>>>>> the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
>>>>
>>>> host can always skip the header parse if it wants to.
>>>> It didn't seem worth it to add branches there but
>>>> if I'm wrong, by all means code it up.
>>>
>>> It's added by following commit, which yields about 10% performance
>>> boosts for PVP case (with 64B packet size).
>>>
>>> At that time, a packet always use 2 descs. Since indirect desc is
>>> enabled (by default) now, the assumption is not true then. What's
>>> worse, it might even slow things a bit down. That should also be
>>> part of the reason why performance is slightly worse than before.
>>>
>>> --yliu
>>
>> I'm not sure I get what you are saying
>>
>>> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
>>> Author: Yuanhan Liu <yuanhan.liu at linux.intel.com>
>>> Date: Mon May 2 17:46:17 2016 -0700
>>>
>>>     vhost: optimize dequeue for small packets
>>>
>>>     A virtio driver normally uses at least 2 desc buffers for Tx: the
>>>     first for storing the header, and the others for storing the data.
>>>
>>>     Therefore, we could fetch the first data desc buf before the main
>>>     loop, and do the copy first before the check of "are we done yet?".
>>>     This could save one check for small packets that just have one data
>>>     desc buffer and need one mbuf to store it.
>>>
>>>     Signed-off-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
>>>     Acked-by: Huawei Xie <huawei.xie at intel.com>
>>>     Tested-by: Rich Lane <rich.lane at bigswitch.com>
>>
>> This fast-paths the 2-descriptors format but it's not active
>> for indirect descriptors. Is this what you mean?
>
> Yes. It's also not active when ANY_LAYOUT is actually turned on.
>> Should be a simple matter to apply this optimization for indirect.
>
> Might be.
If I understand the code correctly, indirect descs also benefit from
this optimization, or am I missing something?

Maxime
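To make the question concrete, here is a minimal sketch of that reading. It assumes the caller resolves an indirect ring entry to its descriptor table first and then runs the same header check on that table starting at index 0. The names sketch_desc, first_data_desc and locate_payload are hypothetical stand-ins, not DPDK symbols, and the indirect table is passed in directly instead of being translated from a guest address. Only the two flag values are taken from the virtio specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Descriptor flag values as defined by the virtio specification. */
#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_INDIRECT 4

/* Stand-in with the same fields as a virtio ring descriptor. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Hypothetical helper: return the index, within descs[0..n), of the
 * descriptor holding the first payload byte, and store the payload
 * offset inside that descriptor in *offset. Returns -1 on a bad chain.
 */
static int
first_data_desc(const struct sketch_desc *descs, uint16_t n, uint16_t idx,
		uint32_t hlen, uint32_t *offset)
{
	if (idx >= n)
		return -1;

	if (descs[idx].len == hlen &&
	    (descs[idx].flags & VRING_DESC_F_NEXT) != 0) {
		/* Fast path: the header sits alone in the first desc. */
		uint16_t next = descs[idx].next;

		if (next >= n ||
		    (descs[next].flags & VRING_DESC_F_INDIRECT) != 0)
			return -1;
		*offset = 0;
		return next;
	}

	/* Header and data share the first desc (the ANY_LAYOUT case). */
	*offset = hlen;
	return idx;
}

/*
 * Hypothetical caller: for an indirect ring entry, switch to the
 * indirect table (assumed already mapped by the caller) and start at
 * index 0; otherwise walk the ring itself from head.
 */
static int
locate_payload(const struct sketch_desc *ring, uint16_t ring_n, uint16_t head,
	       const struct sketch_desc *table, uint16_t table_n,
	       uint32_t hlen, uint32_t *offset)
{
	if (head >= ring_n)
		return -1;
	if ((ring[head].flags & VRING_DESC_F_INDIRECT) != 0)
		return first_data_desc(table, table_n, 0, hlen, offset);
	return first_data_desc(ring, ring_n, head, hlen, offset);
}
```

Under that assumption, an indirect table laid out as header desc plus data desc takes the same fast path as a two-descriptor chain on the ring, while a single descriptor carrying both header and data falls into the else branch either way.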