On 09/29/2016 11:23 PM, Maxime Coquelin wrote: > > > On 09/29/2016 10:21 PM, Michael S. Tsirkin wrote: >> On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote: >>> >>> >>> On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote: >>>> On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote: >>> ... >>>>> >>>>> Before enabling anything by default, we should first optimize the 1 >>>>> slot >>>>> case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17% >>>>> perf regression for 64 bytes case: >>>>> - 2 descs per packet: 11.6Mpps >>>>> - 1 desc per packet: 9.6Mpps >>>>> >>>>> This is due to the virtio header clearing in virtqueue_enqueue_xmit(). >>>>> Removing it, we get better results than with 2 descs (1.20Mpps). >>>>> Since the Virtio PMD doesn't support offloads, I wonder whether we can >>>>> just drop the memset? >>>> >>>> What will happen? Will the header be uninitialized? >>> Yes.. >>> I didn't look closely at the spec, but just looked at DPDK's and Linux >>> vhost implementations. IIUC, the header is just skipped in the two >>> implementations. >> >> In linux guest skbs are initialized AFAIK. See virtio_net_hdr_from_skb >> first thing it does is >> memset(hdr, 0, sizeof(*hdr)); > > I meant in vhost-net linux implementation, the header is just skipped: > > static void handle_tx(struct vhost_net *net) > { > ... > /* Skip header. TODO: support TSO. */ > len = iov_length(vq->iov, out); > iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len); > iov_iter_advance(&msg.msg_iter, hdr_size); > > And the same is done is done in DPDK: > > static inline int __attribute__((always_inline)) > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs, > uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx, > struct rte_mempool *mbuf_pool) > { > ... > /* > * A virtio driver normally uses at least 2 desc buffers > * for Tx: the first for storing the header, and others > * for storing the data. > */ > if (likely((desc->len == dev->vhost_hlen) && > (desc->flags & VRING_DESC_F_NEXT) != 0)) { > desc = &descs[desc->next]; > if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) > return -1; > > desc_addr = gpa_to_vva(dev, desc->addr); > if (unlikely(!desc_addr)) > return -1; > > rte_prefetch0((void *)(uintptr_t)desc_addr); > > desc_offset = 0; > desc_avail = desc->len; > nr_desc += 1; > > PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); > } else { > desc_avail = desc->len - dev->vhost_hlen; > desc_offset = dev->vhost_hlen; > }
Actually, the header is parsed in DPDK vhost implementation. But as Virtio PMD provides a zero'ed header, we could just parse the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated. >> >> >> >>>> >>>> The spec says: >>>> The driver can send a completely checksummed packet. In this >>>> case, flags >>>> will be zero, and gso_type >>>> will be VIRTIO_NET_HDR_GSO_NONE. >>>> >>>> and >>>> The driver MUST set num_buffers to zero. >>>> If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set >>>> flags to >>>> zero and SHOULD supply a fully >>>> checksummed packet to the device. >>>> >>>> and >>>> If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have >>>> been >>>> negotiated, the driver MUST >>>> set gso_type to VIRTIO_NET_HDR_GSO_NONE. >>>> >>>> so doing this unconditionally would be a spec violation, but if you see >>>> value in this, we can add a feature bit. >>> Right it would be a spec violation, so it should be done conditionally. >>> If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER? >>> It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set. >>> If negotiated, we wouldn't need to prepend a header. >> >> Yes but two points. >> >> 1. why is this memset expensive? Is the test completely skipping looking >> at the packet otherwise? > Yes. >> >> 2. As long as we are doing this, see >> Alignment vs. Networking >> ======================== >> in Documentation/unaligned-memory-access.txt > Thanks, I'll have a look tomorrow. I did a rough prototype which removes Tx headers unconditionally, to see what gain we could expect. I expect the results to be a little lower with no headers in full implementation, as some more checks will have to be done. For PVP use-case with 0.05% acceptable packets loss: - Original (with headers): 9.43Mpps - Indirect descs: 9.36 Mpps - Prototype (no headers): 10.65Mpps For PVP use-case with 0% acceptable packets loss: - Original (with headers): 5.23Mpps - Indirect descs: 7.13 Mpps - Prototype (no headers): 7.92Mpps Maxime