On 08/23/2016 10:10 AM, Yuanhan Liu wrote: > The basic idea of Tx zero copy is, instead of copying data from the > desc buf, here we let the mbuf reference the desc buf addr directly. > > Doing so, however, has one major issue: we can't update the used ring > at the end of rte_vhost_dequeue_burst. Because we don't do the copy > here, an update of the used ring would let the driver to reclaim the > desc buf. As a result, DPDK might reference a stale memory region. > > To update the used ring properly, this patch does several tricks: > > - when mbuf references a desc buf, refcnt is added by 1. > > This is to pin lock the mbuf, so that a mbuf free from the DPDK > won't actually free it, instead, refcnt is subtracted by 1. > > - We chain all those mbuf together (by tailq) > > And we check it every time on the rte_vhost_dequeue_burst entrance, > to see if the mbuf is freed (when refcnt equals to 1). If that > happens, it means we are the last user of this mbuf and we are > safe to update the used ring. > > - "struct zcopy_mbuf" is introduced, to associate an mbuf with the > right desc idx. > > Tx zero copy is introduced for performance reason, and some rough tests > show about 40% perfomance boost for packet size 1400B. FOr small packets, > (e.g. 64B), it actually slows a bit down. That is expected because this > patch introduces some extra works, and it outweighs the benefit from > saving few bytes copy. > > Signed-off-by: Yuanhan Liu <yuanhan.liu at linux.intel.com> > --- > lib/librte_vhost/vhost.c | 2 + > lib/librte_vhost/vhost.h | 21 ++++++ > lib/librte_vhost/vhost_user.c | 41 +++++++++- > lib/librte_vhost/virtio_net.c | 169 > +++++++++++++++++++++++++++++++++++++----- > 4 files changed, 214 insertions(+), 19 deletions(-) > ...
> rte_vhost_dequeue_burst(int vid, uint16_t queue_id, > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) > @@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, > if (unlikely(vq->enabled == 0)) > return 0; > > + if (dev->tx_zero_copy) { > + struct zcopy_mbuf *zmbuf, *next; > + int nr_updated = 0; > + > + for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list); > + zmbuf != NULL; zmbuf = next) { > + next = TAILQ_NEXT(zmbuf, next); > + > + if (mbuf_is_consumed(zmbuf->mbuf)) { > + used_idx = vq->last_used_idx++ & (vq->size - 1); > + update_used_ring(dev, vq, used_idx, > + zmbuf->desc_idx); > + nr_updated += 1; > + > + TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next); > + rte_pktmbuf_free(zmbuf->mbuf); > + put_zmbuf(zmbuf); > + vq->nr_zmbuf -= 1; > + } Shouldn't you break the loop here as soon as a mbuf is not consumed? Indeed, they might not be consumed sequentially, and would cause last_used_idx to be incremented whereas it shouldn't.