On 09/23/2016 05:49 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 23, 2016 at 10:28:23AM +0200, Maxime Coquelin wrote:
>> Indirect descriptors are usually supported by virtio-net devices,
>> allowing a larger number of requests to be dispatched.
>>
>> When the virtio device sends a packet using indirect descriptors,
>> only one slot is used in the ring, even for large packets.
>>
>> The main effect is to improve the 0% packet loss benchmark.
>> A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
>> (fwd io for host, macswap for VM) on the DUT, shows a +50% gain for
>> zero loss.
>>
>> On the downside, a micro-benchmark using testpmd txonly in the VM
>> and rxonly on the host shows a loss of between 1 and 4%. But
>> depending on the needs, the feature can be disabled at VM boot time
>> by passing the indirect_desc=off argument to the vhost-user device
>> in QEMU.
>
> Even better, change the guest pmd to only use indirect
> descriptors when this makes sense (e.g. sufficiently
> large packets).

With the micro-benchmark, the degradation is quite constant regardless
of the packet size.

For PVP, I could not test with packets larger than 64 bytes, as I
don't have a 40G interface, and line rate with 10G is reached quickly.
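As background on the mechanism under discussion: with
VIRTIO_RING_F_INDIRECT_DESC negotiated, one ring slot can point at a
separate table of descriptors instead of chaining several slots in the
ring itself. Below is a minimal, self-contained sketch of that layout
per the virtio spec; the struct mirrors vring_desc, but the helper and
its parameters are illustrative, not actual PMD code.

#include <stdint.h>

/* Descriptor layout from the virtio spec. */
struct vring_desc {
	uint64_t addr;  /* guest-physical address of the buffer */
	uint32_t len;   /* buffer length in bytes */
	uint16_t flags; /* VRING_DESC_F_NEXT / _WRITE / _INDIRECT */
	uint16_t next;  /* index of the next descriptor in the chain */
};

#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_INDIRECT 4

/*
 * Publish a two-segment packet (virtio-net header + payload) through a
 * single ring slot: the slot points at an out-of-band descriptor table
 * holding the real buffer chain. The table must sit in one contiguous
 * guest-physical region the device can translate.
 */
static void
fill_indirect(struct vring_desc *ring_slot,
	      struct vring_desc *table, uint64_t table_gpa,
	      uint64_t hdr_gpa, uint32_t hdr_len,
	      uint64_t pkt_gpa, uint32_t pkt_len)
{
	table[0].addr  = hdr_gpa;
	table[0].len   = hdr_len;
	table[0].flags = VRING_DESC_F_NEXT;
	table[0].next  = 1;

	table[1].addr  = pkt_gpa;
	table[1].len   = pkt_len;
	table[1].flags = 0;	/* end of the chain */
	table[1].next  = 0;

	ring_slot->addr  = table_gpa;
	ring_slot->len   = 2 * sizeof(struct vring_desc);
	ring_slot->flags = VRING_DESC_F_INDIRECT;
	ring_slot->next  = 0;
}

However many segments the packet has, it consumes a single ring slot,
which is what drives the 0% loss improvement quoted above.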
> I would be very interested to know when it makes
> sense.
>
> The feature is there, it's up to the guest whether to
> use it.

Do you mean the PMD should detect dynamically whether to use indirect
descriptors, or that there should be an option at device init time to
enable or disable the feature?

>
>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin at redhat.com>
>> ---
>> Changes since v2:
>> =================
>> - Revert back to not checking feature flag to be aligned with
>>   kernel implementation
>> - Ensure we don't have nested indirect descriptors
>> - Ensure the indirect desc address is valid, to protect against
>>   malicious guests
>>
>> Changes since RFC:
>> =================
>> - Enrich commit message with figures
>> - Rebased on top of dpdk-next-virtio's master
>> - Add feature check to ensure we don't receive an indirect desc
>>   if not supported by the virtio driver
>>
>>  lib/librte_vhost/vhost.c      |  3 ++-
>>  lib/librte_vhost/virtio_net.c | 41 +++++++++++++++++++++++++++++++----------
>>  2 files changed, 33 insertions(+), 11 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
>> index 46095c3..30bb0ce 100644
>> --- a/lib/librte_vhost/vhost.c
>> +++ b/lib/librte_vhost/vhost.c
>> @@ -65,7 +65,8 @@
>>  				(1ULL << VIRTIO_NET_F_CSUM) | \
>>  				(1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
>>  				(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
>> -				(1ULL << VIRTIO_NET_F_GUEST_TSO6))
>> +				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
>> +				(1ULL << VIRTIO_RING_F_INDIRECT_DESC))
>>
>>  uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 8a151af..2e0a587 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -679,8 +679,8 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
>>  }
>>
>>  static inline int __attribute__((always_inline))
>> -copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>> -		  struct rte_mbuf *m, uint16_t desc_idx,
>> +copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>> +		  uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
>>  		  struct rte_mempool *mbuf_pool)
>>  {
>>  	struct vring_desc *desc;
>> @@ -693,8 +693,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  	/* A counter to avoid desc dead loop chain */
>>  	uint32_t nr_desc = 1;
>>
>> -	desc = &vq->desc[desc_idx];
>> -	if (unlikely(desc->len < dev->vhost_hlen))
>> +	desc = &descs[desc_idx];
>> +	if (unlikely((desc->len < dev->vhost_hlen)) ||
>> +			(desc->flags & VRING_DESC_F_INDIRECT))
>>  		return -1;
>>
>>  	desc_addr = gpa_to_vva(dev, desc->addr);
>> @@ -711,7 +712,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  	 */
>>  	if (likely((desc->len == dev->vhost_hlen) &&
>>  		   (desc->flags & VRING_DESC_F_NEXT) != 0)) {
>> -		desc = &vq->desc[desc->next];
>> +		desc = &descs[desc->next];
>> +		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>> +			return -1;
>>
>>  		desc_addr = gpa_to_vva(dev, desc->addr);
>>  		if (unlikely(!desc_addr))
>
>
> Just to make sure, does this still allow a chain of
> direct descriptors ending with an indirect one?
> This is legal as per spec.
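For readers of the hunks above, the shape of the added validation is
easier to see stripped of the copy logic. Here is a simplified sketch
(not the exact DPDK code) of the walk copy_desc_to_mbuf() performs
after this patch, reusing the vring_desc definitions from the sketch
earlier in the thread. Because every descriptor carrying
VRING_DESC_F_INDIRECT is rejected inside the loop, it appears the
direct-chain-ending-in-an-indirect case Michael mentions would also be
refused.

/* Simplified walk mirroring the patched copy_desc_to_mbuf() checks.
 * 'descs' is either the ring itself or an indirect table, and
 * 'max_desc' its size, as selected by the caller. */
static int
walk_chain(struct vring_desc *descs, uint16_t max_desc, uint16_t idx)
{
	uint32_t nr_desc = 1;
	struct vring_desc *desc = &descs[idx];

	/* Nested indirection is refused outright. */
	if (desc->flags & VRING_DESC_F_INDIRECT)
		return -1;

	while (desc->flags & VRING_DESC_F_NEXT) {
		/* Bound the walk to defend against dead-loop chains. */
		if (desc->next >= max_desc || ++nr_desc > max_desc)
			return -1;
		desc = &descs[desc->next];
		/* Any descriptor inside the chain that is itself
		 * indirect is also refused. */
		if (desc->flags & VRING_DESC_F_INDIRECT)
			return -1;
		/* ... the real code copies desc->addr/len into the
		 * mbuf at this point ... */
	}
	return 0;
}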
>
>> @@ -747,10 +750,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  		if ((desc->flags & VRING_DESC_F_NEXT) == 0)
>>  			break;
>>
>> -		if (unlikely(desc->next >= vq->size ||
>> -			     ++nr_desc > vq->size))
>> +		if (unlikely(desc->next >= max_desc ||
>> +			     ++nr_desc > max_desc))
>> +			return -1;
>> +		desc = &descs[desc->next];
>> +		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>>  			return -1;
>> -		desc = &vq->desc[desc->next];
>>
>>  		desc_addr = gpa_to_vva(dev, desc->addr);
>>  		if (unlikely(!desc_addr))
>> @@ -878,19 +883,35 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>  	/* Prefetch descriptor index. */
>>  	rte_prefetch0(&vq->desc[desc_indexes[0]]);
>>  	for (i = 0; i < count; i++) {
>> +		struct vring_desc *desc;
>> +		uint16_t sz, idx;
>>  		int err;
>>
>>  		if (likely(i + 1 < count))
>>  			rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>>
>> +		if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
>> +			desc = (struct vring_desc *)gpa_to_vva(dev,
>> +					vq->desc[desc_indexes[i]].addr);
>> +			if (unlikely(!desc))
>> +				break;
>> +
>> +			rte_prefetch0(desc);
>> +			sz = vq->desc[desc_indexes[i]].len / sizeof(*desc);
>> +			idx = 0;
>> +		} else {
>> +			desc = vq->desc;
>> +			sz = vq->size;
>> +			idx = desc_indexes[i];
>> +		}
>> +
>>  		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
>>  		if (unlikely(pkts[i] == NULL)) {
>>  			RTE_LOG(ERR, VHOST_DATA,
>>  				"Failed to allocate memory for mbuf.\n");
>>  			break;
>>  		}
>> -		err = copy_desc_to_mbuf(dev, vq, pkts[i], desc_indexes[i],
>> -					mbuf_pool);
>> +		err = copy_desc_to_mbuf(dev, desc, sz, pkts[i], idx, mbuf_pool);
>>  		if (unlikely(err)) {
>>  			rte_pktmbuf_free(pkts[i]);
>>  			break;
>> --
>> 2.7.4
>
> Something that I'm missing here: it's legal for guest
> to add indirect descriptors for RX.
> I don't see the handling of RX here though.
> I think it's required for spec compliance.
>
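On that last point: the hunks above only touch rte_vhost_dequeue_burst(),
i.e. the guest-TX path. Covering guest RX would mean resolving indirect
tables on the enqueue side too, before the descriptor chain is walked.
Below is a rough sketch of the kind of helper that would be needed; it
is hypothetical, not part of this patch, and reuses gpa_to_vva() and
the vhost types from the code above.

/*
 * Hypothetical helper: given a ring-level descriptor index, return the
 * descriptor array to walk (the ring itself, or the indirect table the
 * ring entry points to) plus its bounds, mirroring the dequeue-side
 * logic added by this patch.
 */
static struct vring_desc *
resolve_descs(struct virtio_net *dev, struct vhost_virtqueue *vq,
	      uint16_t ring_idx, uint16_t *sz, uint16_t *idx)
{
	struct vring_desc *d = &vq->desc[ring_idx];

	if (d->flags & VRING_DESC_F_INDIRECT) {
		struct vring_desc *tbl =
			(struct vring_desc *)gpa_to_vva(dev, d->addr);

		if (tbl == NULL)
			return NULL; /* invalid guest-physical address */
		*sz  = d->len / sizeof(*tbl);
		*idx = 0;
		return tbl;
	}

	*sz  = vq->size;
	*idx = ring_idx;
	return vq->desc;
}

The enqueue functions could then walk whatever array this returns in
exactly the way copy_desc_to_mbuf() does after this patch.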