> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Tuesday, May 29, 2018 5:45 PM
> To: dev@dpdk.org; Bie, Tiwei <tiwei....@intel.com>; Wang, Zhihong
> <zhihong.w...@intel.com>
> Cc: Maxime Coquelin <maxime.coque...@redhat.com>
> Subject: [PATCH v2 0/2] Vhost: unitfy receive paths
>
> Hi,
>
> This second version fixes the feature bit check in
> rxvq_is_mergeable(), and remove "mergeable" from rx funcs
> names. No difference is seen in the benchmarks
>
> This series is preliminary work to ease the integration of
> packed ring layout support. But even without packed ring
> layout, the result is positive.
>
> First patch unify both paths, and second one is a small
> optimization to avoid copying batch_copy_nb_elems VQ field
> to/from the stack.
>
> With the series applied, I get modest performance gain for
> both mergeable and non-mergeable casesi (, and the gain of
> about 300 LoC is non negligible maintenance-wise.
>
> Rx-mrg=off benchmarks:
>
> +------------+-------+-------------+-------------+----------+
> | Run | PVP | Guest->Host | Host->Guest | Loopback |
> +------------+-------+-------------+-------------+----------+
> | v18.05-rc5 | 14.47 | 16.64 | 17.57 | 13.15 |
> | + series | 14.87 | 16.86 | 17.70 | 13.30 |
> +------------+-------+-------------+-------------+----------+
>
> Rx-mrg=on benchmarks:
>
> +------------+------+-------------+-------------+----------+
> | Run | PVP | Guest->Host | Host->Guest | Loopback |
> +------------+------+-------------+-------------+----------+
> | v18.05-rc5 | 9.38 | 13.78 | 16.70 | 12.79 |
> | + series | 9.38 | 13.80 | 17.49 | 13.36 |
> +------------+------+-------------+-------------+----------+
>
> Note: Even without my series, the guest->host benchmark with
> mergeable buffers enabled looks suspicious as it should in
> theory be alsmost identical as when Rx mergeable buffers are
> disabled. To be investigated...
>
> Maxime Coquelin (2):
> vhost: unify Rx mergeable and non-mergeable paths
> vhost: improve batched copies performance
>
> lib/librte_vhost/virtio_net.c | 376
> +++++-------------------------------------
> 1 file changed, 37 insertions(+), 339 deletions(-)
>
Acked-by: Zhihong Wang <zhihong.w...@intel.com>
Thanks Maxime! This is really great to see. ;) We probably need the
same improvement for Virtio-pmd.
One comment on Virtio/Vhost performance analysis: No matter what type
of traffic is used (PVP, or Txonly-Rxonly, Loopback...), we need to
be clear on who we're testing, and give the other part excessive CPU
resources, otherwise we'll be testing whoever the slowest.
Since this patch is for Vhost, I suggest to run N (e.g. N = 4) Virtio
threads on N cores, and the corresponding N Vhost threads on a single
core, to do performance comparison. Do you think this makes sense?
For Guest -> Host, in my test I see Rx-mrg=on has negative impact on
Virtio side, probably because Virtio touches something that's not
touched when Rx-mrg=off.
Thanks
-Zhihong