Hi Tiwei,
On 08/24/2017 04:19 AM, Tiwei Bie wrote:
This patch adaptively batches the small guest memory copies.
By batching the small copies, the efficiency of executing the
memory LOAD instructions can be improved greatly, because the
memory LOAD latency can be effectively hidden by the pipeline.
We saw great performance boosts in the PVP test with small packets.
This patch improves the performance for small packets, and it
distinguishes packets by size. So although the performance for
big packets doesn't change, it also becomes relatively easy to
apply special optimizations to big packets later.
Signed-off-by: Tiwei Bie <tiwei....@intel.com>
Signed-off-by: Zhihong Wang <zhihong.w...@intel.com>
Signed-off-by: Zhiyong Yang <zhiyong.y...@intel.com>
---
This optimization depends on the CPU's internal pipeline design.
So further tests (e.g. on ARM) from the community are appreciated.
lib/librte_vhost/vhost.c | 2 +-
lib/librte_vhost/vhost.h | 13 +++
lib/librte_vhost/vhost_user.c | 12 +++
lib/librte_vhost/virtio_net.c | 240 ++++++++++++++++++++++++++++++++----------
4 files changed, 209 insertions(+), 58 deletions(-)
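
For context, here is my understanding of the batching idea as a minimal,
illustrative sketch (the identifiers below are mine, not necessarily the
ones used in the patch):

  #include <stdint.h>
  #include <string.h>

  #define BATCH_MAX 64

  /* One pending copy: destination, source and length. */
  struct copy_elem {
          void       *dst;
          const void *src;
          uint32_t    len;
  };

  /* A fixed-size list of pending small copies. */
  struct copy_batch {
          struct copy_elem elem[BATCH_MAX];
          uint32_t count;
  };

  /* Execute all pending copies back to back, so the loads of the
   * different source buffers can overlap in the CPU pipeline. */
  static void
  flush_copy_batch(struct copy_batch *b)
  {
          uint32_t i;

          for (i = 0; i < b->count; i++)
                  memcpy(b->elem[i].dst, b->elem[i].src, b->elem[i].len);
          b->count = 0;
  }

  /* Queue a small copy for later; do big copies immediately. */
  static void
  queue_copy(struct copy_batch *b, void *dst, const void *src,
             uint32_t len, uint32_t small_threshold)
  {
          if (len >= small_threshold) {
                  memcpy(dst, src, len);
                  return;
          }
          if (b->count == BATCH_MAX)
                  flush_copy_batch(b);
          b->elem[b->count].dst = dst;
          b->elem[b->count].src = src;
          b->elem[b->count].len = len;
          b->count++;
  }

In the real enqueue/dequeue paths the batch of course has to be flushed
before the used ring is updated, so the guest never sees a descriptor
whose data hasn't been copied yet.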
I did some PVP benchmarking with your patch.
First I tried my standard PVP setup, with io forwarding on the host and
macswap on the guest in bidirectional mode.
With this, I noticed no improvement (18.8Mpps), but I think this is
explained by the guest being the bottleneck here.
So I changed my setup to do csum forwarding on the host side, so that the
host's PMD threads are more loaded.
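(For reference, the forwarding modes are selected at the testpmd prompt,
roughly like this, with the EAL and port options omitted:

  testpmd> set fwd io       <- host, first setup
  testpmd> set fwd csum     <- host, second setup
  testpmd> set fwd macswap  <- guest, first setup
  testpmd> start
)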
In this case, I noticed a great improvement: I get 18.8Mpps with your
patch instead of 14.8Mpps without it! Great work!
Reviewed-by: Maxime Coquelin <maxime.coque...@redhat.com>
Thanks,
Maxime