Hi Maxime,

On Thu, Sep 07, 2017 at 07:47:57PM +0200, Maxime Coquelin wrote:
> Hi Tiwei,
>
> On 08/24/2017 04:19 AM, Tiwei Bie wrote:
> > This patch adaptively batches the small guest memory copies.
> > By batching the small copies, the efficiency of executing the
> > memory LOAD instructions can be improved greatly, because the
> > memory LOAD latency can be effectively hidden by the pipeline.
> > We saw great performance boosts for the small-packet PVP test.
> >
> > This patch improves the performance for small packets, and has
> > distinguished the packets by size. So although the performance
> > for big packets doesn't change, it makes it relatively easy to
> > do some special optimizations for the big packets too.
> >
> > Signed-off-by: Tiwei Bie <tiwei....@intel.com>
> > Signed-off-by: Zhihong Wang <zhihong.w...@intel.com>
> > Signed-off-by: Zhiyong Yang <zhiyong.y...@intel.com>
> > ---
> > This optimization depends on the CPU's internal pipeline design.
> > So further tests (e.g. on ARM) from the community are appreciated.
> >
> >  lib/librte_vhost/vhost.c      |   2 +-
> >  lib/librte_vhost/vhost.h      |  13 +++
> >  lib/librte_vhost/vhost_user.c |  12 +++
> >  lib/librte_vhost/virtio_net.c | 240 ++++++++++++++++++++++++++++++++----------
> >  4 files changed, 209 insertions(+), 58 deletions(-)
>
> I did some PVP benchmarking with your patch.
> First I tried my standard PVP setup, with io forwarding on the host
> and macswap on the guest in bidirectional mode.
>
> With this, I noticed no improvement (18.8Mpps), but I think that is
> because the guest is the bottleneck here.
> So I changed my setup to do csum forwarding on the host side, so that
> the host's PMD threads are more loaded.
>
> In this case, I noticed a great improvement: I get 18.8Mpps with your
> patch instead of 14.8Mpps without! Great work!
>
> Reviewed-by: Maxime Coquelin <maxime.coque...@redhat.com>
>
Thank you very much for taking the time to review and test this patch! :-)

Best regards,
Tiwei Bie