On Fri, Sep 01, 2017 at 11:45:42AM +0200, Maxime Coquelin wrote:
> On 08/24/2017 04:19 AM, Tiwei Bie wrote:
> > This patch adaptively batches the small guest memory copies.
> > By batching the small copies, the efficiency of executing the
> > memory LOAD instructions can be improved greatly, because the
> > memory LOAD latency can be effectively hidden by the pipeline.
> > We saw great performance boosts in the small-packet PVP test.
> >
> > This patch improves the performance for small packets, and it
> > distinguishes the packets by size. So although the performance
> > for big packets doesn't change, it makes it relatively easy to
> > do some special optimizations for the big packets too.
>
> Do you mean that if we batched unconditionally, whatever the size,
> we would see a performance drop for larger (>256) packets?
>

Yeah, you are right.
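To make the idea concrete, here is a minimal sketch of the adaptive
batching (all identifiers, the 256-byte threshold and the batch-full
policy below are illustrative, not the actual code from the patch):
copies up to a size threshold are deferred into a fixed-size array
and executed back-to-back, so their memory loads can overlap in the
CPU pipeline, while larger copies, and any small copies arriving once
the array is full, are done immediately with a plain memcpy.

/*
 * Illustrative sketch only -- the identifiers, threshold value and
 * batch-full policy are hypothetical, not the actual patch code.
 */
#include <stddef.h>
#include <string.h>

#define BATCH_THRESHOLD	256	/* copies up to this size get batched */
#define MAX_BATCH_LEN	256	/* e.g. bounded by the virtqueue size */

struct batch_copy_elem {
	void *dst;
	const void *src;
	size_t len;
};

struct batch_copy_ctx {
	struct batch_copy_elem elems[MAX_BATCH_LEN];
	unsigned int nr;
};

/*
 * Execute all deferred copies back-to-back. The loads issued by one
 * copy are independent of the previous one, so the CPU pipeline can
 * keep several of them in flight and hide the memory LOAD latency.
 */
static void
batch_copy_flush(struct batch_copy_ctx *ctx)
{
	unsigned int i;

	for (i = 0; i < ctx->nr; i++)
		memcpy(ctx->elems[i].dst, ctx->elems[i].src,
		       ctx->elems[i].len);
	ctx->nr = 0;
}

/*
 * Adaptive copy: big copies are done immediately, small ones are
 * deferred. Once the batch is full (e.g. more small copies than the
 * queue size, which can happen with indirect descriptors), the
 * remaining copies are done directly and are not batched any more.
 */
static void
batch_copy(struct batch_copy_ctx *ctx, void *dst, const void *src,
	   size_t len)
{
	if (len > BATCH_THRESHOLD || ctx->nr == MAX_BATCH_LEN) {
		memcpy(dst, src, len);
		return;
	}

	ctx->elems[ctx->nr].dst = dst;
	ctx->elems[ctx->nr].src = src;
	ctx->elems[ctx->nr].len = len;
	ctx->nr++;
}

In the real datapath the flush would presumably happen once per burst,
before the used ring is updated, so the guest never observes a
descriptor as used before its data has actually been copied.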
> The other question is about indirect descriptors: my understanding
> of the patch is that the number of batched copies is limited to the
> queue size. In theory, we could have more than that with indirect
> descriptors (first indirect desc for the vnet header, second one for
> the packet).
>
> So in the worst case, we would have the first small copies being
> batched, but not the last ones if there are more than the queue size.
> So I think it works, but I'd like your confirmation.
>

Yeah, you are right. If the number of small copies is larger than the
queue size, the last ones won't be batched any more.

> >
> > Signed-off-by: Tiwei Bie <tiwei....@intel.com>
> > Signed-off-by: Zhihong Wang <zhihong.w...@intel.com>
> > Signed-off-by: Zhiyong Yang <zhiyong.y...@intel.com>
> > ---
> > This optimization depends on the CPU internal pipeline design,
> > so further tests (e.g. on ARM) from the community are appreciated.
>
> Agreed, I think it is important to have it tested on ARM platforms at
> least, to ensure it doesn't introduce a regression.
>
> Adding Santosh, Jerin & Hemant in Cc, who might know who could do the
> test.

Thank you very much! :-)

Best regards,
Tiwei Bie