On Thu, Aug 24, 2017 at 10:19:39AM +0800, Tiwei Bie wrote:
> This patch adaptively batches the small guest memory copies.
> By batching the small copies, the efficiency of executing the
> memory LOAD instructions can be improved greatly, because the
> memory LOAD latency can be effectively hidden by the pipeline.
> We saw great performance boosts for small packets PVP test.
>
> This patch improves the performance for small packets, and has
> distinguished the packets by size. So although the performance
> for big packets doesn't change, it makes it relatively easy to
> do some special optimizations for the big packets too.

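For anyone following the thread, the batching idea described above can be
sketched roughly as follows. This is only an illustration under my own
assumptions, not the code from the patch: the queue type, the helper
names (batch_copy, batch_flush) and both constants are hypothetical, the
element struct is simplified (no log_addr), and it is named with "batch"
rather than "burst".

```c
#include <stdint.h>
#include <string.h>

#define BATCH_MAX      32   /* hypothetical queue capacity */
#define SMALL_COPY_MAX 256  /* hypothetical "small copy" threshold, in bytes */

/* One deferred copy; fields mirror the quoted struct, minus log_addr. */
struct batch_copy_elem {
	void *dst;
	const void *src;
	uint32_t len;
};

/* Hypothetical container for the pending small copies. */
struct batch_copy_queue {
	struct batch_copy_elem elems[BATCH_MAX];
	uint16_t count;
};

/* Run all pending copies back to back, then empty the queue. */
static void
batch_flush(struct batch_copy_queue *q)
{
	uint16_t i;

	for (i = 0; i < q->count; i++)
		memcpy(q->elems[i].dst, q->elems[i].src, q->elems[i].len);
	q->count = 0;
}

/*
 * Defer small copies, perform large ones immediately.
 * Caveat: a deferred small copy runs after any large copy issued in
 * the meantime, so callers must not rely on ordering between
 * overlapping regions.
 */
static void
batch_copy(struct batch_copy_queue *q, void *dst, const void *src,
	   uint32_t len)
{
	if (len > SMALL_COPY_MAX) {
		memcpy(dst, src, len);
		return;
	}

	/* Make room first if the queue is already full. */
	if (q->count == BATCH_MAX)
		batch_flush(q);

	q->elems[q->count].dst = dst;
	q->elems[q->count].src = src;
	q->elems[q->count].len = len;
	q->count++;
}
```

A caller would queue copies while walking the descriptors of a burst and
call batch_flush() once at the end, so that the small copies are issued
together.
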
The numbers shown in other replies look really impressive. Great work!
This patch also looks good to me. I have one minor comment though.

[...]

> +/*
> + * Structure contains the info for each batched memory copy.
> + */
> +struct burst_copy_elem {
> +	void *dst;
> +	void *src;
> +	uint32_t len;
> +	uint64_t log_addr;
> +};

Like the title says, this is about batching, not bursting. It's also not
a good idea to mix "burst" and "batch". I'd suggest using the term
"batch" consistently.

--yliu