On 09/20/2016 04:00 AM, Zhihong Wang wrote:
> This patch reorders the code to delay virtio header write to improve
> cache access efficiency for cases where the mrg_rxbuf feature is turned
> on. CPU pipeline stall cycles can be significantly reduced.
>
> Virtio header write and mbuf data copy are all remote store operations
> which takes a long time to finish. It's a good idea to put them together
> to remove bubbles in between, to let as many remote store instructions
> as possible go into store buffer at the same time to hide latency, and
> to let the H/W prefetcher goes to work as early as possible.
>
> On a Haswell machine, about 100 cycles can be saved per packet by this
> patch alone. Taking 64B packets traffic for example, this means about 60%
> efficiency improvement for the enqueue operation.

Thanks for the detailed information, I appreciate it.

Maxime