On Wed, Dec 12, 2018 at 05:34:31PM +0100, Maxime Coquelin wrote:
> Hi Ilya,
>
> On 12/12/18 4:23 PM, Ilya Maximets wrote:
> > On 12.12.2018 11:24, Maxime Coquelin wrote:
> > > Instead of writing back descriptors chains in order, let's
> > > write the first chain flags last in order to improve batching.
> > >
> > > With Kernel's pktgen benchmark, ~3% performance gain is measured.
> > >
> > > Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com>
> > > ---
> > >  lib/librte_vhost/virtio_net.c | 39 +++++++++++++++++++++--------------
> > >  1 file changed, 24 insertions(+), 15 deletions(-)
> > >
> >
> > Hi.
> > I made some rough testing on my ARMv8 system with this patch and v1 of it.
> > Here is the performance difference with current master:
> >     v1: +1.1 %
> >     v2: -3.6 %
> >
> > So, write barriers are quite heavy in practice.
>
> Thanks for testing it on ARM. Indeed, SMP WMB is heavier on ARM.

Besides your ideas for improving packed rings, maybe we should switch
to load_acquire/store_release?

See
	virtio: use smp_load_acquire/smp_store_release
which worked fine, but as I only tested on x86, it did not result in any gains.

-- 
MST
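
To make the load_acquire/store_release suggestion concrete, here is a minimal sketch in C11 atomics. It is not the vhost or virtio code: struct demo_desc, DEMO_F_USED, demo_publish() and demo_consume() are hypothetical names, and the layout is simplified rather than the real packed descriptor. The idea is that the store to flags carries the ordering itself, so no separate write barrier is needed before it. Whether this is actually cheaper on ARMv8 would have to be measured.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor; NOT the real packed-ring layout. */
struct demo_desc {
	uint32_t id;
	uint32_t len;
	_Atomic uint16_t flags;
};

#define DEMO_F_USED 0x1u /* assumed flag bit, for illustration only */

/* Producer: fill in the payload fields, then publish via the flags. */
static void demo_publish(struct demo_desc *d, uint32_t id, uint32_t len)
{
	d->id = id;
	d->len = len;
	/* Release store: the id/len writes above are ordered before it. */
	atomic_store_explicit(&d->flags, DEMO_F_USED, memory_order_release);
}

/* Consumer: check the flags first, read the payload only if published. */
static bool demo_consume(struct demo_desc *d, uint32_t *id, uint32_t *len)
{
	/* Acquire load: the id/len reads below are ordered after it. */
	uint16_t flags = atomic_load_explicit(&d->flags, memory_order_acquire);

	if (!(flags & DEMO_F_USED))
		return false;
	*id = d->id;
	*len = d->len;
	return true;
}
```

With a chain of descriptors, the same pattern would apply to the first descriptor's flags only, written last, which matches the batching approach described in the patch.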