In my TX experiments with a disconnected backend (and with CPU dynamic performance scaling disabled, etc.):

1) after patches 1 and 2, the virtio bottleneck jumps from ~1 Mpps to 1.910 Mpps.
2) after patches 1, 2 and 3, the virtio bottleneck jumps to 2.039 Mpps.

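To put the two measurements in perspective, the relative gain and the per-packet time saved can be derived from the reported rates. This is only a back-of-the-envelope sketch: the helper names are hypothetical, and the only inputs are the 1.910 Mpps and 2.039 Mpps figures quoted above.

```python
def relative_gain(before_mpps, after_mpps):
    """Fractional throughput gain when going from before_mpps to after_mpps."""
    return (after_mpps - before_mpps) / before_mpps


def ns_per_packet(mpps):
    """Average time spent per packet, in nanoseconds, at a given rate in Mpps."""
    return 1e3 / mpps


# Rates reported in this message (Mpps).
PATCHES_1_2 = 1.910
PATCHES_1_2_3 = 2.039

gain = relative_gain(PATCHES_1_2, PATCHES_1_2_3)
saved_ns = ns_per_packet(PATCHES_1_2) - ns_per_packet(PATCHES_1_2_3)

print(f"patch 3 gain: {gain:.1%}, time saved per packet: {saved_ns:.1f} ns")
```

With these inputs the gain from patch 3 comes out a little under 7%, which is in the same range as the roughly 5% that the quoted "perf" estimate attributes to removing the variable-sized memcpy.
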
So I see an improvement for patch 3, and I guess it's because we avoid an additional memory translation and the related overhead. I believe that avoiding the memory translation is more beneficial than avoiding the variable-sized memcpy. I'm not surprised by that: a brief look at what happens under the hood when you call an access_memory() function shows that it involves a lot of operations.

Cheers,
  Vincenzo

2015-12-16 9:38 GMT+01:00 Paolo Bonzini <pbonz...@redhat.com>:
>
>
> On 15/12/2015 23:33, Vincenzo Maffione wrote:
>> This patch slightly rewrites the code to reduce the number of accesses, since
>> many of them seems unnecessary to me. After this reduction, the bottleneck
>> jumps from 1 Mpps to 2 Mpps.
>
> Very nice. Did you get new numbers with the rebase? That would help
> measuring the effect of removing variable-sized memcpy (I'll post the
> patches for this shortly; they're entirely in memory.h/exec.c so they're
> not virtio-specific). A rough measurement from "perf" says they're
> worth about 5%.
>
> Related to this, patch 3 introduces a variable-sized memcpy, because it
> switches from 2 virtio_stl_phys to 1 address_space_write. I'm curious
> if the effect of this individual patch is positive, negative or neutral.
> On the other hand, patches 1 and 2 are clear wins.
>
> Paolo
>
>> Patch is not complete (e.g. it still does not properly manage endianess, it is
>> not clean, etc.). I just wanted to ask if you think the idea makes sense, and
>> a proper patch in this direction would be accepted.

--
Vincenzo Maffione