On Tue, May 22, 2012 at 05:17:39PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2012-05-22 at 14:34 +1000, Benjamin Herrenschmidt wrote:
> > The emulated devices can run simultaneously with the guest, so
> > we need to be careful with ordering of load and stores done by
> > them to the guest system memory, which need to be observed in
> > the right order by the guest operating system.
> >
> > This adds a barrier call to the basic DMA read/write ops which
> > is currently implemented as a smp_mb(), but could be later
> > improved for more fine grained control of barriers.
> >
> > Additionally, a _relaxed() variant of the accessors is provided
> > to easily convert devices who would be performance sensitive
> > and negatively impacted by the change.
> >
> > Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> > ---
>
> (Note to Rusty: The number I told you on ST is wrong, see below)
>
> So I tried to do some performance measurements with that patch using
> netperf on an x86 laptop (x220 with core i7).
>
> It's a bit tricky. For example, if I just create a tap interface,
> give it a local IP on the laptop and a different IP on the guest,
> (ie talking to a netserver on the host basically from the guest
> via tap), the performance is pretty poor and the numbers seem
> useless with and without the barrier.
>
> So I did tests involving talking to a server on our gigabit network
> instead.
>
> The baseline is the laptop without kvm talking to the server. The
> TCP_STREAM test results are:

It's not a good test. The thing most affecting throughput results is
how much CPU your guest gets. So as a minimum you need to measure CPU
utilization on the host and divide throughput by that.

-- 
MST
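A minimal sketch of the normalization suggested above, assuming host CPU utilization is measured separately for each run and expressed as a fraction of one CPU (0.5 for 50%, values above 1.0 when more than one host CPU is busy). The function name `throughput_per_cpu` is hypothetical and not part of netperf or QEMU.

```c
#include <assert.h>

/*
 * Hypothetical helper: normalize a throughput figure (e.g. Mbit/s from a
 * netperf TCP_STREAM run) by the host CPU utilization spent to achieve it.
 * host_cpu_util is a fraction of one CPU; it must be positive.
 */
static double throughput_per_cpu(double throughput_mbps, double host_cpu_util)
{
    assert(host_cpu_util > 0.0);
    return throughput_mbps / host_cpu_util;
}
```

Comparing `throughput_per_cpu()` for runs with and without the barrier separates the barrier's cost from differences in how much host CPU the guest happened to get: a run that received twice the CPU can post higher raw throughput while being less efficient per unit of CPU.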