On Tue, May 22, 2012 at 05:17:39PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2012-05-22 at 14:34 +1000, Benjamin Herrenschmidt wrote:
> > The emulated devices can run simultaneously with the guest, so
> > we need to be careful with ordering of load and stores done by
> > them to the guest system memory, which need to be observed in
> > the right order by the guest operating system.
> > 
> > This adds a barrier call to the basic DMA read/write ops which
> > is currently implemented as a smp_mb(), but could be later
> > improved for more fine grained control of barriers.
> > 
> > Additionally, a _relaxed() variant of the accessors is provided
> > to easily convert devices who would be performance sensitive
> > and negatively impacted by the change.
> > 
> > Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> > ---
> 
> (Note to Rusty: The number I told you on ST is wrong, see below)
> 
> So I tried to do some performance measurements with that patch using
> netperf on an x86 laptop (x220 with core i7).
> 
> It's a bit tricky. For example, if I just create a tap interface,
> give it a local IP on the laptop and a different IP on the guest,
> (ie talking to a netserver on the host basically from the guest
> via tap), the performance is pretty poor and the numbers seem
> useless with and without the barrier.
> 
> So I did tests involving talking to a server on our gigabit network
> instead.
> 
> The baseline is the laptop without kvm talking to the server. The
> TCP_STREAM test results are:

It's not a good test. The thing that most affects throughput results is
how much CPU your guest gets. So as a minimum you need to measure CPU
utilization on the host and divide the throughput by that.
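
To make the normalization concrete, here is a minimal sketch in Python.
The function name and the numbers in the usage lines are made up for
illustration; they are not measurements from this thread.

```python
def throughput_per_cpu(throughput_mbps, host_cpu_percent):
    """Return throughput divided by host CPU utilization.

    throughput_mbps:  measured throughput (e.g. from a TCP_STREAM run)
    host_cpu_percent: host CPU utilization during the same run, in percent
    Both values are supplied by the caller; nothing is measured here.
    """
    if host_cpu_percent <= 0:
        raise ValueError("host CPU utilization must be positive")
    return throughput_mbps / host_cpu_percent


# Hypothetical runs: same raw throughput, different host CPU cost.
without_barrier = throughput_per_cpu(900.0, 45.0)
with_barrier = throughput_per_cpu(900.0, 60.0)
print(without_barrier, with_barrier)
```

With these invented inputs the raw throughput is identical, but the
per-CPU figure differs, which is the effect a raw TCP_STREAM number
would hide.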

-- 
MST
