On Thu, Apr 04, 2013 at 04:02:57PM +0200, Alexander Graf wrote:
> 
> On 04.04.2013, at 14:58, Michael S. Tsirkin wrote:
> 
> > On Thu, Apr 04, 2013 at 02:22:09PM +0200, Alexander Graf wrote:
> >> 
> >> On 04.04.2013, at 14:08, Gleb Natapov wrote:
> >> 
> >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote:
> >>>> 
> >>>>> With KVM, MMIO is much slower than PIO, due to the need to
> >>>>> do a page walk and emulation. But with EPT, it does not have to be: we
> >>>>> know the address from the VMCS so if the address is unique, we can look
> >>>>> up the eventfd directly, bypassing emulation.
> >>>>> 
> >>>>> Add an interface for userspace to specify this per address; we can
> >>>>> use this e.g. for virtio.
> >>>>> 
> >>>>> The implementation adds a separate bus internally. This serves two
> >>>>> purposes:
> >>>>> - minimize overhead for old userspace that does not use PV MMIO
> >>>>> - minimize disruption in other code (since we don't know the length,
> >>>>> devices on the MMIO bus only get a valid address in write; this
> >>>>> way we don't need to touch all devices to teach them to handle
> >>>>> an invalid length)
> >>>>> 
> >>>>> At the moment, this optimization is only supported for EPT on x86 and
> >>>>> silently ignored for NPT and MMU, so everything works correctly but
> >>>>> slowly.
> >>>>> 
> >>>>> TODO: NPT, MMU and non x86 architectures.
> >>>>> 
> >>>>> The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
> >>>>> pre-review and suggestions.
> >>>>> 
> >>>>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
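For concreteness, the userspace side of that per-address interface could
look roughly like the sketch below. struct kvm_ioeventfd and the
KVM_IOEVENTFD ioctl are the existing ABI; the flag that routes the address
onto the new fast MMIO bus is this patch's addition, so the name and value
used here are only placeholders.

    /* Sketch: hook an eventfd to a guest-physical doorbell address so that
     * a guest write signals the eventfd directly, bypassing MMIO emulation. */
    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define KVM_IOEVENTFD_FLAG_PV_MMIO (1 << 3)   /* placeholder for the RFC's new flag */

    static int register_fast_mmio(int vm_fd, uint64_t gpa)
    {
            int efd = eventfd(0, 0);
            struct kvm_ioeventfd kick = {
                    .addr  = gpa,   /* guest-physical doorbell address */
                    .len   = 0,     /* length is not checked on the fast bus */
                    .fd    = efd,
                    .flags = KVM_IOEVENTFD_FLAG_PV_MMIO,
            };

            if (efd < 0)
                    return -1;
            /* Any guest write to gpa now signals efd without going through
             * the instruction emulator. */
            return ioctl(vm_fd, KVM_IOEVENTFD, &kick);
    }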
> >>>> 
> >>>> This still uses page fault intercepts, which are orders of magnitude 
> >>>> slower than hypercalls. Why don't you just create a PV MMIO hypercall 
> >>>> that the guest can use to invoke MMIO accesses towards the host based on 
> >>>> physical addresses with explicit length encodings?
> >>>> 
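On the guest side, the suggestion presumably amounts to something like the
following: replace the trapping store with a hypercall carrying the
physical address, the value, and an explicit length. The hypercall number
is made up here, and this is the raw VMX (vmcall) flavour of the x86 KVM
hypercall ABI:

    /* Hypothetical guest-side PV MMIO write via hypercall. */
    #define KVM_HC_MMIO_WRITE 100   /* placeholder hypercall number */

    static inline long pv_mmio_write(unsigned long gpa, unsigned long val,
                                     unsigned long len)
    {
            long ret;

            /* x86 KVM hypercall ABI: nr in RAX, args in RBX/RCX/RDX,
             * result back in RAX.  vmcall is the Intel encoding; AMD
             * uses vmmcall. */
            asm volatile("vmcall"
                         : "=a"(ret)
                         : "a"(KVM_HC_MMIO_WRITE), "b"(gpa), "c"(val), "d"(len)
                         : "memory");
            return ret;
    }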
> >>> It is slower, but not an order of magnitude slower. It becomes faster
> >>> with newer HW.
> >>> 
> >>>> That way you simplify and speed up all code paths, even exceeding the 
> >>>> speed of PIO exits. It should also be quite easily portable, as all other 
> >>>> platforms have hypercalls available as well.
> >>>> 
> >>> We are trying to avoid PV as much as possible (well, this is also PV,
> >>> but not guest visible). We haven't replaced PIO with a hypercall for the
> >>> same reason. My hope is that future HW will provide us with instruction
> >>> decode for the basic mov instruction, at which point this optimisation
> >>> can be dropped.
> >> 
> >> The same applies to an MMIO hypercall. Once the PV interface becomes 
> >> obsolete, we can drop the capability we expose to the guest.
> > 
> > Yes, but unlike a hypercall, this optimization does not need special code
> > in the guest. You can use standard OS interfaces for memory access.
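"Standard OS interfaces" here really is just an ordinary store: a Linux
guest driver, for example, kicks its doorbell with writew()/writel() and
neither knows nor cares whether the host has wired that address to an
eventfd (the 0x1000 offset below is only an example):

    #include <linux/io.h>

    static void __iomem *db;        /* mapped once via ioremap() of the device BAR */

    static void kick_device(u16 queue)
    {
            /* Plain MMIO write; with EPT and the eventfd hookup above,
             * this becomes a direct eventfd signal instead of a trip
             * through the instruction emulator. */
            writew(queue, db + 0x1000);
    }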
> 
> Yes, but let's try to understand the room for optimization we're
> talking about here. The "normal" MMIO case looks a lot slower in
> MST's benchmarks than it did in mine. So maybe we're really just
> looking at a bug here.

Could be. I posted the code (kvm, qemu, and test), so please review and try
to spot a bug.

> Also, if hcalls again cost only 50% of a fast MMIO callback, it's
> certainly worth checking how much room for improvement we're really
> leaving on the table.
> 
> 
> Alex

Take a look at 'kvm: pci PORT IO MMIO and PV MMIO speed tests'.
Try running the test on your hardware and see what happens.
Or post the test you used; I can try it on my box if you like.
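As a starting point before digging up that thread, a test of that kind
boils down to something like the sketch below (kvm-unit-tests style, run
inside the guest, with a made-up port number and guest-physical address;
this is not the posted test):

    #include <stdint.h>

    #define TEST_PIO_PORT  0x510            /* placeholder I/O port */
    #define TEST_MMIO_ADDR 0xfe000000UL     /* placeholder guest-physical address */
    #define ITERATIONS     100000

    static inline uint64_t rdtsc(void)
    {
            uint32_t lo, hi;
            asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
            return ((uint64_t)hi << 32) | lo;
    }

    /* Average cycles per PIO exit. */
    static uint64_t time_pio(void)
    {
            uint64_t start = rdtsc();
            for (int i = 0; i < ITERATIONS; i++)
                    asm volatile("outl %0, %w1"
                                 : : "a"(0), "Nd"((uint16_t)TEST_PIO_PORT));
            return (rdtsc() - start) / ITERATIONS;
    }

    /* Average cycles per MMIO exit (assumes the address is unbacked,
     * so every store traps out to the host). */
    static uint64_t time_mmio(void)
    {
            volatile uint32_t *p = (volatile uint32_t *)TEST_MMIO_ADDR;
            uint64_t start = rdtsc();
            for (int i = 0; i < ITERATIONS; i++)
                    *p = 0;
            return (rdtsc() - start) / ITERATIONS;
    }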

-- 
MST