On Tue, Dec 13, 2016 at 06:17:47AM -0700, Alex Williamson wrote:
> On Tue, 13 Dec 2016 14:12:12 +0800
> Peter Xu <pet...@redhat.com> wrote:
>
> > On Mon, Dec 12, 2016 at 10:48:28PM -0700, Alex Williamson wrote:
> > > On Tue, 13 Dec 2016 13:24:29 +0800
> > > Peter Xu <pet...@redhat.com> wrote:
> > >
> > > > On Mon, Dec 12, 2016 at 08:51:50PM -0700, Alex Williamson wrote:
> > > >
> > > > [...]
> > > >
> > > > > > > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is
> > > > > > > directly relevant to vfio, we're not sharing page tables.
> > > > > > > There is already a case today, without a vIOMMU, where you
> > > > > > > can make a guest which has more guest physical address space
> > > > > > > than the hardware IOMMU by overcommitting system memory.
> > > > > > > Generally this quickly resolves itself when we start pinning
> > > > > > > pages, since the physical address width of the IOMMU is
> > > > > > > typically the same as the physical address width of the host
> > > > > > > system (ie. we exhaust the host memory).
> > > > > >
> > > > > > Hi, Alex,
> > > > > >
> > > > > > Here, does "hardware IOMMU" mean the IOMMU IOVA address space
> > > > > > width?  For example, if the guest has a 48-bit physical address
> > > > > > width (without vIOMMU), but the host hardware IOMMU only
> > > > > > supports 39 bits for its IOVA address space, could device
> > > > > > assignment work in this case?
> > > > >
> > > > > The current usage depends entirely on what the user (VM) tries
> > > > > to map.  You could expose a vIOMMU with a 64-bit address width,
> > > > > but the moment you try to perform a DMA mapping with an IOVA
> > > > > beyond bit 39 (if that's the host IOMMU address width), the
> > > > > ioctl will fail and the VM will abort.  IOW, you can claim
> > > > > whatever vIOMMU address width you want, but if you lay out guest
> > > > > memory or devices in such a way that actually requires IOVA
> > > > > mappings beyond the host capabilities, you're going to abort.
> > > > > Likewise, without a vIOMMU, if the guest memory layout is
> > > > > sufficiently sparse to require such IOVAs, you're going to
> > > > > abort.  Thanks,
> > > >
> > > > Thanks for the explanation. I got the point.
> > > >
> > > > However, should we allow guest behavior to affect the hypervisor?
> > > > In this case, if the guest maps an IOVA range above 39 bits
> > > > (assuming the vIOMMU is declaring itself with a 48-bit address
> > > > width), the VM will crash.  How about we shrink the vIOMMU address
> > > > width to 39 bits during boot if we detect that assigned devices
> > > > are configured?  IMHO, no matter what we do in the guest, the
> > > > hypervisor should keep the guest alive from the hypervisor POV
> > > > (emulation of the guest hardware should not be stopped by guest
> > > > behavior).  If any operation in the guest can bring the hypervisor
> > > > down, isn't that a bug?
> > >
> > > Any case of the guest crashing the hypervisor (ie. the host) is a
> > > serious bug, but a guest causing its own VM to abort is an entirely
> > > different class, and in some cases justified.  For instance, you
> > > only need a guest misbehaving in the virtio protocol to generate a
> > > VM abort.  The cases Kevin raises make me reconsider because they
> > > are cases of a VM behaving properly, within the specifications of
> > > the hardware exposed to it, generating a VM abort, and in the case
> > > of vfio exposed through to a guest user, they allow the VM to be
> > > susceptible to the actions of that user.
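
Just to spell out the failure mode being discussed here: the abort
happens because the kernel refuses a type1 map request for an IOVA the
host IOMMU can't address, and QEMU treats the failed ioctl as fatal.
A rough sketch against the vfio uAPI (not QEMU's actual code; it
assumes 'container' is a /dev/vfio/vfio fd that already has a group
attached and VFIO_TYPE1_IOMMU selected):

  #include <errno.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/vfio.h>

  /* Try to map one anonymous page at the given IOVA. */
  static int map_one_page(int container, uint64_t iova)
  {
      struct vfio_iommu_type1_dma_map map = {
          .argsz = sizeof(map),
          .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
          .iova  = iova,
          .size  = 4096,
      };
      void *buf = mmap(NULL, map.size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

      if (buf == MAP_FAILED)
          return -errno;
      map.vaddr = (uintptr_t)buf;

      /* If 'iova' lies beyond what the host IOMMU can address (e.g.
       * bit 39 and up on a 39-bit IOMMU), the kernel rejects the
       * mapping and the ioctl fails -- the point at which QEMU
       * currently aborts the VM. */
      if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map) < 0) {
          int err = errno;
          fprintf(stderr, "map at iova 0x%llx failed: %s\n",
                  (unsigned long long)iova, strerror(err));
          munmap(buf, map.size);
          return -err;
      }
      return 0;
  }

So a map at, say, 1ULL << 39 on a 39-bit host IOMMU is exactly the case
that fails today, regardless of what address width the vIOMMU
advertises to the guest.
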
> > > Of course any time we tie VM hardware to a host constraint, we're
> > > asking for trouble.  Your example of shrinking the vIOMMU address
> > > width to 39 bits on boot highlights that.  Clearly cold-plugged
> > > devices are only one scenario; what about hotplugged devices?  We
> > > cannot dynamically change the vIOMMU address width.  What about
> > > migration?  We could start the VM w/o an assigned device on a
> > > 48-bit-capable host, migrate it to a 39-bit host, and then attempt
> > > to hot add an assigned device.  For the most compatibility, why
> > > would we ever configure the VM with a vIOMMU address width beyond
> > > the minimum necessary to support the potentially populated guest
> > > physical memory?  Thanks,
> >
> > For now, I feel a tunable for the address width is more essential -
> > let's just name it "aw-bits" - and it should only be used by advanced
> > users.  By default, we can use an address width that is safe enough,
> > like 39 bits (I assume that most pIOMMUs should support at least 39
> > bits).  User configuration can override it (for now, we can limit the
> > options to only 39/48 bits).
> >
> > Then, we can temporarily live even without an interface to detect the
> > host parameters - when a user specifies a particular width, he/she
> > will manage the rest (of course taking the risk of VM aborts).
>
> I'm sorry, what is the actual benefit of a 48-bit address width?
> Simply to be able to support larger memory VMs?  In that case the
> address width should be automatically configured when necessary rather
> than providing yet another obscure user configuration.
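
Just so we're all looking at the same thing: as I understand it, the
proposal boils down to a property on the intel-iommu device, something
like the fragments below.  None of this exists yet - the "aw-bits" name
and the 39/48 values are simply what Peter suggests above:

  # default: a width assumed to be safe on most host IOMMUs
  -device intel-iommu,aw-bits=39

  # opt-in for a larger IOVA space; the user accepts the risk of a VM
  # abort on hosts whose IOMMU cannot map beyond bit 39
  -device intel-iommu,aw-bits=48
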
I think we need to map out all the issues, and a tunable isn't a bad
way to experiment in order to do this.

> Minimally, if we don't have the support worked out for an option, we
> should denote it as an experimental option by prefixing it with 'x-'.
> Once we make a non-experimental option, we're stuck with it, and it
> feels like this is being rushed through without a concrete requirement
> for supporting it.  Thanks,
>
> Alex

That's a good idea, I think.  We'll rename it once we have a better
understanding of what this depends on.

-- 
MST