On Tue, Dec 13, 2016 at 06:17:47AM -0700, Alex Williamson wrote:
> On Tue, 13 Dec 2016 14:12:12 +0800
> Peter Xu <pet...@redhat.com> wrote:
>
> > On Mon, Dec 12, 2016 at 10:48:28PM -0700, Alex Williamson wrote:
> > > On Tue, 13 Dec 2016 13:24:29 +0800
> > > Peter Xu <pet...@redhat.com> wrote:
> > >
> > > > On Mon, Dec 12, 2016 at 08:51:50PM -0700, Alex Williamson wrote:
> > > >
> > > > [...]
> > > >
> > > > > > > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is
> > > > > > > directly relevant to vfio, we're not sharing page tables.
> > > > > > > There is already a case today, without a vIOMMU, where you
> > > > > > > can make a guest which has more guest physical address space
> > > > > > > than the hardware IOMMU by overcommitting system memory.
> > > > > > > Generally this quickly resolves itself when we start pinning
> > > > > > > pages, since the physical address width of the IOMMU is
> > > > > > > typically the same as the physical address width of the host
> > > > > > > system (ie. we exhaust the host memory).
> > > > > >
> > > > > > Hi, Alex,
> > > > > >
> > > > > > Here, does "hardware IOMMU" mean the IOMMU IOVA address space
> > > > > > width?  For example, if the guest has a 48-bit physical address
> > > > > > width (without vIOMMU), but the host hardware IOMMU only
> > > > > > supports 39 bits for its IOVA address space, could device
> > > > > > assignment work in this case?
> > > > >
> > > > > The current usage depends entirely on what the user (VM) tries
> > > > > to map.  You could expose a vIOMMU with a 64-bit address width,
> > > > > but the moment you try to perform a DMA mapping with an IOVA
> > > > > beyond bit 39 (if that's the host IOMMU address width), the
> > > > > ioctl will fail and the VM will abort.  IOW, you can claim
> > > > > whatever vIOMMU address width you want, but if you lay out guest
> > > > > memory or devices in such a way that actually requires IOVA
> > > > > mappings beyond the host capabilities, you're going to abort.
> > > > > Likewise, without a vIOMMU, if the guest memory layout is
> > > > > sufficiently sparse to require such IOVAs, you're going to
> > > > > abort.  Thanks,
> > > >
> > > > Thanks for the explanation. I got the point.
> > > >
> > > > However, should we allow guest behavior to affect the hypervisor?
> > > > In this case, if the guest maps an IOVA range above 39 bits
> > > > (assuming the vIOMMU is declaring itself with a 48-bit address
> > > > width), the VM will crash.  How about we shrink the vIOMMU address
> > > > width to 39 bits during boot if we detect that assigned devices
> > > > are configured?  IMHO, no matter what we do in the guest, the
> > > > hypervisor should keep the guest alive from the hypervisor POV
> > > > (emulation of the guest hardware should not be stopped by guest
> > > > behavior).  If any operation in the guest can bring the hypervisor
> > > > down, isn't that a bug?
> > >
> > > Any case of the guest crashing the hypervisor (ie. the host) is a
> > > serious bug, but a guest causing its own VM to abort is an entirely
> > > different class, and in some cases justified.  For instance, you
> > > only need a guest misbehaving in the virtio protocol to generate a
> > > VM abort.  The cases Kevin raises make me reconsider because they
> > > are cases of a VM behaving properly, within the specifications of
> > > the hardware exposed to it, generating a VM abort, and in the case
> > > of vfio exposed through to a guest user, they allow the VM to be
> > > susceptible to the actions of that user.
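
Just to spell out the failure mode being discussed here: the abort
happens because the kernel refuses a type1 map request for an IOVA the
host IOMMU can't address, and QEMU treats the failed ioctl as fatal.
A rough sketch against the vfio uAPI (not QEMU's actual code; it
assumes 'container' is a /dev/vfio/vfio fd that already has a group
attached and VFIO_TYPE1_IOMMU selected):

  #include <errno.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/vfio.h>

  /* Try to map one anonymous page at the given IOVA. */
  static int map_one_page(int container, uint64_t iova)
  {
      struct vfio_iommu_type1_dma_map map = {
          .argsz = sizeof(map),
          .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
          .iova  = iova,
          .size  = 4096,
      };
      void *buf = mmap(NULL, map.size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

      if (buf == MAP_FAILED)
          return -errno;
      map.vaddr = (uintptr_t)buf;

      /* If 'iova' lies beyond what the host IOMMU can address (e.g.
       * bit 39 and up on a 39-bit IOMMU), the kernel rejects the
       * mapping and the ioctl fails -- the point at which QEMU
       * currently aborts the VM. */
      if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map) < 0) {
          int err = errno;
          fprintf(stderr, "map at iova 0x%llx failed: %s\n",
                  (unsigned long long)iova, strerror(err));
          munmap(buf, map.size);
          return -err;
      }
      return 0;
  }

So a map at, say, 1ULL << 39 on a 39-bit host IOMMU is exactly the case
that fails today, regardless of what address width the vIOMMU
advertises to the guest.
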
> > > Of course any time we tie VM hardware to a host constraint, we're
> > > asking for trouble.  Your example of shrinking the vIOMMU address
> > > width to 39 bits on boot highlights that.  Clearly cold-plugged
> > > devices are only one scenario; what about hotplugged devices?  We
> > > cannot dynamically change the vIOMMU address width.  What about
> > > migration?  We could start the VM w/o an assigned device on a
> > > 48-bit-capable host, migrate it to a 39-bit host, and then attempt
> > > to hot add an assigned device.  For the most compatibility, why
> > > would we ever configure the VM with a vIOMMU address width beyond
> > > the minimum necessary to support the potentially populated guest
> > > physical memory?  Thanks,
> >
> > For now, I feel a tunable for the address width is more essential -
> > let's just name it "aw-bits" - and it should only be used by advanced
> > users.  By default, we can use an address width that is safe enough,
> > like 39 bits (I assume that most pIOMMUs should support at least 39
> > bits).  User configuration can override it (for now, we can limit the
> > options to only 39/48 bits).
> >
> > Then, we can temporarily live even without an interface to detect the
> > host parameters - when a user specifies a particular width, he/she
> > will manage the rest (of course taking the risk of VM aborts).
>
> I'm sorry, what is the actual benefit of a 48-bit address width?
> Simply to be able to support larger memory VMs?  In that case the
> address width should be automatically configured when necessary rather
> than providing yet another obscure user configuration.
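
Just so we're all looking at the same thing: as I understand it, the
proposal boils down to a property on the intel-iommu device, something
like the fragments below.  None of this exists yet - the "aw-bits" name
and the 39/48 values are simply what Peter suggests above:

  # default: a width assumed to be safe on most host IOMMUs
  -device intel-iommu,aw-bits=39

  # opt-in for a larger IOVA space; the user accepts the risk of a VM
  # abort on hosts whose IOMMU cannot map beyond bit 39
  -device intel-iommu,aw-bits=48
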
I think we need to map out all the issues, and a tunable isn't a bad
way to experiment in order to do this.

> Minimally, if we don't have the support worked out for an option, we
> should denote it as an experimental option by prefixing it with 'x-'.
> Once we make a non-experimental option, we're stuck with it, and it
> feels like this is being rushed through without a concrete requirement
> for supporting it.  Thanks,
>
> Alex

That's a good idea, I think.  We'll rename it once we have a better
understanding of what this depends on.

-- 
MST