On Thu, 28 Oct 2021 12:31:29 +0800 Peter Xu <pet...@redhat.com> wrote:
> Add a pre-plug hook for x86-iommu, so that we can detect vfio-pci devices > before realizing the vIOMMU device. > > When the guest contains both the x86 vIOMMU and vfio-pci devices, the user > needs to specify the x86 vIOMMU before the vfio-pci devices. The reason is, > vfio_realize() calls pci_device_iommu_address_space() to fetch the correct dma > address space for the device, while that API can only work right after the > vIOMMU device initialized first. > > For example, the iommu_fn() that is used in pci_device_iommu_address_space() > is > only setup in realize() of the vIOMMU devices. > > For a long time we have had libvirt making sure that the ordering is correct, > however from qemu side we never fail a guest from booting even if the ordering > is specified wrongly. When the order is wrong, the guest will encounter > misterious error when operating on the vfio-pci device because in QEMU we'll > still assume the vfio-pci devices are put into the default DMA domain (which > is > normally the direct GPA mapping), so e.g. the DMAs will never go right. > > This patch fails the guest from booting when we detected such errornous > cmdline > specified, then the guest at least won't encounter weird device behavior after > booted. The error message will also help the user to know how to fix the > issue. > > Cc: Alex Williamson <alex.william...@redhat.com> > Suggested-by: Igor Mammedov <imamm...@redhat.com> > Signed-off-by: Peter Xu <pet...@redhat.com> > --- > hw/i386/pc.c | 4 ++++ > hw/i386/x86-iommu.c | 14 ++++++++++++++ > include/hw/i386/x86-iommu.h | 8 ++++++++ > 3 files changed, 26 insertions(+) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 86223acfd3..b70a04011e 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -81,6 +81,7 @@ > #include "hw/core/cpu.h" > #include "hw/usb.h" > #include "hw/i386/intel_iommu.h" > +#include "hw/i386/x86-iommu.h" > #include "hw/net/ne2000-isa.h" > #include "standard-headers/asm-x86/bootparam.h" > #include "hw/virtio/virtio-pmem-pci.h" > @@ -1327,6 +1328,8 @@ static void > pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev, > pc_memory_pre_plug(hotplug_dev, dev, errp); > } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) { > x86_cpu_pre_plug(hotplug_dev, dev, errp); > + } else if (object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE)) { > + x86_iommu_pre_plug(X86_IOMMU_DEVICE(dev), errp); > } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) || > object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) { > pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp); > @@ -1383,6 +1386,7 @@ static HotplugHandler > *pc_get_hotplug_handler(MachineState *machine, > { > if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) || > object_dynamic_cast(OBJECT(dev), TYPE_CPU) || > + object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE) || > object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) || > object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) { > return HOTPLUG_HANDLER(machine); > diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c > index 86ad03972e..c9ee9041a3 100644 > --- a/hw/i386/x86-iommu.c > +++ b/hw/i386/x86-iommu.c > @@ -22,6 +22,7 @@ > #include "hw/i386/x86-iommu.h" > #include "hw/qdev-properties.h" > #include "hw/i386/pc.h" > +#include "hw/vfio/pci.h" > #include "qapi/error.h" > #include "qemu/error-report.h" > #include "trace.h" > @@ -103,6 +104,19 @@ IommuType x86_iommu_get_type(void) > return x86_iommu_default->type; > } > > +void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp) > +{ > + bool ambiguous = false; > + Object *object; > + > + object = object_resolve_path_type("", TYPE_VFIO_PCI, &ambiguous); > + if (object || ambiguous) { > + /* There're one or more vfio-pci devices detected */ > + error_setg(errp, "Please specify all the vfio-pci devices to be > after " > + "the vIOMMU device"); > + } I still really don't buy the argument that vfio-pci is the only driver that does "this thing", therefore we can just look for vfio-pci devices by name rather than try to generically detect devices that have this dependency. That seems short sighted. I've already suggested that pci-core could record on the PCIDevice structure if the device address space has been accessed. We could also do something like create a TYPE_PCI_AS_DEVICE class derived from TYPE_PCI_DEVICE and any PCI drivers that make use of the device address space before machine-init-done would be of this class. That could even be enforced by pci_device_iommu_address_space() and would allow the same sort of object resolution as used here. Thanks, Alex > +} > + > static void x86_iommu_realize(DeviceState *dev, Error **errp) > { > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev); > diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h > index 9de92d33a1..e8b6c293e0 100644 > --- a/include/hw/i386/x86-iommu.h > +++ b/include/hw/i386/x86-iommu.h > @@ -172,4 +172,12 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool > global, > * @out: Output MSI message > */ > void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *out); > + > +/** > + * x86_iommu_pre_plug: called before plugging the iommu device > + * @X86IOMMUState: the pointer to x86 iommu state > + * @errp: the double pointer to Error, set if we want to fail the plug > + */ > +void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp); > + > #endif