* Jag Raman (jag.ra...@oracle.com) wrote:
> 
> 
> > On Jan 25, 2022, at 1:38 PM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> > 
> > * Jag Raman (jag.ra...@oracle.com) wrote:
> >> 
> >> 
> >>> On Jan 19, 2022, at 7:12 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
> >>> 
> >>> On Wed, Jan 19, 2022 at 04:41:52PM -0500, Jagannathan Raman wrote:
> >>>> Allow PCI buses to be part of isolated CPU address spaces. This has a
> >>>> niche usage.
> >>>> 
> >>>> TYPE_REMOTE_MACHINE allows multiple VMs to house their PCI devices in
> >>>> the same machine/server. This would cause address space collision as
> >>>> well as be a security vulnerability. Having separate address spaces for
> >>>> each PCI bus would solve this problem.
> >>> 
> >>> Fascinating, but I am not sure I understand. Any examples?
> >> 
> >> Hi Michael!
> >> 
> >> multiprocess QEMU and vfio-user implement a client-server model to allow
> >> out-of-process emulation of devices. The client QEMU, which makes ioctls
> >> to the kernel and runs VCPUs, could attach devices running in a server
> >> QEMU. The server QEMU needs access to parts of the client’s RAM to
> >> perform DMA.
> > 
> > Do you ever have the opposite problem? i.e. when an emulated PCI device
> 
> That’s an interesting question.
> 
> > exposes a chunk of RAM-like space (frame buffer, or maybe a mapped file)
> > that the client can see. What happens if two emulated devices need to
> > access each other’s emulated address space?
> 
> In this case, the kernel driver would map the destination’s chunk of
> internal RAM into the DMA space of the source device. Then the source
> device could write to that mapped address range, and the IOMMU should
> direct those writes to the destination device.
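
If I've understood that right, from the source device's point of view it
is just a normal bus-master write; something like the sketch below (made-up
helper and names, nothing from this series), where whatever the IOMMU
mapped at that range - guest RAM or another device's BAR - receives the
write:

  #include "qemu/osdep.h"
  #include "hw/pci/pci.h"

  /*
   * Hypothetical example: copy a buffer into a DMA window that the
   * IOMMU has mapped onto the destination device's RAM-like BAR.
   */
  static void copy_to_peer(PCIDevice *src, dma_addr_t peer_window,
                           const void *buf, dma_addr_t len)
  {
      /*
       * pci_dma_write() goes through the source device's DMA address
       * space, so the IOMMU mapping decides where the write really
       * lands.
       */
      pci_dma_write(src, peer_window, buf, len);
  }
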
Are all devices mappable like that?

> I would like to take a closer look at the IOMMU implementation on how to
> achieve this, and get back to you. I think the IOMMU would handle this.
> Could you please point me to the IOMMU implementation you have in mind?

I didn't have one in mind; I was just hitting a similar problem on
Virtiofs DAX.

Dave

> Thank you!
> --
> Jag
> > 
> > Dave
> > 
> >> In the case where multiple clients attach devices that are running on the
> >> same server, we need to ensure that each device has isolated memory
> >> ranges. This ensures that the memory space of one device is not visible
> >> to other devices in the same server.
> >> 
> >>> 
> >>> I also wonder whether this special type could be modelled like a special
> >>> kind of iommu internally.
> >> 
> >> Could you please provide some more details on the design?
> >> 
> >>> 
> >>>> Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >>>> Signed-off-by: John G Johnson <john.g.john...@oracle.com>
> >>>> Signed-off-by: Jagannathan Raman <jag.ra...@oracle.com>
> >>>> ---
> >>>>  include/hw/pci/pci.h     |  2 ++
> >>>>  include/hw/pci/pci_bus.h | 17 +++++++++++++++++
> >>>>  hw/pci/pci.c             | 17 +++++++++++++++++
> >>>>  hw/pci/pci_bridge.c      |  5 +++++
> >>>>  4 files changed, 41 insertions(+)
> >>>> 
> >>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>>> index 023abc0f79..9bb4472abc 100644
> >>>> --- a/include/hw/pci/pci.h
> >>>> +++ b/include/hw/pci/pci.h
> >>>> @@ -387,6 +387,8 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
> >>>>  int pci_device_load(PCIDevice *s, QEMUFile *f);
> >>>>  MemoryRegion *pci_address_space(PCIDevice *dev);
> >>>>  MemoryRegion *pci_address_space_io(PCIDevice *dev);
> >>>> +AddressSpace *pci_isol_as_mem(PCIDevice *dev);
> >>>> +AddressSpace *pci_isol_as_io(PCIDevice *dev);
> >>>> 
> >>>>  /*
> >>>>   * Should not normally be used by devices. For use by sPAPR target
> >>>> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
> >>>> index 347440d42c..d78258e79e 100644
> >>>> --- a/include/hw/pci/pci_bus.h
> >>>> +++ b/include/hw/pci/pci_bus.h
> >>>> @@ -39,9 +39,26 @@ struct PCIBus {
> >>>>      void *irq_opaque;
> >>>>      PCIDevice *devices[PCI_SLOT_MAX * PCI_FUNC_MAX];
> >>>>      PCIDevice *parent_dev;
> >>>> +
> >>>>      MemoryRegion *address_space_mem;
> >>>>      MemoryRegion *address_space_io;
> >>>> 
> >>>> +    /**
> >>>> +     * Isolated address spaces - these allow the PCI bus to be part
> >>>> +     * of an isolated address space as opposed to the global
> >>>> +     * address_space_memory & address_space_io.
> >>> 
> >>> Are you sure address_space_memory & address_space_io are
> >>> always global? even in the case of an iommu?
> >> 
> >> On the CPU side of the Root Complex, I believe address_space_memory
> >> & address_space_io are global.
> >> 
> >> In the vfio-user case, devices on the same machine (TYPE_REMOTE_MACHINE)
> >> could be attached to different client VMs. Each client would have their
> >> own address space for their CPUs. With isolated address spaces, we ensure
> >> that the devices see the address space of the CPUs they’re attached to.
> >> 
> >> Not sure if it’s OK to share weblinks in this mailing list; please let me
> >> know if that’s not preferred. But I’m referring to the terminology used
> >> in the following block diagram:
> >> https://en.wikipedia.org/wiki/Root_complex#/media/File:Example_PCI_Express_Topology.svg
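> >> 
> >> To make that concrete, the server side would do roughly the following for
> >> each client’s root bus (an illustrative sketch with made-up names, not
> >> the actual code from this series):
> >> 
> >>   #include "qemu/osdep.h"
> >>   #include "exec/memory.h"
> >>   #include "hw/pci/pci_bus.h"
> >> 
> >>   /*
> >>    * Hypothetical helper: give one client's root bus its own isolated
> >>    * memory address space instead of the global address_space_memory.
> >>    */
> >>   static void remote_isol_as_init(PCIBus *bus, Object *owner)
> >>   {
> >>       MemoryRegion *root = g_new0(MemoryRegion, 1);
> >>       AddressSpace *as = g_new0(AddressSpace, 1);
> >> 
> >>       /* container that the client's RAM sections get mapped into */
> >>       memory_region_init(root, owner, "isol-mem", UINT64_MAX);
> >>       address_space_init(as, root, "isol-as");
> >> 
> >>       bus->isol_as_mem = as;
> >>   }
> >> 
> >> With something like that in place, pci_device_iommu_address_space()
> >> below would return the per-client address space for devices on that bus,
> >> instead of falling back to the global address_space_memory.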
> >> 
> >>> 
> >>>> This allows the
> >>>> +     * bus to be attached to CPUs from different machines. The
> >>>> +     * following is not used commonly.
> >>>> +     *
> >>>> +     * TYPE_REMOTE_MACHINE allows emulating devices from multiple
> >>>> +     * VM clients,
> >>> 
> >>> What are VM clients?
> >> 
> >> It’s the client in the client-server model explained above.
> >> 
> >> Thank you!
> >> --
> >> Jag
> >> 
> >>> 
> >>>> as such it needs the PCI buses in the same machine
> >>>> +     * to be part of different CPU address spaces. The following is
> >>>> +     * useful in that scenario.
> >>>> +     *
> >>>> +     */
> >>>> +    AddressSpace *isol_as_mem;
> >>>> +    AddressSpace *isol_as_io;
> >>>> +
> >>>>      QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
> >>>>      QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
> >>>> 
> >>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>>> index 5d30f9ca60..d5f1c6c421 100644
> >>>> --- a/hw/pci/pci.c
> >>>> +++ b/hw/pci/pci.c
> >>>> @@ -442,6 +442,8 @@ static void pci_root_bus_internal_init(PCIBus *bus, DeviceState *parent,
> >>>>      bus->slot_reserved_mask = 0x0;
> >>>>      bus->address_space_mem = address_space_mem;
> >>>>      bus->address_space_io = address_space_io;
> >>>> +    bus->isol_as_mem = NULL;
> >>>> +    bus->isol_as_io = NULL;
> >>>>      bus->flags |= PCI_BUS_IS_ROOT;
> >>>> 
> >>>>      /* host bridge */
> >>>> @@ -2676,6 +2678,16 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev)
> >>>>      return pci_get_bus(dev)->address_space_io;
> >>>>  }
> >>>> 
> >>>> +AddressSpace *pci_isol_as_mem(PCIDevice *dev)
> >>>> +{
> >>>> +    return pci_get_bus(dev)->isol_as_mem;
> >>>> +}
> >>>> +
> >>>> +AddressSpace *pci_isol_as_io(PCIDevice *dev)
> >>>> +{
> >>>> +    return pci_get_bus(dev)->isol_as_io;
> >>>> +}
> >>>> +
> >>>>  static void pci_device_class_init(ObjectClass *klass, void *data)
> >>>>  {
> >>>>      DeviceClass *k = DEVICE_CLASS(klass);
> >>>> @@ -2699,6 +2711,7 @@ static void pci_device_class_base_init(ObjectClass *klass, void *data)
> >>>> 
> >>>>  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> >>>>  {
> >>>> +    AddressSpace *iommu_as = NULL;
> >>>>      PCIBus *bus = pci_get_bus(dev);
> >>>>      PCIBus *iommu_bus = bus;
> >>>>      uint8_t devfn = dev->devfn;
> >>>> @@ -2745,6 +2758,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> >>>>      if (!pci_bus_bypass_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
> >>>>          return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
> >>>>      }
> >>>> +    iommu_as = pci_isol_as_mem(dev);
> >>>> +    if (iommu_as) {
> >>>> +        return iommu_as;
> >>>> +    }
> >>>>      return &address_space_memory;
> >>>>  }
> >>>> 
> >>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> >>>> index da34c8ebcd..98366768d2 100644
> >>>> --- a/hw/pci/pci_bridge.c
> >>>> +++ b/hw/pci/pci_bridge.c
> >>>> @@ -383,6 +383,11 @@ void pci_bridge_initfn(PCIDevice *dev, const char *typename)
> >>>>      sec_bus->address_space_io = &br->address_space_io;
> >>>>      memory_region_init(&br->address_space_io, OBJECT(br), "pci_bridge_io",
> >>>>                         4 * GiB);
> >>>> +
> >>>> +    /* This PCI bridge puts the sec_bus in its parent's address space */
> >>>> +    sec_bus->isol_as_mem = pci_isol_as_mem(dev);
> >>>> +    sec_bus->isol_as_io = pci_isol_as_io(dev);
> >>>> +
> >>>>      br->windows = pci_bridge_region_init(br);
> >>>>      QLIST_INIT(&sec_bus->child);
> >>>>      QLIST_INSERT_HEAD(&parent->child, sec_bus, sibling);
> >>>> -- 
> >>>> 2.20.1
> >> 
> > -- 
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK