> On Jan 26, 2022, at 1:13 PM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> 
> * Jag Raman (jag.ra...@oracle.com) wrote:
>> 
>> 
>>> On Jan 25, 2022, at 1:38 PM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>>> 
>>> * Jag Raman (jag.ra...@oracle.com) wrote:
>>>> 
>>>> 
>>>>> On Jan 19, 2022, at 7:12 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
>>>>> 
>>>>> On Wed, Jan 19, 2022 at 04:41:52PM -0500, Jagannathan Raman wrote:
>>>>>> Allow PCI buses to be part of isolated CPU address spaces. This has a
>>>>>> niche usage.
>>>>>> 
>>>>>> TYPE_REMOTE_MACHINE allows multiple VMs to house their PCI devices in
>>>>>> the same machine/server. This would cause address space collision as
>>>>>> well as be a security vulnerability. Having separate address spaces for
>>>>>> each PCI bus would solve this problem.
>>>>> 
>>>>> Fascinating, but I am not sure I understand. any examples?
>>>> 
>>>> Hi Michael!
>>>> 
>>>> multiprocess QEMU and vfio-user implement a client-server model to allow
>>>> out-of-process emulation of devices. The client QEMU, which makes ioctls
>>>> to the kernel and runs VCPUs, could attach devices running in a server
>>>> QEMU. The server QEMU needs access to parts of the client’s RAM to
>>>> perform DMA.
>>> 
>>> Do you ever have the opposite problem? i.e. when an emulated PCI device
>> 
>> That’s an interesting question.
>> 
>>> exposes a chunk of RAM-like space (frame buffer, or maybe a mapped file)
>>> that the client can see. What happens if two emulated devices need to
>>> access each others emulated address space?
>> 
>> In this case, the kernel driver would map the destination’s chunk of
>> internal RAM into the DMA space of the source device. Then the source
>> device could write to that mapped address range, and the IOMMU should
>> direct those writes to the destination device.
> 
> Are all devices mappable like that?

If there is an IOMMU that supports DMA Remapping (DMAR), that would be
possible: the kernel could configure the DMAR to set up such a mapping.
If there is no DMAR, the kernel/CPU could buffer the write between the
devices.
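To make that a bit more concrete on the QEMU side, here is a rough
sketch of the kind of mapping I mean: aliasing a chunk of the
destination device’s RAM-backed region into the isolated memory address
space that the source device uses for DMA. This is only an illustration
and not code from this series - the helper map_peer_ram_into_dma_space()
and its parameters (src_root, dst_ram, iova) are made up for the sketch;
only the memory_region_*() calls are existing QEMU APIs.

#include "qemu/osdep.h"
#include "exec/memory.h"

/*
 * Hypothetical helper, for illustration only.
 *
 * 'src_root' is the root MemoryRegion behind the source bus's
 * isol_as_mem, 'dst_ram' is a RAM-backed MemoryRegion owned by the
 * destination device, and 'iova' is where the mapping should appear
 * in the source device's DMA address space. 'alias' is storage for
 * the alias region, allocated by the caller.
 */
static void map_peer_ram_into_dma_space(MemoryRegion *src_root,
                                        MemoryRegion *dst_ram,
                                        hwaddr iova,
                                        MemoryRegion *alias)
{
    /* Alias the destination device's RAM... */
    memory_region_init_alias(alias, NULL, "peer-dma-window", dst_ram,
                             0, memory_region_size(dst_ram));

    /* ...and expose it at 'iova' inside the source device's isolated
     * DMA address space. */
    memory_region_add_subregion(src_root, iova, alias);
}

With a DMAR-capable vIOMMU, I’d expect the equivalent mapping to be
produced by the vIOMMU’s translate callback rather than a static alias
like the above.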
--
Jag

> 
>> I would like to take a closer look at the IOMMU implementation on how to
>> achieve this, and get back to you. I think the IOMMU would handle this.
>> Could you please point me to the IOMMU implementation you have in mind?
> 
> I didn't have one in mind; I was just hitting a similar problem on
> Virtiofs DAX.
> 
> Dave
> 
>> Thank you!
>> --
>> Jag
>> 
>>> 
>>> Dave
>>> 
>>>> In the case where multiple clients attach devices that are running on the
>>>> same server, we need to ensure that each devices has isolated memory
>>>> ranges. This ensures that the memory space of one device is not visible
>>>> to other devices in the same server.
>>>> 
>>>>> 
>>>>> I also wonder whether this special type could be modelled like a special
>>>>> kind of iommu internally.
>>>> 
>>>> Could you please provide some more details on the design?
>>>> 
>>>>> 
>>>>>> Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
>>>>>> Signed-off-by: John G Johnson <john.g.john...@oracle.com>
>>>>>> Signed-off-by: Jagannathan Raman <jag.ra...@oracle.com>
>>>>>> ---
>>>>>> include/hw/pci/pci.h     |  2 ++
>>>>>> include/hw/pci/pci_bus.h | 17 +++++++++++++++++
>>>>>> hw/pci/pci.c             | 17 +++++++++++++++++
>>>>>> hw/pci/pci_bridge.c      |  5 +++++
>>>>>> 4 files changed, 41 insertions(+)
>>>>>> 
>>>>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>>>>>> index 023abc0f79..9bb4472abc 100644
>>>>>> --- a/include/hw/pci/pci.h
>>>>>> +++ b/include/hw/pci/pci.h
>>>>>> @@ -387,6 +387,8 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
>>>>>>  int pci_device_load(PCIDevice *s, QEMUFile *f);
>>>>>>  MemoryRegion *pci_address_space(PCIDevice *dev);
>>>>>>  MemoryRegion *pci_address_space_io(PCIDevice *dev);
>>>>>> +AddressSpace *pci_isol_as_mem(PCIDevice *dev);
>>>>>> +AddressSpace *pci_isol_as_io(PCIDevice *dev);
>>>>>> 
>>>>>>  /*
>>>>>>   * Should not normally be used by devices. For use by sPAPR target
>>>>>> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
>>>>>> index 347440d42c..d78258e79e 100644
>>>>>> --- a/include/hw/pci/pci_bus.h
>>>>>> +++ b/include/hw/pci/pci_bus.h
>>>>>> @@ -39,9 +39,26 @@ struct PCIBus {
>>>>>>      void *irq_opaque;
>>>>>>      PCIDevice *devices[PCI_SLOT_MAX * PCI_FUNC_MAX];
>>>>>>      PCIDevice *parent_dev;
>>>>>> +
>>>>>>      MemoryRegion *address_space_mem;
>>>>>>      MemoryRegion *address_space_io;
>>>>>> 
>>>>>> +    /**
>>>>>> +     * Isolated address spaces - these allow the PCI bus to be part
>>>>>> +     * of an isolated address space as opposed to the global
>>>>>> +     * address_space_memory & address_space_io.
>>>>> 
>>>>> Are you sure address_space_memory & address_space_io are
>>>>> always global? even in the case of an iommu?
>>>> 
>>>> On the CPU side of the Root Complex, I believe address_space_memory
>>>> & address_space_io are global.
>>>> 
>>>> In the vfio-user case, devices on the same machine (TYPE_REMOTE_MACHINE)
>>>> could be attached to different clients VMs. Each client would have their
>>>> own address space for their CPUs. With isolated address spaces, we ensure
>>>> that the devices see the address space of the CPUs they’re attached to.
>>>> 
>>>> Not sure if it’s OK to share weblinks in this mailing list, please let me
>>>> know if that’s not preferred.
>>>> But I’m referring to the terminology used in the following block diagram:
>>>> https://en.wikipedia.org/wiki/Root_complex#/media/File:Example_PCI_Express_Topology.svg
>>>> 
>>>>> 
>>>>>> This allows the
>>>>>> +     * bus to be attached to CPUs from different machines. The
>>>>>> +     * following is not used used commonly.
>>>>>> +     *
>>>>>> +     * TYPE_REMOTE_MACHINE allows emulating devices from multiple
>>>>>> +     * VM clients,
>>>>> 
>>>>> what are VM clients?
>>>> 
>>>> It’s the client in the client - server model explained above.
>>>> 
>>>> Thank you!
>>>> --
>>>> Jag
>>>> 
>>>>> 
>>>>>> as such it needs the PCI buses in the same machine
>>>>>> +     * to be part of different CPU address spaces. The following is
>>>>>> +     * useful in that scenario.
>>>>>> +     *
>>>>>> +     */
>>>>>> +    AddressSpace *isol_as_mem;
>>>>>> +    AddressSpace *isol_as_io;
>>>>>> +
>>>>>>      QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
>>>>>>      QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
>>>>>> 
>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>> index 5d30f9ca60..d5f1c6c421 100644
>>>>>> --- a/hw/pci/pci.c
>>>>>> +++ b/hw/pci/pci.c
>>>>>> @@ -442,6 +442,8 @@ static void pci_root_bus_internal_init(PCIBus *bus, DeviceState *parent,
>>>>>>      bus->slot_reserved_mask = 0x0;
>>>>>>      bus->address_space_mem = address_space_mem;
>>>>>>      bus->address_space_io = address_space_io;
>>>>>> +    bus->isol_as_mem = NULL;
>>>>>> +    bus->isol_as_io = NULL;
>>>>>>      bus->flags |= PCI_BUS_IS_ROOT;
>>>>>> 
>>>>>>      /* host bridge */
>>>>>> @@ -2676,6 +2678,16 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev)
>>>>>>      return pci_get_bus(dev)->address_space_io;
>>>>>>  }
>>>>>> 
>>>>>> +AddressSpace *pci_isol_as_mem(PCIDevice *dev)
>>>>>> +{
>>>>>> +    return pci_get_bus(dev)->isol_as_mem;
>>>>>> +}
>>>>>> +
>>>>>> +AddressSpace *pci_isol_as_io(PCIDevice *dev)
>>>>>> +{
>>>>>> +    return pci_get_bus(dev)->isol_as_io;
>>>>>> +}
>>>>>> +
>>>>>>  static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>>>  {
>>>>>>      DeviceClass *k = DEVICE_CLASS(klass);
>>>>>> @@ -2699,6 +2711,7 @@ static void pci_device_class_base_init(ObjectClass *klass, void *data)
>>>>>> 
>>>>>>  AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>>  {
>>>>>> +    AddressSpace *iommu_as = NULL;
>>>>>>      PCIBus *bus = pci_get_bus(dev);
>>>>>>      PCIBus *iommu_bus = bus;
>>>>>>      uint8_t devfn = dev->devfn;
>>>>>> @@ -2745,6 +2758,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>>      if (!pci_bus_bypass_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
>>>>>>          return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
>>>>>>      }
>>>>>> +    iommu_as = pci_isol_as_mem(dev);
>>>>>> +    if (iommu_as) {
>>>>>> +        return iommu_as;
>>>>>> +    }
>>>>>>      return &address_space_memory;
>>>>>>  }
>>>>>> 
>>>>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
>>>>>> index da34c8ebcd..98366768d2 100644
>>>>>> --- a/hw/pci/pci_bridge.c
>>>>>> +++ b/hw/pci/pci_bridge.c
>>>>>> @@ -383,6 +383,11 @@ void pci_bridge_initfn(PCIDevice *dev, const char *typename)
>>>>>>      sec_bus->address_space_io = &br->address_space_io;
>>>>>>      memory_region_init(&br->address_space_io, OBJECT(br), "pci_bridge_io",
>>>>>>                         4 * GiB);
>>>>>> +
>>>>>> +    /* This PCI bridge puts the sec_bus in its parent's address space */
>>>>>> +    sec_bus->isol_as_mem = pci_isol_as_mem(dev);
>>>>>> +    sec_bus->isol_as_io = pci_isol_as_io(dev);
>>>>>> +
>>>>>>      br->windows = pci_bridge_region_init(br);
>>>>>>      QLIST_INIT(&sec_bus->child);
>>>>>>      QLIST_INSERT_HEAD(&parent->child, sec_bus, sibling);
>>>>>> -- 
>>>>>> 2.20.1
>>>> 
>>> -- 
>>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>> 
>> 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 