> On Jan 26, 2022, at 1:13 PM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> 
> * Jag Raman (jag.ra...@oracle.com) wrote:
>> 
>> 
>>> On Jan 25, 2022, at 1:38 PM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>>> 
>>> * Jag Raman (jag.ra...@oracle.com) wrote:
>>>> 
>>>> 
>>>>> On Jan 19, 2022, at 7:12 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
>>>>> 
>>>>> On Wed, Jan 19, 2022 at 04:41:52PM -0500, Jagannathan Raman wrote:
>>>>>> Allow PCI buses to be part of isolated CPU address spaces. This has a
>>>>>> niche usage.
>>>>>> 
>>>>>> TYPE_REMOTE_MACHINE allows multiple VMs to house their PCI devices in
>>>>>> the same machine/server. This would cause address space collisions and
>>>>>> would also be a security vulnerability. Having separate address spaces
>>>>>> for each PCI bus would solve this problem.
>>>>> 
>>>>> Fascinating, but I am not sure I understand. any examples?
>>>> 
>>>> Hi Michael!
>>>> 
>>>> multiprocess QEMU and vfio-user implement a client-server model to allow
>>>> out-of-process emulation of devices. The client QEMU, which makes ioctls
>>>> to the kernel and runs VCPUs, could attach devices running in a server
>>>> QEMU. The server QEMU needs access to parts of the client’s RAM to
>>>> perform DMA.
>>> 
>>> Do you ever have the opposite problem? i.e. when an emulated PCI device
>> 
>> That’s an interesting question.
>> 
>>> exposes a chunk of RAM-like space (frame buffer, or maybe a mapped file)
>>> that the client can see.  What happens if two emulated devices need to
>>> access each other's emulated address space?
>> 
>> In this case, the kernel driver would map the destination’s chunk of
>> internal RAM into the DMA space of the source device. Then the source
>> device could write to that mapped address range, and the IOMMU should
>> direct those writes to the destination device.
> 
> Are all devices mappable like that?

If there is an IOMMU that supports DMA-Remapping (DMAR), that would be
possible - the kernel could configure the DMAR to facilitate such mapping.
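
To sketch that from the emulation side in terms of QEMU's memory API (purely
illustrative - this is not the vfio-user code, and the function name, region
name and window offset below are made up): the destination device's RAM-like
region could be aliased into the root region of the source device's
DMA/IOMMU address space, so that writes from the source land in the
destination's memory.

#include "qemu/osdep.h"
#include "exec/memory.h"

/*
 * Hypothetical sketch: make dst's RAM-like region (e.g. a frame buffer)
 * visible in the address space that src uses for DMA.  All names and the
 * window offset are invented for illustration.
 */
static void map_dst_into_src_dma(MemoryRegion *src_dma_root,
                                 MemoryRegion *dst_ram,
                                 hwaddr window_base)
{
    MemoryRegion *alias = g_new0(MemoryRegion, 1);

    /* Alias, so dst keeps its own mapping of the same RAM */
    memory_region_init_alias(alias, NULL, "dst-ram-window", dst_ram,
                             0, memory_region_size(dst_ram));

    /* Place the alias at a free window inside src's DMA root region */
    memory_region_add_subregion(src_dma_root, window_base, alias);
}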

If there is no DMAR, the kernel/CPU could buffer the writes between the devices.
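
In that case the buffering would essentially be a bounce copy done by the
emulator between the two address spaces rather than a direct mapping. A rough
sketch (hypothetical helper, same headers as the sketch above;
address_space_read/write and MEMTXATTRS_UNSPECIFIED are the existing QEMU
APIs):

/*
 * Hypothetical bounce copy between two isolated address spaces when no
 * DMAR mapping is available.  Addresses and length come from the caller.
 */
static MemTxResult copy_between_devices(AddressSpace *src_as, hwaddr src_addr,
                                        AddressSpace *dst_as, hwaddr dst_addr,
                                        hwaddr len)
{
    g_autofree uint8_t *buf = g_malloc(len);
    MemTxResult res;

    res = address_space_read(src_as, src_addr, MEMTXATTRS_UNSPECIFIED,
                             buf, len);
    if (res != MEMTX_OK) {
        return res;
    }
    return address_space_write(dst_as, dst_addr, MEMTXATTRS_UNSPECIFIED,
                               buf, len);
}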

--
Jag

> 
>> I would like to take a closer look at the IOMMU implementation to see how
>> to achieve this, and get back to you. I think the IOMMU would handle this.
>> Could you please point me to the IOMMU implementation you have in mind?
> 
> I didn't have one in mind; I was just hitting a similar problem on
> Virtiofs DAX.
> 
> Dave
> 
>> Thank you!
>> --
>> Jag
>> 
>>> 
>>> Dave
>>> 
>>>> In the case where multiple clients attach devices that are running on the
>>>> same server, we need to ensure that each device has isolated memory
>>>> ranges. This ensures that the memory space of one device is not visible
>>>> to other devices in the same server.
>>>> 
>>>>> 
>>>>> I also wonder whether this special type could be modelled like a special
>>>>> kind of iommu internally.
>>>> 
>>>> Could you please provide some more details on the design?
>>>> 
>>>>> 
>>>>>> Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
>>>>>> Signed-off-by: John G Johnson <john.g.john...@oracle.com>
>>>>>> Signed-off-by: Jagannathan Raman <jag.ra...@oracle.com>
>>>>>> ---
>>>>>> include/hw/pci/pci.h     |  2 ++
>>>>>> include/hw/pci/pci_bus.h | 17 +++++++++++++++++
>>>>>> hw/pci/pci.c             | 17 +++++++++++++++++
>>>>>> hw/pci/pci_bridge.c      |  5 +++++
>>>>>> 4 files changed, 41 insertions(+)
>>>>>> 
>>>>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>>>>>> index 023abc0f79..9bb4472abc 100644
>>>>>> --- a/include/hw/pci/pci.h
>>>>>> +++ b/include/hw/pci/pci.h
>>>>>> @@ -387,6 +387,8 @@ void pci_device_save(PCIDevice *s, QEMUFile *f);
>>>>>> int pci_device_load(PCIDevice *s, QEMUFile *f);
>>>>>> MemoryRegion *pci_address_space(PCIDevice *dev);
>>>>>> MemoryRegion *pci_address_space_io(PCIDevice *dev);
>>>>>> +AddressSpace *pci_isol_as_mem(PCIDevice *dev);
>>>>>> +AddressSpace *pci_isol_as_io(PCIDevice *dev);
>>>>>> 
>>>>>> /*
>>>>>> * Should not normally be used by devices. For use by sPAPR target
>>>>>> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
>>>>>> index 347440d42c..d78258e79e 100644
>>>>>> --- a/include/hw/pci/pci_bus.h
>>>>>> +++ b/include/hw/pci/pci_bus.h
>>>>>> @@ -39,9 +39,26 @@ struct PCIBus {
>>>>>>   void *irq_opaque;
>>>>>>   PCIDevice *devices[PCI_SLOT_MAX * PCI_FUNC_MAX];
>>>>>>   PCIDevice *parent_dev;
>>>>>> +
>>>>>>   MemoryRegion *address_space_mem;
>>>>>>   MemoryRegion *address_space_io;
>>>>>> 
>>>>>> +    /**
>>>>>> +     * Isolated address spaces - these allow the PCI bus to be part
>>>>>> +     * of an isolated address space as opposed to the global
>>>>>> +     * address_space_memory & address_space_io.
>>>>> 
>>>>> Are you sure address_space_memory & address_space_io are
>>>>> always global? even in the case of an iommu?
>>>> 
>>>> On the CPU side of the Root Complex, I believe address_space_memory
>>>> & address_space_io are global.
>>>> 
>>>> In the vfio-user case, devices on the same machine (TYPE_REMOTE_MACHINE)
>>>> could be attached to different client VMs. Each client would have its own
>>>> address space for its CPUs. With isolated address spaces, we ensure that
>>>> the devices see the address space of the CPUs they’re attached to.
>>>> 
>>>> Not sure if it’s OK to share weblinks in this mailing list; please let me
>>>> know if that’s not preferred. But I’m referring to the terminology used in
>>>> the following block diagram:
>>>> https://en.wikipedia.org/wiki/Root_complex#/media/File:Example_PCI_Express_Topology.svg
>>>> 
>>>>> 
>>>>>> This allows the
>>>>>> +     * bus to be attached to CPUs from different machines. The
>>>>>> +     * following is not commonly used.
>>>>>> +     *
>>>>>> +     * TYPE_REMOTE_MACHINE allows emulating devices from multiple
>>>>>> +     * VM clients,
>>>>> 
>>>>> what are VM clients?
>>>> 
>>>> It’s the client in the client-server model explained above.
>>>> 
>>>> Thank you!
>>>> --
>>>> Jag
>>>> 
>>>>> 
>>>>>> as such it needs the PCI buses in the same machine
>>>>>> +     * to be part of different CPU address spaces. The following is
>>>>>> +     * useful in that scenario.
>>>>>> +     *
>>>>>> +     */
>>>>>> +    AddressSpace *isol_as_mem;
>>>>>> +    AddressSpace *isol_as_io;
>>>>>> +
>>>>>>   QLIST_HEAD(, PCIBus) child; /* this will be replaced by qdev later */
>>>>>>   QLIST_ENTRY(PCIBus) sibling;/* this will be replaced by qdev later */
>>>>>> 
>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>> index 5d30f9ca60..d5f1c6c421 100644
>>>>>> --- a/hw/pci/pci.c
>>>>>> +++ b/hw/pci/pci.c
>>>>>> @@ -442,6 +442,8 @@ static void pci_root_bus_internal_init(PCIBus *bus, DeviceState *parent,
>>>>>>   bus->slot_reserved_mask = 0x0;
>>>>>>   bus->address_space_mem = address_space_mem;
>>>>>>   bus->address_space_io = address_space_io;
>>>>>> +    bus->isol_as_mem = NULL;
>>>>>> +    bus->isol_as_io = NULL;
>>>>>>   bus->flags |= PCI_BUS_IS_ROOT;
>>>>>> 
>>>>>>   /* host bridge */
>>>>>> @@ -2676,6 +2678,16 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev)
>>>>>>   return pci_get_bus(dev)->address_space_io;
>>>>>> }
>>>>>> 
>>>>>> +AddressSpace *pci_isol_as_mem(PCIDevice *dev)
>>>>>> +{
>>>>>> +    return pci_get_bus(dev)->isol_as_mem;
>>>>>> +}
>>>>>> +
>>>>>> +AddressSpace *pci_isol_as_io(PCIDevice *dev)
>>>>>> +{
>>>>>> +    return pci_get_bus(dev)->isol_as_io;
>>>>>> +}
>>>>>> +
>>>>>> static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>>> {
>>>>>>   DeviceClass *k = DEVICE_CLASS(klass);
>>>>>> @@ -2699,6 +2711,7 @@ static void pci_device_class_base_init(ObjectClass *klass, void *data)
>>>>>> 
>>>>>> AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>> {
>>>>>> +    AddressSpace *iommu_as = NULL;
>>>>>>   PCIBus *bus = pci_get_bus(dev);
>>>>>>   PCIBus *iommu_bus = bus;
>>>>>>   uint8_t devfn = dev->devfn;
>>>>>> @@ -2745,6 +2758,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>>   if (!pci_bus_bypass_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
>>>>>>       return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
>>>>>>   }
>>>>>> +    iommu_as = pci_isol_as_mem(dev);
>>>>>> +    if (iommu_as) {
>>>>>> +        return iommu_as;
>>>>>> +    }
>>>>>>   return &address_space_memory;
>>>>>> }
>>>>>> 
>>>>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
>>>>>> index da34c8ebcd..98366768d2 100644
>>>>>> --- a/hw/pci/pci_bridge.c
>>>>>> +++ b/hw/pci/pci_bridge.c
>>>>>> @@ -383,6 +383,11 @@ void pci_bridge_initfn(PCIDevice *dev, const char *typename)
>>>>>>   sec_bus->address_space_io = &br->address_space_io;
>>>>>>   memory_region_init(&br->address_space_io, OBJECT(br), "pci_bridge_io",
>>>>>>                      4 * GiB);
>>>>>> +
>>>>>> +    /* This PCI bridge puts the sec_bus in its parent's address space */
>>>>>> +    sec_bus->isol_as_mem = pci_isol_as_mem(dev);
>>>>>> +    sec_bus->isol_as_io = pci_isol_as_io(dev);
>>>>>> +
>>>>>>   br->windows = pci_bridge_region_init(br);
>>>>>>   QLIST_INIT(&sec_bus->child);
>>>>>>   QLIST_INSERT_HEAD(&parent->child, sec_bus, sibling);
>>>>>> -- 
>>>>>> 2.20.1
>>>> 
>>> -- 
>>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>>> 
>> 
> -- 
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
