> On Feb 1, 2022, at 5:47 PM, Alex Williamson <alex.william...@redhat.com> wrote:
>
> On Tue, 1 Feb 2022 21:24:08 +0000
> Jag Raman <jag.ra...@oracle.com> wrote:
>
>>> On Feb 1, 2022, at 10:24 AM, Alex Williamson <alex.william...@redhat.com> wrote:
>>>
>>> On Tue, 1 Feb 2022 09:30:35 +0000
>>> Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>>
>>>> On Mon, Jan 31, 2022 at 09:16:23AM -0700, Alex Williamson wrote:
>>>>> On Fri, 28 Jan 2022 09:18:08 +0000
>>>>> Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>>>>
>>>>>> On Thu, Jan 27, 2022 at 02:22:53PM -0700, Alex Williamson wrote:
>>>>>>> If the goal here is to restrict DMA between devices, ie. peer-to-peer (p2p), why are we trying to re-invent what an IOMMU already does?
>>>>>>
>>>>>> The issue Dave raised is that vfio-user servers run in separate processes from QEMU with shared memory access to RAM but no direct access to non-RAM MemoryRegions. The virtiofs DAX Window BAR is one example of a non-RAM MemoryRegion that can be the source/target of DMA requests.
>>>>>>
>>>>>> I don't think IOMMUs solve this problem, but luckily the vfio-user protocol already has messages that vfio-user servers can use as a fallback when DMA cannot be completed through the shared memory RAM accesses.
>>>>>>
>>>>>>> In fact, it seems like an IOMMU does this better in providing an IOVA address space per BDF. Is the dynamic mapping overhead too much? What physical hardware properties or specifications could we leverage to restrict p2p mappings to a device? Should it be governed by machine type to provide consistency between devices? Should each "isolated" bus be in a separate root complex? Thanks,
>>>>>>
>>>>>> There is a separate issue in this patch series regarding isolating the address space where BAR accesses are made (i.e. the global address_space_memory/io). When one process hosts multiple vfio-user server instances (e.g. a software-defined network switch with multiple ethernet devices), each instance needs isolated memory and io address spaces so that vfio-user clients don't cause collisions when they map BARs to the same address.
>>>>>>
>>>>>> I think the separate root complex idea is a good solution. This patch series takes a different approach by adding the concept of isolated address spaces into hw/pci/.
>>>>>
>>>>> This all still seems pretty sketchy. BARs cannot overlap within the same vCPU address space, perhaps with the exception of when they're being sized, but DMA should be disabled during sizing.
>>>>>
>>>>> Devices within the same VM context with identical BARs would need to operate in different address spaces. For example, a translation offset in the vCPU address space would allow unique addressing to the devices, perhaps using the translation offset bits to address a root complex and masking those bits for downstream transactions.
>>>>>
>>>>> In general, the device simply operates in an address space, ie. an IOVA. When a mapping is made within that address space, we perform a translation as necessary to generate a guest physical address. The IOVA itself is only meaningful within the context of the address space; there is no requirement or expectation for it to be globally unique.
>>>>>
>>>>> If the vfio-user server is making some sort of requirement that IOVAs are unique across all devices, that seems very, very wrong. Thanks,
>>>>
>>>> Yes, BARs and IOVAs don't need to be unique across all devices.
>>>>
>>>> The issue is that there can be as many guest physical address spaces as there are vfio-user clients connected, so per-client isolated address spaces are required. This patch series has a solution to that problem with the new pci_isol_as_mem/io() API.
>>>
>>> Sorry, this still doesn't follow for me. A server that hosts multiple devices across many VMs (I'm not sure if you're referring to the device or the VM as a client) needs to deal with different address spaces per device. The server needs to be able to uniquely identify every DMA, which must be part of the interface protocol. But I don't see how that imposes a requirement of an isolated address space. If we want the device isolated because we don't trust the server, that's where an IOMMU provides per-device isolation. What is the restriction of the per-client isolated address space and why do we need it? The server needing to support multiple clients is not a sufficient answer to impose new PCI bus types with an implicit restriction on the VM.
>>
>> Hi Alex,
>>
>> I believe there are two separate problems with running PCI devices in the vfio-user server. The first one concerns memory isolation and the second one concerns vectoring of BAR accesses (as explained below).
>>
>> In our previous patches (v3), we used an IOMMU to isolate memory spaces. But we still had trouble with the vectoring. So we implemented separate address spaces for each PCIBus to tackle both problems simultaneously, based on the feedback we got.
>>
>> The following gives an overview of issues concerning vectoring of BAR accesses.
>>
>> The device's BAR regions are mapped into the guest physical address space. The guest writes the guest PA of each BAR into the device's BAR registers. To access the BAR regions of the device, QEMU uses address_space_rw(), which vectors the physical address access to the device BAR region handlers.
>
> The guest physical address written to the BAR is irrelevant from the device perspective; this only serves to assign the BAR an offset within the address_space_mem, which is used by the vCPU (and possibly other devices depending on their address space). There is no reason for the device itself to care about this address.
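(As a concrete reference for the point above: here is a rough sketch, loosely modeled on what pci_update_mappings() in hw/pci/pci.c does when the guest programs a BAR. The function name example_remap_bar and the simplified flow are illustrative only, not the actual QEMU code.)

#include "qemu/osdep.h"
#include "hw/pci/pci.h"

/*
 * Illustrative sketch, not the in-tree implementation: the guest-written
 * BAR value is used purely to position the BAR's MemoryRegion within the
 * bus's address space.  The device's MemoryRegionOps callbacks are later
 * invoked with BAR-relative offsets and never see the guest physical
 * address itself.
 */
static void example_remap_bar(PCIIORegion *r, pcibus_t new_addr)
{
    if (r->addr != PCI_BAR_UNMAPPED) {
        /* Unmap the BAR from its previous location, if any. */
        memory_region_del_subregion(r->address_space, r->memory);
    }

    r->addr = new_addr;

    if (r->addr != PCI_BAR_UNMAPPED) {
        /*
         * Insert the BAR's region at the newly programmed offset within
         * the bus's memory (or I/O) address space; the memory core, not
         * the device, dispatches vCPU accesses that land in this window.
         */
        memory_region_add_subregion_overlap(r->address_space, r->addr,
                                            r->memory, 1);
    }
}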
Thank you for the explanation, Alex!

The confusion on my part is whether we are already inside the device when the server receives a request to access the BAR region of a device. Based on your explanation, I get that your view is that the BAR access request has propagated into the device already, whereas I was under the impression that the request is still on the CPU side of the PCI root complex.

Your view makes sense to me - once the BAR access request reaches the client (on the other side), we could consider that the request has reached the device.

On a separate note, if devices don't care about the values in BAR registers, why do the default PCI config handlers intercept and map the BAR region into address_space_mem? (pci_default_write_config() -> pci_update_mappings())

Thank you!
--
Jag

>
>> The PCIBus data structure already has address_space_mem and address_space_io to contain the BAR regions of devices attached to it. I understand that these two PCIBus members form the PCI address space.
>
> These are the CPU address spaces. When there's no IOMMU, the PCI bus is identity mapped to the CPU address space. When there is an IOMMU, the device address space is determined by the granularity of the IOMMU and may be entirely separate from address_space_mem.
>
> I/O port space is always the identity mapped CPU address space unless sparse translations are used to create multiple I/O port spaces (not implemented). I/O port space is only accessed by the CPU; there are no device-initiated I/O port transactions, so the address space relative to the device is irrelevant.
>
>> Typically, the machines map the PCI address space into the system address space. For example, pc_pci_as_mapping_init() does this for 'pc' machine types. As such, there is a 1:1 mapping between the system address space and the PCI address space of the root bus. Since all the PCI devices in the machine are assigned to the same VM, we could map the PCI address space of all PCI buses to the same system address space.
>
> "Typically" only if we're restricted to the "pc", ie. i440FX, machine type, since it doesn't support a vIOMMU. There's no reason to focus on the identity map case versus the vIOMMU case.
>
>> Whereas in the case of vfio-user, the devices running in the server could belong to different VMs. Therefore, along with the physical address, we would need to know the address space that the device belongs to for address_space_rw() to successfully vector BAR accesses into the PCI device.
>
> But as far as device-initiated transactions go, there is only one address space for a given device; it's either address_space_mem or one provided by the vIOMMU, and pci_device_iommu_address_space() tells us that address space. Furthermore, the device never operates on a "physical address"; it only ever operates on an IOVA, ie. an offset within the address space assigned to the device. The IOVA should be considered arbitrary relative to mappings in any other address spaces.
>
> Device-initiated transactions operate on an IOVA within the (single) address space to which the device is assigned. Any attempt to do otherwise violates the isolation put in place by things like vIOMMUs and ought to be considered a security concern, especially for a device serviced by an external process. Thanks,
>
> Alex
>
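For reference, a short sketch of the DMA model described above. Assumptions: QEMU-internal C; the helper name mydev_dma_read is hypothetical. In-tree devices would normally go through pci_dma_read()/pci_dma_write(), which are built on top of the same address-space lookup.

#include "qemu/osdep.h"
#include "hw/pci/pci.h"

/*
 * Hypothetical device DMA helper, for illustration only: a device-initiated
 * access hands an IOVA to the single AddressSpace the device is assigned to.
 */
static MemTxResult mydev_dma_read(PCIDevice *pdev, uint64_t iova,
                                  void *buf, unsigned len)
{
    /* Either the identity-mapped system memory or the vIOMMU-provided one. */
    AddressSpace *as = pci_device_iommu_address_space(pdev);

    /*
     * 'iova' is only meaningful within 'as'.  When a vIOMMU is present,
     * translation to a guest physical address happens inside this address
     * space; the device itself never deals with physical addresses.
     */
    return address_space_rw(as, iova, MEMTXATTRS_UNSPECIFIED,
                            buf, len, false /* read, not write */);
}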