> On Feb 1, 2022, at 5:47 PM, Alex Williamson <alex.william...@redhat.com> wrote:
>
> On Tue, 1 Feb 2022 21:24:08 +0000
> Jag Raman <jag.ra...@oracle.com> wrote:
>
>>> On Feb 1, 2022, at 10:24 AM, Alex Williamson <alex.william...@redhat.com> wrote:
>>>
>>> On Tue, 1 Feb 2022 09:30:35 +0000
>>> Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>>
>>>> On Mon, Jan 31, 2022 at 09:16:23AM -0700, Alex Williamson wrote:
>>>>> On Fri, 28 Jan 2022 09:18:08 +0000
>>>>> Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>>>>
>>>>>> On Thu, Jan 27, 2022 at 02:22:53PM -0700, Alex Williamson wrote:
>>>>>>> If the goal here is to restrict DMA between devices, ie. peer-to-peer (p2p), why are we trying to re-invent what an IOMMU already does?
>>>>>>
>>>>>> The issue Dave raised is that vfio-user servers run in separate processes from QEMU with shared memory access to RAM but no direct access to non-RAM MemoryRegions. The virtiofs DAX Window BAR is one example of a non-RAM MemoryRegion that can be the source/target of DMA requests.
>>>>>>
>>>>>> I don't think IOMMUs solve this problem, but luckily the vfio-user protocol already has messages that vfio-user servers can use as a fallback when DMA cannot be completed through the shared memory RAM accesses.
>>>>>>
>>>>>>> In fact, it seems like an IOMMU does this better in providing an IOVA address space per BDF. Is the dynamic mapping overhead too much? What physical hardware properties or specifications could we leverage to restrict p2p mappings to a device? Should it be governed by machine type to provide consistency between devices? Should each "isolated" bus be in a separate root complex? Thanks,
>>>>>>
>>>>>> There is a separate issue in this patch series regarding isolating the address space where BAR accesses are made (i.e. the global address_space_memory/io). When one process hosts multiple vfio-user server instances (e.g. a software-defined network switch with multiple ethernet devices), each instance needs isolated memory and io address spaces so that vfio-user clients don't cause collisions when they map BARs to the same address.
>>>>>>
>>>>>> I think the separate root complex idea is a good solution. This patch series takes a different approach by adding the concept of isolated address spaces into hw/pci/.
>>>>>
>>>>> This all still seems pretty sketchy. BARs cannot overlap within the same vCPU address space, perhaps with the exception of when they're being sized, but DMA should be disabled during sizing.
>>>>>
>>>>> Devices within the same VM context with identical BARs would need to operate in different address spaces. For example, a translation offset in the vCPU address space would allow unique addressing to the devices, perhaps using the translation offset bits to address a root complex and masking those bits for downstream transactions.
>>>>>
>>>>> In general, the device simply operates in an address space, ie. an IOVA. When a mapping is made within that address space, we perform a translation as necessary to generate a guest physical address. The IOVA itself is only meaningful within the context of the address space; there is no requirement or expectation for it to be globally unique.
>>>>>
>>>>> If the vfio-user server is making some sort of requirement that IOVAs are unique across all devices, that seems very, very wrong. Thanks,
>>>>
>>>> Yes, BARs and IOVAs don't need to be unique across all devices.
>>>>
>>>> The issue is that there can be as many guest physical address spaces as there are vfio-user clients connected, so per-client isolated address spaces are required. This patch series has a solution to that problem with the new pci_isol_as_mem/io() API.
>>>
>>> Sorry, this still doesn't follow for me. A server that hosts multiple devices across many VMs (I'm not sure if you're referring to the device or the VM as a client) needs to deal with different address spaces per device. The server needs to be able to uniquely identify every DMA, which must be part of the interface protocol. But I don't see how that imposes a requirement of an isolated address space. If we want the device isolated because we don't trust the server, that's where an IOMMU provides per-device isolation. What is the restriction of the per-client isolated address space and why do we need it? The server needing to support multiple clients is not a sufficient answer to impose new PCI bus types with an implicit restriction on the VM.
>>
>> Hi Alex,
>>
>> I believe there are two separate problems with running PCI devices in the vfio-user server. The first one concerns memory isolation and the second one concerns vectoring of BAR accesses (as explained below).
>>
>> In our previous patches (v3), we used an IOMMU to isolate memory spaces. But we still had trouble with the vectoring. So we implemented separate address spaces for each PCIBus to tackle both problems simultaneously, based on the feedback we got.
>>
>> The following gives an overview of issues concerning vectoring of BAR accesses.
>>
>> The device's BAR regions are mapped into the guest physical address space. The guest writes the guest PA of each BAR into the device's BAR registers. To access the BAR regions of the device, QEMU uses address_space_rw(), which vectors the physical address access to the device BAR region handlers.
>
> The guest physical address written to the BAR is irrelevant from the device perspective; this only serves to assign the BAR an offset within the address_space_mem, which is used by the vCPU (and possibly other devices depending on their address space). There is no reason for the device itself to care about this address.
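(As a concrete reference for the point above: here is a rough sketch, loosely modeled on what pci_update_mappings() in hw/pci/pci.c does when the guest programs a BAR. The function name example_remap_bar and the simplified flow are illustrative only, not the actual QEMU code.)

#include "qemu/osdep.h"
#include "hw/pci/pci.h"

/*
 * Illustrative sketch, not the in-tree implementation: the guest-written
 * BAR value is used purely to position the BAR's MemoryRegion within the
 * bus's address space.  The device's MemoryRegionOps callbacks are later
 * invoked with BAR-relative offsets and never see the guest physical
 * address itself.
 */
static void example_remap_bar(PCIIORegion *r, pcibus_t new_addr)
{
    if (r->addr != PCI_BAR_UNMAPPED) {
        /* Unmap the BAR from its previous location, if any. */
        memory_region_del_subregion(r->address_space, r->memory);
    }

    r->addr = new_addr;

    if (r->addr != PCI_BAR_UNMAPPED) {
        /*
         * Insert the BAR's region at the newly programmed offset within
         * the bus's memory (or I/O) address space; the memory core, not
         * the device, dispatches vCPU accesses that land in this window.
         */
        memory_region_add_subregion_overlap(r->address_space, r->addr,
                                            r->memory, 1);
    }
}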
Thank you for the explanation, Alex!

The confusion on my part is whether we are already inside the device when the server receives a request to access the BAR region of a device. Based on your explanation, I get that your view is that the BAR access request has propagated into the device already, whereas I was under the impression that the request is still on the CPU side of the PCI root complex.

Your view makes sense to me - once the BAR access request reaches the client (on the other side), we could consider that the request has reached the device.

On a separate note, if devices don't care about the values in BAR registers, why do the default PCI config handlers intercept and map the BAR region into address_space_mem? (pci_default_write_config() -> pci_update_mappings())

Thank you!
--
Jag

>
>> The PCIBus data structure already has address_space_mem and address_space_io to contain the BAR regions of devices attached to it. I understand that these two PCIBus members form the PCI address space.
>
> These are the CPU address spaces. When there's no IOMMU, the PCI bus is identity mapped to the CPU address space. When there is an IOMMU, the device address space is determined by the granularity of the IOMMU and may be entirely separate from address_space_mem.
>
> I/O port space is always the identity mapped CPU address space unless sparse translations are used to create multiple I/O port spaces (not implemented). I/O port space is only accessed by the CPU; there are no device-initiated I/O port transactions, so the address space relative to the device is irrelevant.
>
>> Typically, the machines map the PCI address space into the system address space. For example, pc_pci_as_mapping_init() does this for 'pc' machine types. As such, there is a 1:1 mapping between the system address space and the PCI address space of the root bus. Since all the PCI devices in the machine are assigned to the same VM, we could map the PCI address space of all PCI buses to the same system address space.
>
> "Typically" only if we're restricted to the "pc", ie. i440FX, machine type, since it doesn't support a vIOMMU. There's no reason to focus on the identity map case versus the vIOMMU case.
>
>> Whereas in the case of vfio-user, the devices running in the server could belong to different VMs. Therefore, along with the physical address, we would need to know the address space that the device belongs to for address_space_rw() to successfully vector BAR accesses into the PCI device.
>
> But as far as device-initiated transactions go, there is only one address space for a given device; it's either address_space_mem or one provided by the vIOMMU, and pci_device_iommu_address_space() tells us that address space. Furthermore, the device never operates on a "physical address"; it only ever operates on an IOVA, ie. an offset within the address space assigned to the device. The IOVA should be considered arbitrary relative to mappings in any other address spaces.
>
> Device-initiated transactions operate on an IOVA within the (single) address space to which the device is assigned. Any attempt to do otherwise violates the isolation put in place by things like vIOMMUs and ought to be considered a security concern, especially for a device serviced by an external process. Thanks,
>
> Alex
>
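For reference, a short sketch of the DMA model described above. Assumptions: QEMU-internal C; the helper name mydev_dma_read is hypothetical. In-tree devices would normally go through pci_dma_read()/pci_dma_write(), which are built on top of the same address-space lookup.

#include "qemu/osdep.h"
#include "hw/pci/pci.h"

/*
 * Hypothetical device DMA helper, for illustration only: a device-initiated
 * access hands an IOVA to the single AddressSpace the device is assigned to.
 */
static MemTxResult mydev_dma_read(PCIDevice *pdev, uint64_t iova,
                                  void *buf, unsigned len)
{
    /* Either the identity-mapped system memory or the vIOMMU-provided one. */
    AddressSpace *as = pci_device_iommu_address_space(pdev);

    /*
     * 'iova' is only meaningful within 'as'.  When a vIOMMU is present,
     * translation to a guest physical address happens inside this address
     * space; the device itself never deals with physical addresses.
     */
    return address_space_rw(as, iova, MEMTXATTRS_UNSPECIFIED,
                            buf, len, false /* read, not write */);
}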