Adding Knut to CC as he particularly looked into and fixed the bridging issues of the vtd emulation. I will have to refresh my memory first.

Jan

On 2015-01-30 06:45, Benjamin Herrenschmidt wrote:
> Hi folks !
>
> I've looked at the intel iommu code to try to figure out how to properly
> implement a Power8 "native" iommu model and encountered some issues.
>
> Today "pseries" ppc machine is paravirtualized and so uses a pretty
> simplistic iommu model that essentially has one address space per host
> bridge.
>
> However, the real HW model I'm working on is closer to Intel in that we
> have various tables walked by HW that match an originator RID to what
> we called a "PE" (Partitionable Endpoint) to which corresponds an
> address space.
>
> So on a given domain, individual functions can have different iommu
> address spaces & translation structures, or groups of devices etc...
> which can all be configured dynamically by the guest OS. This is similar
> as far as I understand things to the Intel model though the details of
> the implementation are very different.
>
> So I implemented something along the lines of what you guys did for q35
> and intel_iommu, and quickly discovered that it doesn't work, which
> makes me wonder whether the intel stuff in qemu actually works, or
> rather, does it work when adding bridges & switches into the picture.
>
> I basically have two problems but they are somewhat related. Firstly
> the way the intel code works is that it creates lazily context
> structures that contain the address space, and get associated with
> devices when pci_device_iommu_address_space() is called which in
> turn calls the bridge iommu_fn which performs the association.
>
> The first problem is that the association is done based on bus/dev/fn
> of the device... at a time where bus numbers have not been assigned yet.
>
> In fact, the bus numbers are assigned dynamically by SW, the BIOS
> typically, but the OS can renumber things and it's bogus to assume thus
> that the RID (bus/dev/fn) of a PCI device/function is fixed. However
> that's exactly what the code does as it calls
> pci_device_iommu_address_space() once at device instantiation time in
> qemu, even before SW had a chance to assign anything.
>
> So as far as I can tell, things will work as long as you are on bus 0
> and there is no bridge, otherwise, it's broken by design, unless I'm
> missing something...
>
> I've hacked that locally in my code by using the PCIBus * pointer
> instead of the bus number to match the device to the iommu context.
>
> The second problem is that pci_device_iommu_address_space(), as it walks
> up the hierarchy to find the iommu_fn, drops the original device
> information. That means that if a device is below a switch or a p2p
> bridge of some sort, once you reach the host bridge top level bus, all
> we know is the bus & devfn of the last p2p entity along the path, we
> lose the original bus & devfn information.
>
> This is incorrect for that sort of iommu, at least while in the PCIe
> domain, as the original RID is carried along with DMA transactions and is
> thus needed to properly associate the device/function with a context.
>
> One fix could be to populate the iommu_fn of every bus down the food
> chain but that's fairly cumbersome... unless we make the PCI bridges by
> default "inherit" from their parent iommu_fn.
>
> Here, I've done a hack locally to keep the original device information
> in pci_device_iommu_address_space() but it's not a proper way to do it
> either, ultimately, each bridge needs to be able to tell whether it
> properly forwards the RID information or not, so the bridge itself needs
> to have some attribute to control that. Typically a PCIe switch or root
> complex will always forward the full RID... while most PCI-E -> PCI-X
> bridges are busted in that regard. Worse, some bridges forward *some*
> bits (partial RID) which is even more broken but I don't know if we can
> or even care about simulating it. Thankfully most PCI-X or PCI bridges
> will behave properly and make it look like all DMAs are coming from the
> bridge itself.
>
> What do you guys reckon is the right approach for both problems ?
>
> Cheers,
> Ben.
>
>

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
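
For reference, the two workarounds described in the quoted message could look roughly like the sketch below. It is only an illustration under stated assumptions: `Bus`, `IommuCtx`, `ctx_get()`, `iommu_ctx_for()` and the `forwards_rid` flag are hypothetical stand-ins, not QEMU's actual PCIBus/PCIDevice types or API. Contexts are keyed by the bus pointer plus devfn rather than by bus number, and the walk towards the root keeps the original requester unless a bridge on the path does not forward the RID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI bus; not QEMU's real PCIBus. */
typedef struct Bus Bus;
struct Bus {
    Bus *parent;          /* NULL for the root bus below the host bridge */
    uint8_t bus_num;      /* assigned later by firmware/OS and may change */
    uint8_t bridge_devfn; /* devfn, on the parent bus, of the bridge leading here */
    bool forwards_rid;    /* assumption: per-bridge flag, true if the RID is passed up intact */
};

/* Hypothetical IOMMU context; as_id stands in for an address space. */
typedef struct {
    const Bus *bus;       /* bus identity, stable across renumbering */
    uint8_t devfn;
    int as_id;
    bool used;
} IommuCtx;

#define MAX_CTX 16
static IommuCtx ctx_table[MAX_CTX];

/* Find, or lazily create, the context for (bus pointer, devfn). */
static IommuCtx *ctx_get(const Bus *bus, uint8_t devfn)
{
    for (size_t i = 0; i < MAX_CTX; i++) {
        if (ctx_table[i].used && ctx_table[i].bus == bus &&
            ctx_table[i].devfn == devfn) {
            return &ctx_table[i];
        }
    }
    for (size_t i = 0; i < MAX_CTX; i++) {
        if (!ctx_table[i].used) {
            ctx_table[i] = (IommuCtx){ .bus = bus, .devfn = devfn,
                                       .as_id = (int)i, .used = true };
            return &ctx_table[i];
        }
    }
    return NULL; /* table full */
}

/*
 * Walk towards the root while remembering who the requester is.
 * A bridge that does not forward the RID makes the DMA look as if
 * it came from the bridge itself, so the requester is replaced by
 * the bridge's own (parent bus, devfn).
 */
static IommuCtx *iommu_ctx_for(const Bus *bus, uint8_t devfn)
{
    const Bus *req_bus = bus;
    uint8_t req_devfn = devfn;

    for (const Bus *b = bus; b->parent != NULL; b = b->parent) {
        if (!b->forwards_rid) {
            req_bus = b->parent;
            req_devfn = b->bridge_devfn;
        }
    }
    return ctx_get(req_bus, req_devfn);
}
```

Since `bus_num` never takes part in the lookup, a device below a RID-forwarding switch keeps its context when the guest renumbers the bus, while all devices behind a non-forwarding bridge share the context of that bridge. Partial-RID bridges are not modelled here.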