On Mon, 2014-08-18 at 20:50 +0200, Jan Kiszka wrote: > On 2014-08-18 18:34, Knut Omang wrote: > > On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote: > >> On 2014-08-16 10:45, Jan Kiszka wrote: > >>> On 2014-08-16 09:54, Knut Omang wrote: > >>>> On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote: > >>>>> Hi Knut, > >>>>> > >>>>> 2014-08-15 19:15 GMT+08:00 Knut Omang <knut.om...@oracle.com>: > >>>>>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote: > >>>>>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote: > >>>>>>>> On 2014-08-14 13:15, Michael S. Tsirkin wrote: > >>>>>>>>> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> These patches are intended to introduce Intel IOMMU (VT-d) > >>>>>>>>>> emulation to q35 > >>>>>>>>>> chipset. The major job in these patches is to add support for > >>>>>>>>>> emulating Intel > >>>>>>>>>> IOMMU according to the VT-d specification, including basic > >>>>>>>>>> responses to CSRs > >>>>>>>>>> accesses, the logics of DMAR (DMA remapping) and DMA memory address > >>>>>>>>>> translations. > >>>>>>>>> > >>>>>>>>> Thanks! > >>>>>>>>> Looks very good overall, I noted some coding style issues - I didn't > >>>>>>>>> bother reporting each issue in every place where it appears - > >>>>>>>>> reported > >>>>>>>>> each issue once only, so please find and fix all instances of each > >>>>>>>>> issue. > >>>>>>>> > >>>>>>>> BTW, because I was in urgent need for virtual test environment for > >>>>>>>> Jailhouse, I hacked interrupt remapping on top of Le's patches: > >>>>>>>> > >>>>>>>> http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap > >>>>>>>> > >>>>>>>> The approach likely needs further discussions and refinements but it > >>>>>>>> already allows me to work on top with our hypervisor, and also Linux. > >>>>>>>> You can see from the last commit that Le's work made it pretty easy > >>>>>>>> to > >>>>>>>> build this on top. > >>>>>>> > >>>>>>> Le, > >>>>>>> > >>>>>>> I have tried Jan's branch with my device setup which consists of a > >>>>>>> minimal q35 setup, an ioh3420 root port (specified as -device > >>>>>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which > >>>>>>> gives the following lscpi -t: > >>>>>>> > >>>>>>> -[0000:00]-+-00.0 > >>>>>>> +-01.0 > >>>>>>> +-02.0 > >>>>>>> +-03.0-[01]----00.0 > >>>>>>> +-04.0 > >>>>>>> +-1f.0 > >>>>>>> +-1f.2 > >>>>>>> \-1f.3 > >>>>>>> > >>>>>>> All seems to work beautifully (I see the ISA bridge happily receive > >>>>>>> translations) until the first DMA from my device model (at 1:00.0) > >>>>>>> arrives, at which point I get: > >>>>>>> > >>>>>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault > >>>>>>> addr fffa0000 > >>>>>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is > >>>>>>> clear > >>>>>>> > >>>>>>> I would have expected request device 01:00.0 for this. > >>>>>>> It is not clear to me yet if this is a weakness of the implementation > >>>>>>> of > >>>>>>> ioh3420 or the iommu. Just wanted to let you know right away in case > >>>>>>> you > >>>>>>> can shed some light to it or it is an easy fix, > >>>>>>> > >>>>>>> The device uses pci_dma_rw with itself as device pointer. > >>>>>> > >>>>>> To verify my hypothesis: with this rude hack my device now works much > >>>>>> better: > >>>>>> > >>>>>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace > >>>>>> *vtd_as, > >>>>>> int bus_num, int devfn, > >>>>>> is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD; > >>>>>> } else { > >>>>>> ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce); > >>>>>> + if (ret_fr) > >>>>>> + ret_fr = dev_to_context_entry(s, 1, 0, &ce); > >>>>>> is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD; > >>>>>> if (ret_fr) { > >>>>>> ret_fr = -ret_fr; > >>>>>> > >>>>>> Looking at how things look on hardware, multiple devices often receive > >>>>>> overlapping DMA address ranges for different physical addresses. > >>>>>> > >>>>>> So if I understand the way this works, every requester ID would also > >>>>>> need to have it's own unique VTDAddressSpace, as each pci > >>>>>> device/function sees a unique DMA address space.. > >>>>> > >>>>> ioh3420 is a pcie-to-pcie bridge, right? > >>>> > >>>> Yes. > >>>> > >>>>> In my opinion, each pci-e > >>>>> device behind the pcie-to-pcie bridge can be assigned individually. > >>>>> For now I added the VT-d to q35 by just adding it to the root pci bus. > >>>>> You can see here in q35.c: > >>>>> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu); > >>>>> So if we add a pcie-to-pcie bridge, we may have to call the > >>>>> pci_setup_iommu() for that new bus. I don't know where to hook into > >>>>> this now. :) If you know the mechanism behind that, you can try to add > >>>>> that for the new bus. (I will dive into this after the clean up.) > >>>>> What do you think? > >>>> > >>>> Thanks for the quick answer, that helped a lot! > >>>> > >>>> Looking into the details here I realize it is slightly more complicated: > >>>> secondary buses are enumerated after device instantiation, as part of > >>>> the host PCI enumeration, so if I add a similar setup call in the bridge > >>>> setup, it will be called for a new device long before it has received > >>>> it's bus number from the OS (via config[PCI_SECONDARY_BUS] ) > >>>> > >>>> I agree that the lookup function for contexts needs to be as efficient > >>>> as possible so the simple <busno,defvn> lookup key may be the best > >>>> solution but then the address_spaces table cannot be populated with the > >>>> secondary bus entries before it receives a nonzero != 255 bus number, > >>>> eg. along the lines of this: > >>>> > >>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c > >>>> index 4becdc1..d9a8c23 100644 > >>>> --- a/hw/pci/pci_bridge.c > >>>> +++ b/hw/pci/pci_bridge.c > >>>> @@ -265,6 +265,12 @@ void pci_bridge_write_config(PCIDevice *d, > >>>> pci_bridge_update_mappings(s); > >>>> } > >>>> > >>>> + if (ranges_overlap(address, len, PCI_SECONDARY_BUS, 1)) { > >>>> + int bus_num = pci_bus_num(&s->sec_bus); > >>>> + if (bus_num != 0xff && bus_num != 0x00) > >>>> + <handle bus number change> > >>>> + } > >>>> + > >>>> newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); > >>>> if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) { > >>>> /* Trigger hot reset on 0->1 transition. */ > >>>> > >>>> but it is getting complicated... > >>>> Thoughts? > >>> > >>> Point to the PCI bus from VTDAddressSpace instead of storing the bus_num > >>> there? > >> > >> Also, each PCIe bus should hold an array of VTDAddressSpaces, instead of > >> the IntelIOMMUState. > > > > Thanks - that got me going - after some playing around with the data > > structures I ended up with these patches (based on top of Jan's > > vtd-intremap branch): > > Are you depending on interrupt remapping? If not, my patches are a bit > hacky and may cause their own issues if you are unlucky. It does not depend directly but interprets a NULL PciDevice pointer as the special bus number (0xff) for non-pci devices, so I have tried to take heights for that - I can rebase if so desired, but I would like to see the interrupt remapping emerge as well ;-)
Knut > > https://github.com/knuto/qemu/tree/vtd_patches > > > > In essence I ended up replacing the address_spaces[] array in > > IntelIOMMUState with a pointer dma_as in PCIDevice and a QLIST linking > > the VTDAddressSpace objects to serve vtd_context_device_invalidate, then > > use pci_bus_num() whenever a bus number is requires, except for the > > "special" bus needed by the interrupt remapping code. > > > > To achieve this, I had to change the signature of the pci_setup_iommu > > function (the first commit). > > > > With this I am now seeing translations for the device in the virtual > > root port, but I think this should work equally well with other PCI > > bridges. > > Good to know that we are on the right path! Maybe this can be integrated > in the next posting round so that we do not need any bridge exceptions > due to incompatibility. > > Jan > >