On Mon, Sep 30, 2024 at 10:45:31AM +0000, Shameerali Kolothum Thodi wrote:
> > -----Original Message-----
> > From: Nicolin Chen <nicol...@nvidia.com>
> > Sent: Thursday, September 5, 2024 9:37 PM
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>
> > Cc: Eric Auger <eric.au...@redhat.com>; Mostafa Saleh
> > <smost...@google.com>; qemu-...@nongnu.org; qemu-
> > de...@nongnu.org; Peter Maydell <peter.mayd...@linaro.org>; Jason
> > Gunthorpe <j...@nvidia.com>; Jean-Philippe Brucker <jean-
> > phili...@linaro.org>; Moritz Fischer <m...@kernel.org>; Michael Shavit
> > <msha...@google.com>; Andrea Bolognani <abolo...@redhat.com>;
> > Michael S. Tsirkin <m...@redhat.com>; Peter Xu <pet...@redhat.com>
> > Subject: Re: nested-smmuv3 topic, Sep 2024
> >
> > Hi Shameer,
> >
> > Thanks for the reply!
> >
> > On Thu, Sep 05, 2024 at 12:55:52PM +0000, Shameerali Kolothum Thodi
> > wrote:
> > > > The main takeaway from the discussion is to
> > > > 1) Turn the vSMMU module into a pluggable one, like intel-iommu
> > > > 2) Move the per-SMMU pxb bus and device auto-assign into libvirt
> > > >
> > > > Apart from the multi-vSMMU thing, there's basic nesting series:
> > > > 0) Keep updating to the latest kernel uAPIs to support nesting
> > >
> > > By this you mean the old HWPT based nested-smmuv3 support?
> >
> > HWPT + vIOMMU. The for-viommu/virq branches that I shared in my
> > kernel series have those changes. Invalidations is done via the
> > vIOMMU infrastructure.
> >
> > > >
> > > > I was trying to do all these three, but apparently too ambitious.
> > > > The kernel side of work is still taking a lot of my bandwidth. So
> > > > far I had almost-zero progress on task (1) and completely-zero on
> > > > task (2).
> > > >
> > > > <-- Help Needed --->
> > > > So, I'm wondering if anyone(s) might have some extra bandwidth in
> > > > the following months helping these two tasks, either of which can
> > > > be a standalone project I think.
> > > >
> > > > For task (0), I think I can keep updating the uAPI part, although
> > > > it'd need some help for reviews, which I was hoping to occur after
> > > > Intel sends the QEMU nesting backend patches. Once we know how big
> > > > the rework is going to be, we may need to borrow some help at that
> > > > point once again..
> > >
> > > I might have some bandwidth starting October and can take a look at
> > > task 1 above. I haven't gone through the VIOMMU API model completely
> > > yet and plan to do that soon.
> >
>
> I had an initial look at this and also had some discussions with Eric at KVM
> Forum(Thanks Eric!).

Wow, thank both of you!

> Going through the code, is it ok to introduce a "pci-bus" for the proposed
> nested SMMUv3 device which will create the link between the SMMUv3 dev
> and the associated root complex(pxb-pcie).
>
> Something like below,
>
> -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
> -device arm-nested-smmuv3,pci-bus=pcie.1 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port1 \
> ...
> -device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
> -device arm-nested-smmuv3,pci-bus=pcie.2 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2 \
> -device vfio-pci,host=0000:75:00.2,bus=pcie.port2 \
>
> This way we can invoke the pci_setup_iommu() with the
> right PCIBus during the nested SMMUv3 device realize fn.
>
> Please let me know, if this works/scales with all the use cases we have.

That looks nice to me. Hopefully, IORT or Device Tree would be easy
to tie to the corresponding pci-bus as well..

> Also Eric mentioned that when he initially added the support for SMMUv3,
> the initial approach was -device based solution, but later changed to machine
> option instead based on review comments. I managed to find the link where
> this change was proposed(by Peter),
>
> https://lore.kernel.org/all/cafeaca_h+srawnvhezc48es11n6dc9cyewtl44tperipbo+...@mail.gmail.com/
>
> I hope the use cases we now have make it reasonable to introduce a "-device
> arm-nested-smmuv3" model.
> Please let me know if there are still objections to going this way.

I assume so. With multiple smmuv3 devices in the VM, we would need
this kinda flexibility to create them.

And FYI, I also found some resource in NVIDIA who will help me on
the QEMU workload, including our remaining task -- libvirt. I'll
align with them in the days ahead, and will keep all of us updated
after.

Thanks!
Nicolin
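
For reference, the per-SMMU layout quoted above (one pxb-pcie, one SMMU bound to it via "pci-bus", one root port, one passthrough device) could be generated along the lines below. This is only a rough sketch: "arm-nested-smmuv3" and its "pci-bus" property are the proposal from this thread, not existing QEMU options, and the helper name, the first_bus_nr/bus_nr_step parameters and the one-device-per-SMMU layout are assumptions made purely for illustration (the step of 6 just reproduces the bus_nr=2 and bus_nr=8 values in the example).

```python
def nested_smmu_args(host_bdfs, first_bus_nr=2, bus_nr_step=6):
    """Return QEMU -device arguments, one stanza per host PCI device.

    Hypothetical helper: the device and property names follow the
    proposal in this thread and are not merged QEMU options.
    """
    args = []
    for i, bdf in enumerate(host_bdfs, start=1):
        pxb = f"pcie.{i}"        # expander bridge (root complex) id
        port = f"pcie.port{i}"   # root port under that bridge
        bus_nr = first_bus_nr + (i - 1) * bus_nr_step
        args += [
            "-device", f"pxb-pcie,id={pxb},bus_nr={bus_nr},bus=pcie.0",
            # The proposed SMMU instance names the bus it translates for.
            "-device", f"arm-nested-smmuv3,pci-bus={pxb}",
            "-device", f"pcie-root-port,id={port},bus={pxb}",
            "-device", f"vfio-pci,host={bdf},bus={port}",
        ]
    return args


# Example: two passthrough devices, each behind its own SMMU instance.
for arg in nested_smmu_args(["0000:75:00.1", "0000:75:00.2"]):
    print(arg)
```

Something of this shape is roughly what the libvirt side (task 2) would have to emit once the per-SMMU bus assignment moves there.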