> -----Original Message-----
> From: Peter Maydell <peter.mayd...@linaro.org>
> Sent: Friday, May 9, 2025 12:44 PM
> To: Daniel P. Berrangé <berra...@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.th...@huawei.com>; Donald Dutile
> <ddut...@redhat.com>; Markus Armbruster <arm...@redhat.com>;
> Shameer Kolothum via <qemu-devel@nongnu.org>;
> qemu-a...@nongnu.org; eric.au...@redhat.com; j...@nvidia.com;
> nicol...@nvidia.com; nath...@nvidia.com; mo...@nvidia.com;
> smost...@google.com; Linuxarm <linux...@huawei.com>; Wangzhou (B)
> <wangzh...@hisilicon.com>; jiangkunkun <jiangkun...@huawei.com>;
> Jonathan Cameron <jonathan.came...@huawei.com>;
> zhangfei....@linaro.org
> Subject: Re: [PATCH v2 1/6] hw/arm/smmuv3: Add support to associate a
> PCIe RC
>
> On Fri, 9 May 2025 at 11:46, Daniel P. Berrangé <berra...@redhat.com>
> wrote:
> >
> > On Fri, May 09, 2025 at 11:37:14AM +0100, Peter Maydell wrote:
> > > (I want to start here by saying that I appreciate that I'm
> > > coming in without having read the previous discussion, so
> > > this is kind of going back over ground you've already
> > > been through.)
> > >
> > > I agree that rather than having an entirely separate "SMMU with
> > > acceleration" it would be better to have it be a property on
> > > the SMMU device. But why do we need it to be user created?
> > > Making it user-created leads into all kinds of tricky areas
> > > mostly surrounding the fact that QEMU right now simply doesn't
> > > support having user-created sysbus devices and other kinds
> > > of device with complex wiring-up. -device is really intended
> > > for "this is a model of a device that in real hardware is
> > > pluggable and has basically one connection, like a PCI card
> > > has a PCI-slot".
> >
> > In terms of "why does it need to be user created" - the goal was to expose
> > multiple SMMUs to the guest, each associated with a separate physical SMMU.
> > IIUC, each physical NUMA node would have its own SMMU.
> >
> > So configuring a guest VM will require creating multiple PXBs, one for
> > each virtual NUMA node, and then creating SMMUs for each PXB.
> >
> > Since there was a need for the user to create SMMUs for the PXBs, the
> > question was then raised, why shouldn't the default SMMU also be
> > user creatable in the same way, so that mgmt apps like libvirt have
> > a single way to configure the SMMUs with -device.
>
> Sure, the default "there's just one pci bridge and either
> no SMMU or one SMMU" isn't that special. But we don't
> have good infrastructure for creating sysbus devices on
> the command line, whether it's the default SMMU or the
> extra SMMUs or a UART or anything else. I guess the
> dynamic_sysbus stuff works, but I've never really liked it
> (it's basically "the board will magically do the right thing",
> and to some extent it's working around the way we have
> very patchy support for "I want to configure a board the
> device created rather than configuring a device I am
> creating").

I agree that having users create sysbus devices for a machine isn't
ideal. However, in this case, the association between SMMUv3 and PCIe
makes the topology difficult to represent cleanly in any other way.

There was a previous attempt by Nicolin [0], where the virt machine
would probe all host physical SMMUv3 instances and automatically create
the corresponding arm-smmuv3 devices along with the associated pxb-pcie
bridges. The main concern raised with that approach was that QEMU
shouldn't implicitly construct PCIe topology behind the scenes,
especially when it can conflict with how libvirt expects to define the
PCIe hierarchy [1].

Another suggestion was to add an iommu=<id> option to the PCIe bridge:

  -device pxb-pcie,iommu=<id>,...

With this, the virt machine would create a new SMMUv3 device whenever a
new iommu ID is encountered. But this also has limitations, as we want
to support additional options like accel or vcmdq on the created SMMUv3
devices.

The most flexible way to express such configurations is directly via
the device model, e.g.:

  -device arm-smmuv3,accel=on,vcmdq=on,

I think this is also consistent with how the intel-iommu device works
today.

Please let me know if there's a better solution here. If the current
approach, while not ideal, is acceptable, I can rework the series to
address the review comments and post a v3 soon.

Thanks,
Shameer

[0] https://lore.kernel.org/qemu-devel/cover.1719361174.git.nicol...@nvidia.com/
[1] https://lore.kernel.org/qemu-devel/cabjz62pvt9h9776djxkpyq_mf+auj-0yhndi-osaqcqrsrg...@mail.gmail.com/