> -----Original Message-----
> From: Donald Dutile <ddut...@redhat.com>
> Sent: Thursday, May 8, 2025 2:45 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>;
> Markus Armbruster <arm...@redhat.com>
> Cc: Shameer Kolothum via <qemu-devel@nongnu.org>; qemu-a...@nongnu.org;
> eric.au...@redhat.com; peter.mayd...@linaro.org; j...@nvidia.com;
> nicol...@nvidia.com; berra...@redhat.com; nath...@nvidia.com;
> mo...@nvidia.com; smost...@google.com; Linuxarm <linux...@huawei.com>;
> Wangzhou (B) <wangzh...@hisilicon.com>; jiangkunkun <jiangkun...@huawei.com>;
> Jonathan Cameron <jonathan.came...@huawei.com>; zhangfei....@linaro.org
> Subject: Re: [PATCH v2 1/6] hw/arm/smmuv3: Add support to associate a PCIe RC
>
> On 5/7/25 4:50 AM, Shameerali Kolothum Thodi wrote:
> >
> >> -----Original Message-----
> >> From: Markus Armbruster <arm...@redhat.com>
> >> Sent: Wednesday, May 7, 2025 8:17 AM
> >> To: Donald Dutile <ddut...@redhat.com>
> >> Cc: Shameer Kolothum via <qemu-devel@nongnu.org>; qemu-a...@nongnu.org;
> >> Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>;
> >> eric.au...@redhat.com; peter.mayd...@linaro.org; j...@nvidia.com;
> >> nicol...@nvidia.com; berra...@redhat.com; nath...@nvidia.com;
> >> mo...@nvidia.com; smost...@google.com; Linuxarm <linux...@huawei.com>;
> >> Wangzhou (B) <wangzh...@hisilicon.com>; jiangkunkun <jiangkun...@huawei.com>;
> >> Jonathan Cameron <jonathan.came...@huawei.com>; zhangfei....@linaro.org
> >> Subject: Re: [PATCH v2 1/6] hw/arm/smmuv3: Add support to associate a PCIe RC
> >>
> >> Donald Dutile <ddut...@redhat.com> writes:
> >>
> >> [...]
> >>
> >>> In this series, an iommu/smmu needs to be placed -BETWEEN- a sysbus
> >>> and a PCIe-tree, or step-wise, plug an smmuv3 into a sysbus, and a
> >>> pcie tree/domain/RC into an SMMUv3.
> >>
> >> RC = root complex?
> >
> > Yes.
> >
> +1.
>
> >>> So, an smmu needs to be associated with a bus (tree), i.e., pcie.0, pcie.1...
> >>> One could model it as a PCIe device, attached at the pcie-RC ... but
> >>> that's not how it's modelled in ARM hw.
> >>
> >> Physical ARM hardware?
> >>
> yes, physical hw.
>
> >> Assuming the virtual devices and buses we're discussing model physical
> >> devices and buses:
> >>
> >> * What are the physical devices of interest?
> >>
> >> * How are they wired together? Which of the wires are buses, in
> >>   particular PCI buses?
> >
> > SMMUv3 is a platform device and its placement in a system is typically
> > as below for PCI devices,
> >
> >   +------------------+
> >   |   PCIe Devices   |
> >   +------------------+
> >            |
> >            v
> >   +-------------+      +---------------+
> >   |  PCIe RC A  |<---->|  Interconnect |
> >   +-------------+      +---------------+
> >                                |
> >                                |
> >                          +------v---+
> >                          | SMMUv3.A |
> >                          | (IOMMU)  |
> >                          +----+-----+
> >                               |
> >                               v
> >                       +-------+--------+
> >                       |   System RAM   |
> >                       +----------------+
> >
> > This patch is attempting to establish that association between the PCIe
> > RC and the SMMUv3 device so that Qemu can build the ACPI tables/DT iommu
> > mappings for the SMMUv3 device.
> >
> I would refer to the ARM SMMU spec, Figure 2.3 in the G.a version, where
> it's slightly different; more like:
That's right. The spec does indeed cover all possible scenarios, whereas my
earlier comments focused on the common case of systems using SMMUv3 with
PCIe devices.

Currently, QEMU supports neither non-PCI devices behind SMMUv3 nor the more
complex distributed SMMU cases you describe below, and this series doesn't
aim to add that support either. If needed, we can treat it as a separate
effort, similar to what was attempted in [1]. That said, I agree that the
design choices we make now should not hinder adding such support in the
future. As far as I can see, nothing in this series would prevent that; if
anything, the new SMMUv3 device model actually makes it easier to support
those cases.

> +------------------+
> |   PCIe Devices   |   (one device, unless a PCIe switch is btwn the RC &
> +------------------+    'Devices'; or, see more typical expansion below)
>        |
> +-------------+
> |  PCIe RC A  |
> +-------------+
>        |
> +------v---+        +-----------------------------------+
> | SMMUv3.A |        | Wide assortment of other platform |
> | (IOMMU)  |        | devices not using SMMU            |
> +----+-----+        +-----------------------------------+
>      |                   |        |        |
>   +--+-------------------+--------+--------+--+
>   |            System Interconnect            |
>   +--------------------------------------------+
>                         |
>                 +-------+--------+     +-----+-------------+
>                 |   System RAM   |<--->| CPU (NUMA socket) |
>                 +----------------+     +-------------------+
>
> In fact, the PCIe can be quite complex with PCIe bridges, and multiple
> Root Ports (RP's), and multiple SMMU's:
>
> +--------------+    +--------------+    +--------------+
> | PCIe Device  |    | PCIe Device  |    | PCIe Device  |
> +--------------+    +--------------+    +--------------+
>        |                   |                   |           <--- PCIe bus
>   +----------+        +----------+        +----------+
>   | PCIe RP  |        | PCIe RP  |        | PCIe RP  |     <- may be PCI Bridge, may not
>   +----------+        +----------+        +----------+
>        |                   |                   |
>   +----------+        +----------+        +----------+
>   |   SMMU   |        |   SMMU   |        |   SMMU   |
>   +----------+        +----------+        +----------+
>        |                   |                   |           <- may be a bus, may not (hidden from OS)
>        +-------------------+-------------------+
>                            |
>              +--------------------------+
>              |         PCI RC A         |
>              +--------------------------+
>
> where PCIe RP's could be represented (even virtually) in -hw-
> as a PCIe bridge, each downstream being a different PCIe bus under
> a single PCIe RC (A, in above pic) -domain-.
> ... or the RPs don't have to have a PCIe bridge, and look like
> 'just an RP' that provides a PCIe (pt-to-pt, serial) bus, provided
> by a PCIe RC. ... the PCIe architecture allows both, and I've seen
> both implementations in hw (at least from an lspci perspective).
>
> You can see the above hw implementation by doing an lspci on most
> PCIe systems (definitely common on x86), where the RP's are represented
> by 'PCIe bridge' elements -- and lots of them.
> In real hw, these RP's effectively become (multiple) up & downstream
> transaction queues (which implement PCI ordering, and deadlock avoidance).
> SMMUs are effectively 'inserted' in the (upstream) queue path(s).
>
> The important take away above: the SMMU can be RP &/or device-specific --
> they do not have to be bound to an entire PCIe domain ... the *fourth*
> part of an lspci output for a PCIe device: Domain:Bus:Device.Function.
> This is the case for INTEL & AMD IOMMUs ... and why the ACPI tables have
> to describe which devices (busses often) are associated with which
> SMMU (in IORT) or IOMMU (DMAR/IVRS's for INTEL/AMD IOMMU).
>
> The final take away: the (QEMU) SMMU/IOMMU must be associated with a PCIe bus
> OR, the format has to be something like:
>    -device smmuv3, id=smmuv3.1
>    -device <blah>, smmu=smmuv3.1

Agree. For PCIe devices we need to associate the SMMUv3 with a PCIe bus,
and for non-PCI cases we probably need a per-device association.

> where the device <-> SMMU (or if extended to x86, iommu) associativity is
> set w/o bus associativity.
> It'd be far easier to tag an entire bus with an SMMU/IOMMU, than a
> per-device format, esp. if one has lots of PCIe devices in their model;
> actually, even if they only have one bus and 8 devices (common), it'd be
> nice if a single iommu/smmu<->bus-num associativity could be set.
>
> Oh, one final note: it is possible, although I haven't seen it done this
> way yet, that an SMMU could be -in- a PCIe switch (further distributing
> SMMU functionality across a large PCIe subsystem) and it -could- be a
> PCIe device in the switch, btwn the upstream and downstream bridges --
> actually doing the SMMU xlations at that layer..... for QEMU & IORT,
> it's associated with a PCIe bus.
> But, if done correctly, that shouldn't matter -- in the example you gave
> wrt serial, it would be a new device, using common smmu core: smmuv3-pcie.
> [Note: AMD actually identifies its IOMMU as a PCIe device in an RC ... but
>        still uses the ACPI tables to configure it to the OS.. so the
>        PCIe-device is basically a device w/o a PCIe driver. AMD just went
>        through hoops dealing with MS and AMD-IOMMU-identification via PCIe.]
>
> So, stepping back, and looking at a broad(er) SMMU -or- IOMMU QEMU
> perspective, I would think this type of format would be best:
>
>    - bus pcie, id=pcie.<num>
>    - device iommu=[intel_iommu|smmuv3|amd_iommu], bus=[sysbus | pcie.<num>], id=iommu.<num>
>
> [Yes, I'm sticking with 'iommu' as the generic naming... everyone thinks
>  of device SMMUs as IOMMUs, and QEMU should have a more arch-agnostic
>  naming of these system functions.]

Ok. But to circle back to what originally started this discussion: how
important is it to rely on the default "bus" in this case? As Markus
pointed out, SMMUv3 is a platform device on the sysbus, so its default bus
type can't point to something like PCIe; QEMU doesn't currently support
that.

The main motivation for using the default "bus" so far has been better
compatibility with libvirt. Would libvirt be flexible enough if we switched
to something like a "primary-bus" property instead?

-device arm-smmuv3,primary-bus=pcie.0
-device virtio-net-pci,bus=pcie.0
-device pxb-pcie,id=pcie.1,bus_nr=2
-device arm-smmuv3,primary-bus=pcie.1
...

Please let me know.

Thanks,
Shameer

[1] https://lore.kernel.org/all/20210902081429.140293-1-chunming_li1...@163.com/
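
P.S. To make the "primary-bus" idea above a bit more concrete, here is a
rough sketch of how such a link property could be declared on the QEMU
side. This is illustrative only: the exact property macros, array and
field names depend on the QEMU version and are not necessarily what this
series implements.

    /* Hypothetical sketch: expose a QOM link property so that
     *   -device arm-smmuv3,primary-bus=pcie.0
     * lets the SMMU model resolve the PCIe root bus it translates for.
     */
    #include "qemu/osdep.h"
    #include "hw/qdev-properties.h"
    #include "hw/pci/pci_bus.h"
    #include "hw/arm/smmu-common.h"

    static Property smmu_dev_properties[] = {
        /* link to the PCIe bus this SMMU instance is associated with */
        DEFINE_PROP_LINK("primary-bus", SMMUState, primary_bus,
                         TYPE_PCI_BUS, PCIBus *),
        DEFINE_PROP_END_OF_LIST(),
    };

Board code that wires this up internally (rather than via the command line)
could then use object_property_set_link() on the SMMU object, similar, IIRC,
to how the virt machine already links the legacy sysbus SMMU to pcie.0
through a "primary-bus" link today.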