On 30/01/2025 11:12 am, Jan Beulich wrote: > In order for amd_iommu_detect_one_acpi()'s call to pci_ro_device() to > have permanent effect, pci_segments_init() needs to be called ahead of > making it there. Without this we're losing segment 0's r/o map, and thus > we're losing write-protection of the PCI devices representing IOMMUs. > Which in turn means that half-way recent Linux Dom0 will, as it boots, > turn off MSI on these devices, thus preventing any IOMMU events (faults > in particular) from being reported on pre-x2APIC hardware. > > As the acpi_iommu_init() invocation was moved ahead of > acpi_mmcfg_init()'s by the offending commit, move the call to > pci_segments_init() accordingly. > > Fixes: 3950f2485bbc ("x86/x2APIC: defer probe until after IOMMU ACPI table > parsing") > Signed-off-by: Jan Beulich <jbeul...@suse.com> > --- > Of course it would have been quite a bit easier to notice this issue if > radix_tree_insert() wouldn't work fine ahead of radix_tree_init() being > invoked for a given radix tree, when the index inserted at is 0. > > While hunting down various other dead paths to actually find the root > cause, it occurred to me that it's probably not a good idea to fully > disallow config space writes for r/o devices: Dom0 won't be able to size > their BARs (luckily the IOMMU "devices" don't have any, but e.g. serial > ones generally will have at least one), for example. Without being able > to size BARs it also will likely be unable to correctly account for the > address space taken by these BARs. However, outside of vPCI it's not > really clear to me how we could reasonably emulate such BAR sizing > writes - we can't, after all, allow Dom0 to actually write to the > underlying physical registers, yet we don't intercept reads (i.e. we > can't mimic expected behavior then). > > --- a/xen/arch/x86/x86_64/mmconfig-shared.c > +++ b/xen/arch/x86/x86_64/mmconfig-shared.c > @@ -402,8 +402,6 @@ void __init acpi_mmcfg_init(void) > { > bool valid = true; > > - pci_segments_init(); > - > /* MMCONFIG disabled */ > if ((pci_probe & PCI_PROBE_MMCONF) == 0) > return; > --- a/xen/drivers/passthrough/x86/iommu.c > +++ b/xen/drivers/passthrough/x86/iommu.c > @@ -55,6 +55,8 @@ void __init acpi_iommu_init(void) > { > int ret = -ENODEV; > > + pci_segments_init(); > + > if ( !iommu_enable && !iommu_intremap ) > return; > >
I can't help but feel this is taking a bad problem and not making it any better. pci_segments_init() is even less (obviously) relevant in apci_iommu_init() than it is in acpi_mmcfg_init(), and given the fine-grain Kconfig-ing going on, is only one small step from accidentally being compiled out. ARM is in a bad state too, with this initialisation even being behind the PCI Passthrough cmdline option. IMO there are two problems here; one as you pointed out (radix_tree_insert() doesn't fail), and that PCI handling requires explicit initialisation to begin with. Looking through radix tree, it wouldn't be hard to create a RADIX_TREE_INIT macro to allow initialisation at compile time for suitable objects (pci_segments and acpi_ivrs currently). That involves exporting rcu_node_{alloc,free}(), although the last caller of radix_tree_set_alloc_callbacks() was dropped when TMEM went, so we could reasonably remove that infrastructure too, at which point radix_tree_init() is a simple zero of the structure. Dealing with alloc_pseg(0) is harder. As we never free the PCI segments, we could just opencode the radix_tree_root of height=1 with a static pseg0 structure, and that would drop the need for pci_segemnts_init() completely. This gets us into a far less fragile position, and one liable to survive future refactoring too. ~Andrew P.S. Yes AMD IOMMUs really do have BARs. The BIOS programs them, then sets a register in config space to hide the BAR registers. You can reprogram them if you know how.