On 30/01/2025 11:12 am, Jan Beulich wrote:
> In order for amd_iommu_detect_one_acpi()'s call to pci_ro_device() to
> have permanent effect, pci_segments_init() needs to be called ahead of
> making it there. Without this we're losing segment 0's r/o map, and thus
> we're losing write-protection of the PCI devices representing IOMMUs.
> Which in turn means that half-way recent Linux Dom0 will, as it boots,
> turn off MSI on these devices, thus preventing any IOMMU events (faults
> in particular) from being reported on pre-x2APIC hardware.
>
> As the acpi_iommu_init() invocation was moved ahead of
> acpi_mmcfg_init()'s by the offending commit, move the call to
> pci_segments_init() accordingly.
>
> Fixes: 3950f2485bbc ("x86/x2APIC: defer probe until after IOMMU ACPI table 
> parsing")
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> ---
> Of course it would have been quite a bit easier to notice this issue if
> radix_tree_insert() wouldn't work fine ahead of radix_tree_init() being
> invoked for a given radix tree, when the index inserted at is 0.
>
> While hunting down various other dead paths to actually find the root
> cause, it occurred to me that it's probably not a good idea to fully
> disallow config space writes for r/o devices: Dom0 won't be able to size
> their BARs (luckily the IOMMU "devices" don't have any, but e.g. serial
> ones generally will have at least one), for example. Without being able
> to size BARs it also will likely be unable to correctly account for the
> address space taken by these BARs. However, outside of vPCI it's not
> really clear to me how we could reasonably emulate such BAR sizing
> writes - we can't, after all, allow Dom0 to actually write to the
> underlying physical registers, yet we don't intercept reads (i.e. we
> can't mimic expected behavior then).
>
> --- a/xen/arch/x86/x86_64/mmconfig-shared.c
> +++ b/xen/arch/x86/x86_64/mmconfig-shared.c
> @@ -402,8 +402,6 @@ void __init acpi_mmcfg_init(void)
>  {
>      bool valid = true;
>  
> -    pci_segments_init();
> -
>      /* MMCONFIG disabled */
>      if ((pci_probe & PCI_PROBE_MMCONF) == 0)
>          return;
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -55,6 +55,8 @@ void __init acpi_iommu_init(void)
>  {
>      int ret = -ENODEV;
>  
> +    pci_segments_init();
> +
>      if ( !iommu_enable && !iommu_intremap )
>          return;
>  
>

I can't help but feel this is taking a bad problem and not making it any
better.

pci_segments_init() is even less (obviously) relevant in
apci_iommu_init() than it is in acpi_mmcfg_init(), and given the
fine-grain Kconfig-ing going on, is only one small step from
accidentally being compiled out.

ARM is in a bad state too, with this initialisation even being behind
the PCI Passthrough cmdline option.

IMO there are two problems here; one as you pointed out
(radix_tree_insert() doesn't fail), and that PCI handling requires
explicit initialisation to begin with.

Looking through radix tree, it wouldn't be hard to create a
RADIX_TREE_INIT macro to allow initialisation at compile time for
suitable objects (pci_segments and acpi_ivrs currently).

That involves exporting rcu_node_{alloc,free}(), although the last
caller of radix_tree_set_alloc_callbacks() was dropped when TMEM went,
so we could reasonably remove that infrastructure too, at which point
radix_tree_init() is a simple zero of the structure.

Dealing with alloc_pseg(0) is harder.  As we never free the PCI
segments, we could just opencode the radix_tree_root of height=1 with a
static pseg0 structure, and that would drop the need for
pci_segemnts_init() completely.

This gets us into a far less fragile position, and one liable to survive
future refactoring too.

~Andrew

P.S. Yes AMD IOMMUs really do have BARs.  The BIOS programs them, then
sets a register in config space to hide the BAR registers.  You can
reprogram them if you know how.

Reply via email to