On 06/08/2019 14:09, Jan Beulich wrote:
> ACPI tables are free to list far more device coordinates than there are
> actual devices. By delaying the table allocations for most cases, and
> doing them only when an actual device is known to be present at a given
> position, overall memory used for the tables goes down from over 500k
> pages to just over 1k ones.

This is slightly awkward grammar.  While I don't think it is strictly
speaking incorrect, it would be more normal to phrase as "just over 1k
pages".

>
> Tables continue to get allocated right away for special entries
> (IO-APIC, HPET) as well as for alias IDs. While in the former case
> that's simply because there may not be any device at a given position,
> in the latter case this is to avoid having to introduce ref-counting of
> table usage.
>
> The change involves invoking
> iterate_ivrs_mappings(amd_iommu_setup_device_table) a second time,
> because the function now wants to be able to find PCI devices, which
> isn't possible yet when IOMMU setup happens very early during x2APIC
> mode setup. In this context amd_iommu_init_interrupt() gets renamed as
> well.
>
> The logic adjusting a DTE's interrupt remapping attributes also gets
> changed, such that the lack of an IRT would result in target aborted
> rather than not remapped interrupts (should any occur).
>
> Note that for now phantom functions get separate IRTs allocated, as was
> the case before.
>
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> ---
> v5: New.
> ---
> TBD: This retains prior (but suspicious) behavior of not calling
>      amd_iommu_set_intremap_table() for "invalid" IVRS mapping entries.
>      Since DTE.IV=0 means un-remapped interrupts, I wonder if this needs
>      changing.

How would an invalid entry get DTE.V set in the first place?

Whatever the old behaviour may have been, we shouldn't have code which
risks leaving interrupts un-remapped by accident.
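
To be concrete about what "fail closed" should look like, going by the
IntCtl encodings in the AMD IOMMU spec - a rough sketch only; the
struct/field names below are illustrative, not Xen's real (packed) DTE
layout:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative DTE fields only - not the real bit layout. */
    struct dte {
        bool iv;              /* interrupt translation info valid */
        unsigned int int_ctl; /* IntCtl field */
        uint64_t it_root;     /* interrupt table root pointer */
    };

    enum dte_int_ctl {
        INT_CTL_TARGET_ABORT = 0, /* IntCtl=00b: abort the interrupt */
        INT_CTL_PASSTHROUGH  = 1, /* IntCtl=01b: forward un-remapped */
        INT_CTL_REMAP        = 2, /* IntCtl=10b: remap via the IRT */
    };

    static void set_dte_intremap(struct dte *dte, uint64_t irt_maddr)
    {
        dte->iv = true;                    /* interrupt fields are valid */
        if ( irt_maddr )
        {
            dte->it_root = irt_maddr >> 6; /* IRT base, 128-byte aligned */
            dte->int_ctl = INT_CTL_REMAP;
        }
        else
            /* No IRT: fail closed.  Never PASSTHROUGH by accident. */
            dte->int_ctl = INT_CTL_TARGET_ABORT;
    }

i.e. with no IRT in place, the only acceptable setting is target abort.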

>
> @@ -1266,13 +1270,46 @@ static int __init amd_iommu_setup_device
>          if ( ivrs_mappings[bdf].valid )
>          {
>              void *dte;
> +            const struct pci_dev *pdev = NULL;
>  
>              /* add device table entry */
>              dte = device_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE);
>              iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
>  
> +            if ( iommu_intremap &&
> +                 ivrs_mappings[bdf].dte_requestor_id == bdf &&
> +                 !ivrs_mappings[bdf].intremap_table )
> +            {
> +                if ( !pci_init )
> +                    continue;
> +                pcidevs_lock();
> +                pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
> +                pcidevs_unlock();
> +            }
> +
> +            if ( pdev )
> +            {
> +                unsigned int req_id = bdf;
> +
> +                do {
> +                    ivrs_mappings[req_id].intremap_table =
> +                        amd_iommu_alloc_intremap_table(
> +                            ivrs_mappings[bdf].iommu,
> +                            &ivrs_mappings[req_id].intremap_inuse);
> +                    if ( !ivrs_mappings[req_id].intremap_table )
> +                        return -ENOMEM;
> +
> +                    if ( !pdev->phantom_stride )
> +                        break;
> +                    req_id += pdev->phantom_stride;
> +                } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
> +            }
> +
>              amd_iommu_set_intremap_table(
> -                dte, virt_to_maddr(ivrs_mappings[bdf].intremap_table),
> +                dte,
> +                ivrs_mappings[bdf].intremap_table
> +                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
> +                : 0,

Under what circumstances can ivrs_mappings[bdf].intremap_table be NULL,
given the ENOMEM above?
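
To spell out how I read the hunk (just the existing logic condensed;
the ... elisions are mine):

    if ( iommu_intremap && ivrs_mappings[bdf].dte_requestor_id == bdf &&
         !ivrs_mappings[bdf].intremap_table )
    {
        if ( !pci_init )
            continue;                /* deferred to the post-PCI-init pass */
        pdev = pci_get_pdev(...);    /* may legitimately return NULL */
    }

    if ( pdev )
        /* allocate IRT(s); -ENOMEM on failure, so non-NULL afterwards */;

    /* so we get here with a NULL table exactly when pdev == NULL */
    amd_iommu_set_intremap_table(...);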

This case isn't very clear cut given the !pdev possibility, but...

> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -458,6 +459,32 @@ static int amd_iommu_add_device(u8 devfn
>          return -ENODEV;
>      }
>  
> +    ivrs_mappings = get_ivrs_mappings(pdev->seg);
> +    bdf = PCI_BDF2(pdev->bus, devfn);
> +    if ( !ivrs_mappings ||
> +         !ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].valid )
> +        return -EPERM;
> +
> +    if ( iommu_intremap &&
> +         ivrs_mappings[bdf].dte_requestor_id == bdf &&
> +         !ivrs_mappings[bdf].intremap_table )
> +    {
> +        ivrs_mappings[bdf].intremap_table =
> +            amd_iommu_alloc_intremap_table(
> +                iommu, &ivrs_mappings[bdf].intremap_inuse);
> +        if ( !ivrs_mappings[bdf].intremap_table )
> +            return -ENOMEM;
> +
> +        amd_iommu_set_intremap_table(
> +            iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
> +            ivrs_mappings[bdf].intremap_table
> +            ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
> +            : 0,

... this case definitely cannot occur.
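
Given the -ENOMEM check immediately above, the conditional looks dead
to me, i.e. this could simply be:

    amd_iommu_set_intremap_table(
        iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
        virt_to_maddr(ivrs_mappings[bdf].intremap_table),
        iommu_intremap);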

~Andrew

> +            iommu_intremap);
> +
> +        amd_iommu_flush_device(iommu, bdf);
> +    }
> +
>      amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
>      return 0;
>  }
>

