On 17/11/2023 9:47 am, Roger Pau Monne wrote:
> The current loop that iterates from 0 to the maximum RAM address in order to
> setup the IOMMU mappings is highly inefficient, and it will get worse as the
> amount of RAM increases.  It's also not accounting for any reserved regions
> past the last RAM address.
>
> Instead of iterating over memory addresses, iterate over the memory map 
> regions
> and use a rangeset in order to keep track of which ranges need to be identity
> mapped in the hardware domain physical address space.
>
> On an AMD EPYC 7452 with 512GiB of RAM, the time to execute
> arch_iommu_hwdom_init() in nanoseconds is:
>
> x old
> + new
>     N           Min           Max        Median           Avg        Stddev
> x   5 2.2364154e+10  2.338244e+10 2.2474685e+10 2.2622409e+10 4.2949869e+08
> +   5       1025012       1033036       1026188     1028276.2     3623.1194
> Difference at 95.0% confidence
>       -2.26214e+10 +/- 4.42931e+08
>       -99.9955% +/- 9.05152e-05%
>       (Student's t, pooled s = 3.03701e+08)
>
> Execution time of arch_iommu_hwdom_init() goes down from ~22s to ~0.001s.
>
> Note there's a change for HVM domains (ie: PVH dom0) that get switched to
> create the p2m mappings using map_mmio_regions() instead of
> p2m_add_identity_entry(), so that ranges can be mapped with a single function
> call if possible.  Note that the interface of map_mmio_regions() doesn't
> allow creating read-only mappings, but so far there are no such mappings
> created for PVH dom0 in arch_iommu_hwdom_init().
>
> No change intended in the resulting mappings that a hardware domain gets.
>
> Signed-off-by: Roger Pau Monné <roger....@citrix.com>

Very nice numbers.  And yes - straight line performance like this (good
or bad) is all about the innermost loop.

Sadly, the patch diff is horrible to read.  Patch 2 remaining in common
code will improve this a little, but probably not very much.

If there are no better ideas, it's probably best to split into 3
patches, being:

1) Introduce new rangeset forms of existing operations
2) Rewrite arch_iommu_hwdom_init() to use rangesets
3) Delete old mfn forms

That at least means that the new and the old forms aren't expressed as a
delta against each-other.

~Andrew

Reply via email to