On 24.01.2024 18:29, Roger Pau Monne wrote:
> The current loop that iterates from 0 to the maximum RAM address in order to
> setup the IOMMU mappings is highly inefficient, and it will get worse as the
> amount of RAM increases. It's also not accounting for any reserved regions
> past the last RAM address.
>
> Instead of iterating over memory addresses, iterate over the memory map
> regions and use a rangeset in order to keep track of which ranges need to be
> identity mapped in the hardware domain physical address space.
>
> On an AMD EPYC 7452 with 512GiB of RAM, the time to execute
> arch_iommu_hwdom_init() in nanoseconds is:
>
> x old
> + new
>     N           Min           Max        Median           Avg        Stddev
> x   5 2.2364154e+10  2.338244e+10 2.2474685e+10 2.2622409e+10 4.2949869e+08
> +   5       1025012       1033036       1026188     1028276.2     3623.1194
> Difference at 95.0% confidence
>       -2.26214e+10 +/- 4.42931e+08
>       -99.9955% +/- 9.05152e-05%
>       (Student's t, pooled s = 3.03701e+08)
>
> Execution time of arch_iommu_hwdom_init() goes down from ~22s to ~0.001s.
>
> Note there's a change for HVM domains (ie: PVH dom0) that get switched to
> create the p2m mappings using map_mmio_regions() instead of
> p2m_add_identity_entry(), so that ranges can be mapped with a single function
> call if possible. Note that the interface of map_mmio_regions() doesn't
> allow creating read-only mappings, but so far there are no such mappings
> created for PVH dom0 in arch_iommu_hwdom_init().
>
> No change intended in the resulting mappings that a hardware domain gets.
>
> Signed-off-by: Roger Pau Monné <roger....@citrix.com>

Reviewed-by: Jan Beulich <jbeul...@suse.com>
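
For readers less familiar with the approach described in the quoted commit
message, here is a minimal, self-contained C sketch of the idea: walk the
memory map regions, collect the ones to be identity mapped into a merged set of
ranges, then hand each merged range to a mapping callback in a single call.
All types and names below (struct region, struct range_set, range_set_add(),
build_identity_set(), map_all(), count_range()) are hypothetical illustrations.
They are not Xen's rangeset, e820 or map_mmio_regions() interfaces, and the
policy of skipping only "unusable" regions is an assumption made for the
example, not what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; NOT Xen's e820 or rangeset types. */
enum region_type { REGION_RAM, REGION_RESERVED, REGION_UNUSABLE };

struct region {
    uint64_t start_pfn;        /* first page frame of the region */
    uint64_t end_pfn;          /* last page frame, inclusive */
    enum region_type type;
};

#define MAX_RANGES 64

struct range { uint64_t s, e; };   /* inclusive [s, e] */

struct range_set {
    struct range r[MAX_RANGES];    /* sorted, disjoint, non-adjacent */
    size_t nr;
};

/* Add [s, e], merging with any overlapping or adjacent ranges. */
static int range_set_add(struct range_set *rs, uint64_t s, uint64_t e)
{
    size_t i = 0, j;

    assert(s <= e);

    /* Skip ranges that end strictly before s - 1 (no overlap, not adjacent). */
    while (i < rs->nr && s > 0 && rs->r[i].e < s - 1)
        i++;

    /* Absorb every range that starts at or before e + 1. */
    for (j = i; j < rs->nr && (e == UINT64_MAX || rs->r[j].s <= e + 1); j++) {
        if (rs->r[j].s < s)
            s = rs->r[j].s;
        if (rs->r[j].e > e)
            e = rs->r[j].e;
    }

    /* A pure insertion needs a free slot. */
    if (j == i && rs->nr == MAX_RANGES)
        return -1;

    /* Replace entries [i, j) with the single merged range. */
    memmove(&rs->r[i + 1], &rs->r[j], (rs->nr - j) * sizeof(rs->r[0]));
    rs->r[i] = (struct range){ s, e };
    rs->nr = rs->nr - (j - i) + 1;
    return 0;
}

/* Iterate over memory map regions rather than over individual addresses. */
static int build_identity_set(const struct region *map, size_t n,
                              struct range_set *rs)
{
    for (size_t i = 0; i < n; i++) {
        int rc;

        if (map[i].type == REGION_UNUSABLE)   /* assumed policy */
            continue;
        rc = range_set_add(rs, map[i].start_pfn, map[i].end_pfn);
        if (rc)
            return rc;
    }
    return 0;
}

typedef int (*map_fn)(uint64_t start_pfn, uint64_t end_pfn, void *ctx);

/* One callback invocation per merged range, not one per page. */
static int map_all(const struct range_set *rs, map_fn fn, void *ctx)
{
    for (size_t i = 0; i < rs->nr; i++) {
        int rc = fn(rs->r[i].s, rs->r[i].e, ctx);

        if (rc)
            return rc;
    }
    return 0;
}

/* Example callback: counts how many mapping calls would be issued. */
static int count_range(uint64_t start_pfn, uint64_t end_pfn, void *ctx)
{
    (void)start_pfn;
    (void)end_pfn;
    ++*(unsigned int *)ctx;
    return 0;
}
```

In this sketch the amount of work scales with the number of memory map
entries rather than with the highest RAM address, which is the property the
commit message attributes the speedup to.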