On Fri, Jan 23, 2026 at 06:41:46PM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 23, 2026 at 01:59:04PM -0800, Matthew Brost wrote:
> > The dma-map IOVA alloc, link, and sync APIs perform significantly better
> > than dma-map / dma-unmap, as they avoid costly IOMMU synchronizations.
> > This difference is especially noticeable when mapping a 2MB region in
> > 4KB pages.
> > 
> > Use the dma-map IOVA alloc, link, and sync APIs for GPU SVM and DRM
> > pagemap, which manage mappings between the CPU and GPU.
> > 
> > Initial results are promising.
> > 
> > Baseline CPU time during 2M / 64K fault with a migration:
> > Average migrate 2M cpu time (us, percentage): 552.36, 71.94%
> > Average migrate 64K cpu time (us, percentage): 24.98, 34.79%
> > 
> > After this series CPU time during 2M / 64K fault with a migration:
> > Average migrate 2M cpu time (us, percentage): 224.82, 51.41%
> > Average migrate 64K cpu time (us, percentage): 14.66, 25.66%
> 
> That's a 2x improvement in the overall operation? Wow!
> 
> Did you look at how non-iommu cases perform too?
> 

Like the intel_iommu=off kernel command line option? I haven't checked
that, but I can.

> I think we can do better still for the non-cached platforms as I have
> a way in mind to batch up lines and flush the line instead of flushing
> for every 8 byte IOPTE written. Some ARM folks have been talking about
> this problem too..

Yes, prior to the IOMMU changes I believe the baseline was ~330us, so
dma-map/unmap is still way slower than before. If this affects
platforms other than Intel x86, there will be complaints from everyone
until the entire kernel moves to the IOVA alloc model.
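
For anyone skimming the thread, the pattern under discussion looks
roughly like the sketch below (written against the dma_iova_* API in
include/linux/dma-mapping.h; 'dev', 'pages', and the fallback policy
are placeholders, not code from this series):

static int map_range_iova(struct device *dev, struct dma_iova_state *state,
			  struct page **pages, unsigned int npages)
{
	size_t size = (size_t)npages << PAGE_SHIFT;
	size_t off = 0;
	unsigned int i;
	int err;

	/* One IOVA allocation for the whole (e.g. 2M) range. */
	if (!dma_iova_try_alloc(dev, state, 0, size))
		return -EOPNOTSUPP; /* no IOVA path, fall back to dma_map_page() */

	/* Link each 4K page; no per-page IOTLB maintenance here. */
	for (i = 0; i < npages; i++, off += PAGE_SIZE) {
		err = dma_iova_link(dev, state, page_to_phys(pages[i]), off,
				    PAGE_SIZE, DMA_BIDIRECTIONAL, 0);
		if (err)
			goto err_destroy;
	}

	/* A single IOTLB sync for the whole range, instead of one per
	 * dma_map_page() call. */
	err = dma_iova_sync(dev, state, 0, size);
	if (err)
		goto err_destroy;

	return 0;

err_destroy:
	dma_iova_destroy(dev, state, off, DMA_BIDIRECTIONAL, 0);
	return err;
}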

Another question: does IOVA alloc support modes similar to
dma_map_resource between peer devices? We also do that, and I haven't
modified that code or checked it for perf regressions.
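
For context, that path is the long-standing streaming API for peer
resources; a minimal sketch (phys_addr standing in for the peer BAR
address, not code from this series):

	dma_addr_t addr;

	addr = dma_map_resource(dev, phys_addr, size, DMA_BIDIRECTIONAL, 0);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;
	/* ... device accesses the peer via 'addr' ... */
	dma_unmap_resource(dev, addr, size, DMA_BIDIRECTIONAL, 0);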

Matt 

> 
> Jason
