On Tue, Aug 18, 2020 at 09:28:53AM +0100, Will Deacon wrote:
> On Tue, Aug 18, 2020 at 04:43:10PM +0900, Cho KyongHo wrote:
> > Cache maintenance operations in the most of CPU architectures needs
> > memory barrier after the cache maintenance for the DMAs to view the
> > region of the memory correctly. The problem is that memory barrier is
> > very expensive and dma_[un]map_sg() and dma_sync_sg_for_{device|cpu}()
> > involves the memory barrier per every single cache sg entry. In some
> > CPU micro-architecture, a single memory barrier consumes more time than
> > cache clean on 4KiB. It becomes more serious if the number of CPU cores
> > are larger.
>
> Have you got higher-level performance data for this change? It's more likely
> that the DSB is what actually forces the prior cache maintenance to
> complete, so it's important to look at the bigger picture, not just the
> apparent relative cost of these instructions.
>
> Also, it's a miracle that non-coherent DMA even works, so I'm not sure
> that we should be complicating the implementation like this to try to
> make it "fast".

And without an in-tree user, not merely an important one but one that
actually matters and can show how this is correct, the whole proposal
is a complete nonstarter.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu