On 2025-12-27 8:59 pm, Barry Song wrote:
On Sun, Dec 28, 2025 at 9:16 AM Leon Romanovsky <[email protected]> wrote:

On Sat, Dec 27, 2025 at 11:52:48AM +1300, Barry Song wrote:
From: Barry Song <[email protected]>

Apply batched DMA synchronization to iommu_dma_sync_sg_for_cpu() and
iommu_dma_sync_sg_for_device(). For all buffers in an SG list, only
a single flush operation is needed.

I do not have the hardware to test this, so the patch is marked as
RFC. I would greatly appreciate any testing feedback.

Cc: Leon Romanovsky <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ada Couprie Diaz <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Tangquan Zheng <[email protected]>
Signed-off-by: Barry Song <[email protected]>
---
  drivers/iommu/dma-iommu.c | 15 +++++++--------
  1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ffa940bdbbaf..b68dbfcb7846 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1131,10 +1131,9 @@ void iommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sgl,
                       iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
                                                     sg->length, dir);
       } else if (!dev_is_dma_coherent(dev)) {
-             for_each_sg(sgl, sg, nelems, i) {
+             for_each_sg(sgl, sg, nelems, i)
                       arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
-                     arch_sync_dma_flush();
-             }
+             arch_sync_dma_flush();

This and previous patches should be squashed into the one which
introduced arch_sync_dma_flush().

Hi Leon,

The series is structured so that the first patches introduce no
functional change: every arch_sync_dma_for_* call is replaced with
arch_sync_dma_for_* plus arch_sync_dma_flush(). Subsequent patches
then add batching for the different scenarios as separate changes.

Another issue is that I was unable to find a board that both runs
mainline and exercises the IOMMU paths affected by these changes.
As a result, patches 7 and 8 are marked as RFC, while the other
patches have been tested on a real board running mainline + changes.

FWIW, if you can get your hands on an M.2 NVMe drive for the Rock5,
that board has an SMMU in front of PCIe. It could also serve to test
non-coherent SWIOTLB, with the SMMU in bypass and either some fake
restrictive dma-ranges in the DT or a hack to reduce the DMA mask in
the NVMe driver.

Cheers,
Robin.
