Hello,

I observed that NVMe performance drops sharply when fio runs on a CPU (aarch64) in the NUMA node remote from the NVMe PCI device, compared with running it on the device's local node. Please see the test results below[1]: 327K vs. 34.9K IOPS. A latency trace shows that one big difference is in iommu_dma_unmap_sg(): 1111 nsecs vs. 25437 nsecs on average per call.
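For reference, the per-call averages above came from a function-latency trace; a minimal bpftrace sketch that collects the same kind of measurement (assuming iommu_dma_unmap_sg() is visible to kprobes on your kernel, i.e. not inlined) would be:

  bpftrace -e '
  kprobe:iommu_dma_unmap_sg { @ts[tid] = nsecs; }
  kretprobe:iommu_dma_unmap_sg /@ts[tid]/ {
      @ns = hist(nsecs - @ts[tid]);
      delete(@ts[tid]);
  }'

BCC's funclatency tool ("funclatency -u iommu_dma_unmap_sg") produces a similar histogram.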
[1] fio test & results

1) fio test result:

- run fio on local CPU:

  taskset -c 0 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
  + fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri \
        --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 \
        --filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 \
        --rw=randread --name=test --group_reporting

  IOPS: 327K
  avg latency of iommu_dma_unmap_sg(): 1111 nsecs

- run fio on remote CPU:

  taskset -c 80 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
  + fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri \
        --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 \
        --filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 \
        --rw=randread --name=test --group_reporting

  IOPS: 34.9K
  avg latency of iommu_dma_unmap_sg(): 25437 nsecs

2) system info:

  [root@ampere-mtjade-04 ~]# lscpu | grep NUMA
  NUMA node(s):        2
  NUMA node0 CPU(s):   0-79
  NUMA node1 CPU(s):   80-159

  [root@ampere-mtjade-04 ~]# lspci | grep NVMe
  0003:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983

  [root@ampere-mtjade-04 ~]# cat /sys/block/nvme1n1/device/device/numa_node
  0

Thanks,
Ming
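P.S. In case it helps with reproducing, a small loop to compare the two placements back to back (a convenience sketch, not the exact script I used; the CPU numbers 0 and 80 match the lscpu topology above, so adjust them for other machines):

  for cpu in 0 80; do
      echo "== fio pinned to CPU $cpu =="
      taskset -c "$cpu" fio --bs=4k --ioengine=io_uring --fixedbufs \
          --registerfiles --hipri --iodepth=64 --iodepth_batch_submit=16 \
          --iodepth_batch_complete_min=16 --filename=/dev/nvme1n1 \
          --direct=1 --runtime=10 --numjobs=1 --rw=randread \
          --name=test --group_reporting | grep -E 'IOPS|lat'
  done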