Hi Will, Robin,
While analyzing an arm64 issue in interrupt handling for NVMe [0], we
have noticed worryingly high CPU utilization in the SMMU driver.
The background is that we may get a CPU lockup during high-throughput
NVMe testing, and we noticed that disabling the SMMU during testing
avoids the issue. However, this lockup is a cross-architecture issue and
there are attempts to address it, like [1]. To me, disabling the SMMU
only sidesteps that specific issue.
Anyway, we should still consider this high CPU load:
PerfTop: 1694 irqs/sec kernel:97.3% exact: 0.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, CPU: 0)
--------------------------------------------------------------------------------------------------------------------------
50.84% [kernel] [k] arm_smmu_cmdq_issue_cmdlist
19.51% [kernel] [k] _raw_spin_unlock_irqrestore
5.14% [kernel] [k] __slab_free
2.37% [kernel] [k] bio_release_pages.part.42
2.20% [kernel] [k] fput_many
1.92% [kernel] [k] aio_complete_rw
1.85% [kernel] [k] __arm_lpae_unmap
1.71% [kernel] [k] arm_smmu_atc_inv_domain.constprop.42
1.11% [kernel] [k] sbitmap_queue_clear
1.05% [kernel] [k] blk_mq_free_request
0.97% [kernel] [k] nvme_irq
0.71% [kernel] [k] blk_account_io_done
0.66% [kernel] [k] kmem_cache_free
0.66% [kernel] [k] blk_mq_complete_request
This is for a CPU servicing the NVMe interrupt and doing the DMA unmap.
The DMA unmap is done in threaded interrupt context.
And for the overall system, we have:
PerfTop: 85864 irqs/sec kernel:89.6% exact: 0.0% lost: 0/34434 drop: 0/40116 [4000Hz cycles], (all, 96 CPUs)
--------------------------------------------------------------------------------------------------------------------------
27.43% [kernel] [k] arm_smmu_cmdq_issue_cmdlist
11.71% [kernel] [k] _raw_spin_unlock_irqrestore
6.35% [kernel] [k] _raw_spin_unlock_irq
2.65% [kernel] [k] get_user_pages_fast
2.03% [kernel] [k] __slab_free
1.55% [kernel] [k] tick_nohz_idle_exit
1.47% [kernel] [k] arm_lpae_map
1.39% [kernel] [k] __fget
1.14% [kernel] [k] __lock_text_start
1.09% [kernel] [k] _raw_spin_lock
1.08% [kernel] [k] bio_release_pages.part.42
1.03% [kernel] [k] __sbitmap_get_word
0.97% [kernel] [k] arm_smmu_atc_inv_domain.constprop.42
0.91% [kernel] [k] fput_many
0.88% [kernel] [k] __arm_lpae_map
One thing to note is that we still spend an appreciable amount of time
in arm_smmu_atc_inv_domain(), which is disappointing considering that it
should effectively be a no-op.
As for arm_smmu_cmdq_issue_cmdlist(), I do note that during the testing
our batch size is 1, so we're not seeing the real benefit of the
batching. I can't help but think that we could improve this code to try
to combine CMD_SYNCs for small batches.
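To put a rough number on the batch-size point, here is a toy user-space
model, not driver code. It assumes each batch of invalidation commands is
followed by exactly one CMD_SYNC, and the helper names cmdq_entries() and
sync_overhead_pct() are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (assumption: one CMD_SYNC per batch). Returns the number of
 * command-queue entries consumed by n_cmds invalidation commands issued
 * in batches of `batch` commands each.
 */
static inline size_t cmdq_entries(size_t n_cmds, size_t batch)
{
	assert(batch > 0);

	/* One CMD_SYNC per batch: ceil(n_cmds / batch) syncs in total. */
	size_t n_syncs = (n_cmds + batch - 1) / batch;

	return n_cmds + n_syncs;
}

/* Percentage of queue entries that are CMD_SYNCs (integer division). */
static inline unsigned int sync_overhead_pct(size_t n_cmds, size_t batch)
{
	size_t total = cmdq_entries(n_cmds, batch);

	if (total == 0)
		return 0;

	return (unsigned int)(100 * (total - n_cmds) / total);
}
```

In this model, 64 invalidations with a batch size of 1 consume 128 queue
entries, half of which are CMD_SYNCs, whereas a batch size of 8 would
consume 72. It says nothing about the cost of waiting on each CMD_SYNC,
which is likely the more expensive part.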
Anyway, let me know your thoughts or any questions. I'll have a look for
other possible bottlenecks if I get a chance.
[0]
https://lore.kernel.org/lkml/[email protected]/
[1]
https://lore.kernel.org/linux-nvme/[email protected]/
Cheers,
John
On 21/08/2019 16:17, Will Deacon wrote:
Hi again,
This is version two of the patches I posted yesterday:
v1: https://lkml.kernel.org/r/[email protected]
Changes since then include:
* Fix 'ats_enabled' checking when enabling ATS
* Remove redundant 'dev_is_pci()' calls
* Remove bool bitfield
* Add patch temporarily disabling ATS detection for -stable
* Issue ATC invalidation even when non-leaf
* Elide invalidation/SYNC for zero-sized address ranges
* Shuffle the patches round a bit
Thanks,
Will
Cc: Zhen Lei <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Cc: John Garry <[email protected]>
Cc: Robin Murphy <[email protected]>
--->8
Will Deacon (8):
iommu/arm-smmu-v3: Document ordering guarantees of command insertion
iommu/arm-smmu-v3: Disable detection of ATS and PRI
iommu/arm-smmu-v3: Remove boolean bitfield for 'ats_enabled' flag
iommu/arm-smmu-v3: Don't issue CMD_SYNC for zero-length invalidations
iommu/arm-smmu-v3: Rework enabling/disabling of ATS for PCI masters
iommu/arm-smmu-v3: Fix ATC invalidation ordering wrt main TLBs
iommu/arm-smmu-v3: Avoid locking on invalidation path when not using
ATS
Revert "iommu/arm-smmu-v3: Disable detection of ATS and PRI"
drivers/iommu/arm-smmu-v3.c | 117 ++++++++++++++++++++++++++++++++------------
1 file changed, 87 insertions(+), 30 deletions(-)
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu