[PATCH RESEND v5 0/5] iommu: Allow IOVA rcache range be configured

2022-04-04 Thread John Garry via iommu
For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. This may be much more pronounced from commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails") …

[PATCH RESEND v5 1/5] iommu: Refactor iommu_group_store_type()

2022-04-04 Thread John Garry via iommu
Function iommu_group_store_type() supports changing the default domain of an IOMMU group. Many conditions need to be satisfied and steps taken for this action to be successful. Satisfying these conditions and steps will be required for setting other IOMMU group attributes, so factor into a common function …

[PATCH RESEND v5 2/5] iova: Allow rcache range upper limit to be flexible

2022-04-04 Thread John Garry via iommu
Some low-level drivers may request DMA mappings whose IOVA length exceeds that of the current rcache upper limit. This means that allocations for those IOVAs will never be cached, and always must be allocated and freed from the RB tree per DMA mapping cycle. This has a significant effect on performance …

[PATCH RESEND v5 3/5] iommu: Allow iommu_change_dev_def_domain() realloc same default domain type

2022-04-04 Thread John Garry via iommu
Allow iommu_change_dev_def_domain() to create a new default domain, keeping the same as current. Also remove comment about the function purpose, which will become stale. Signed-off-by: John Garry --- drivers/iommu/iommu.c | 49 ++- include/linux/iommu.h |

[PATCH RESEND v5 4/5] iommu: Allow max opt DMA len be set for a group via sysfs

2022-04-04 Thread John Garry via iommu
Add support to allow the maximum optimised DMA len be set for an IOMMU group via sysfs. This is much the same as the method to change the default domain type for a group. Signed-off-by: John Garry --- .../ABI/testing/sysfs-kernel-iommu_groups | 16 + drivers/iommu/iommu.c

[PATCH RESEND v5 5/5] iova: Add iova_len argument to iova_domain_init_rcaches()

2022-04-04 Thread John Garry via iommu
Add max opt argument to iova_domain_init_rcaches(), and use it to set the rcaches range. Also fix up all users to set this value (at 0, meaning use default), including a wrapper for that, iova_domain_init_rcaches_default(). For dma-iommu.c we derive the iova_len argument from the IOMMU group max

Re: [PATCH RESEND v5 5/5] iova: Add iova_len argument to iova_domain_init_rcaches()

2022-04-07 Thread John Garry via iommu
On 07/04/2022 09:27, Leizhen (ThunderTown) wrote: Thanks for having a look. On 2022/4/4 19:27, John Garry wrote: Add max opt argument to iova_domain_init_rcaches(), and use it to set the rcaches range. Also fix up all users to set this value (at 0, meaning use default), including a wrapper …

Re: [PATCH RESEND v5 4/5] iommu: Allow max opt DMA len be set for a group via sysfs

2022-04-07 Thread John Garry via iommu
On 07/04/2022 09:21, Leizhen (ThunderTown) wrote: On 2022/4/4 19:27, John Garry wrote: Add support to allow the maximum optimised DMA len be set for an IOMMU group via sysfs. This is much the same with the method to change the default domain type for a group. Signed-off-by: John Garry ---

Re: [PATCH v7 2/7] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-04-11 Thread John Garry via iommu
On 07/04/2022 13:58, Yicong Yang wrote: HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated Endpoint(RCiEP) device, providing the capability to dynamically monitor and tune the PCIe traffic, and trace the TLP headers. Add the driver for the device to enable the trace function …

Re: [PATCH v7 5/7] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2022-04-11 Thread John Garry via iommu
On 07/04/2022 13:58, Yicong Yang wrote: From: Qi Liu 'perf record' and 'perf report --dump-raw-trace' supported in this patch. Example usage: Output will contain raw PTT data and its textual representation, such as: 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x40 offset: 0 ref: 0xa5d

Re: [PATCH v7 2/7] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-04-12 Thread John Garry via iommu
+static int hisi_ptt_alloc_trace_buf(struct hisi_ptt *hisi_ptt) +{ +    struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl; +    struct device *dev = &hisi_ptt->pdev->dev; +    int i; + +    hisi_ptt->trace_ctrl.buf_index = 0; + +    /* If the trace buffer has already been allocated, zero it

Re: [PATCH v7 5/7] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2022-04-14 Thread John Garry via iommu
On 12/04/2022 08:41, Yicong Yang wrote: +    hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts)); +    if (!hisi_ptt_pmus) { +    pr_err("hisi_ptt alloc failed\n"); +    *err = -ENOMEM; using PTR_ERR seems better, if possible ok, will change to that. *err = -ENOMEM is used here …

Re: [PATCH v7 1/7] iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity

2022-05-11 Thread John Garry via iommu
On 07/04/2022 13:58, Yicong Yang wrote: The DMA operations of HiSilicon PTT device can only work properly with identical mappings. So add a quirk for the device to force the domain I'm not sure if you meant to write "identity mappings". as passthrough. Signed-off-by: Yicong Yang FWIW, Re

[RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-16 Thread John Garry via iommu
For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which allows the drivers to know the mapping limit and thus …

Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-16 Thread John Garry via iommu
On 16/05/2022 13:52, Yicong Yang wrote: HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated Endpoint(RCiEP) device, providing the capability to dynamically monitor and tune the PCIe traffic and trace the TLP headers. Add the driver for the device to enable the trace function …

Re: [PATCH v8 3/8] hwtracing: hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace device

2022-05-16 Thread John Garry via iommu
On 16/05/2022 13:52, Yicong Yang wrote: Add tune function for the HiSilicon Tune and Trace device. The interface of tune is exposed through sysfs attributes of PTT PMU device. Signed-off-by: Yicong Yang Reviewed-by: Jonathan Cameron Apart from a comment on preferential style: Reviewed-by: John Garry …

Re: [PATCH v8 4/8] perf arm: Refactor event list iteration in auxtrace_record__init()

2022-05-16 Thread John Garry via iommu
On 16/05/2022 13:52, Yicong Yang wrote: As requested before, please mention "perf tool" in the commit subject From: Qi Liu Use find_pmu_for_event() to simplify logic in auxtrace_record__init(). Signed-off-by: Qi Liu Signed-off-by: Yicong Yang --- tools/perf/arch/arm/util/auxtrace.c | 53

Re: [PATCH v8 2/8] hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-05-17 Thread John Garry via iommu
On 17/05/2022 09:09, Yicong Yang wrote: +    target = cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev))); +    if (target < nr_cpumask_bits) { the comment for cpumask_any() hints to check against nr_cpu_ids - any specific reason to check against nr_cpumask_bits? here should be:

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread John Garry via iommu
On 17/05/2022 09:38, Christoph Hellwig wrote: On Mon, May 16, 2022 at 09:06:01PM +0800, John Garry wrote: For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. Add the IOMMU

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread John Garry via iommu
On 17/05/2022 11:40, Robin Murphy wrote: On 2022-05-16 14:06, John Garry wrote: For streaming DMA mappings involving an IOMMU and whose IOVA len regularly exceeds the IOVA rcache upper limit (meaning that they are not cached), performance can be reduced. Add the IOMMU callback for DMA mapping

Re: [RFC PATCH] dma-iommu: Add iommu_dma_max_mapping_size()

2022-05-17 Thread John Garry via iommu
On 17/05/2022 13:02, Robin Murphy wrote: Indeed, sorry but NAK for this being nonsense. As I've said at least once before, if the unnecessary SAC address allocation attempt slows down your workload, make it not do that in the first place. If you don't like the existing command-line parameter

Re: [PATCH] iommu/dma: Add config for PCI SAC address trick

2022-05-19 Thread John Garry via iommu
On 18/05/2022 18:36, Robin Murphy wrote: For devices stuck behind a conventional PCI bus, saving extra cycles at 33MHz is probably fairly significant. However since native PCI Express is now the norm for high-performance devices, the optimisation to always prefer 32-bit addresses for the sake of

[PATCH 0/4] DMA mapping changes for SCSI core

2022-05-20 Thread John Garry via iommu
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching limit may see a big performance hit. This series introduces a new DMA mapping API, dma_opt_mapping_size(), so that drivers may know this limit when performance is a factor in the mapping. Robin didn't like using dma_max_mapping_size() …

[PATCH 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-20 Thread John Garry via iommu
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may

[PATCH 1/4] dma-mapping: Add dma_opt_mapping_size()

2022-05-20 Thread John Garry via iommu
Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for

[PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-20 Thread John Garry via iommu
Streaming DMA mappings may be considerably slower when mappings go through an IOMMU and the total mapping length is somewhat long. This is because the IOMMU IOVA code allocates and frees an IOVA for each mapping, which may affect performance. For performance reasons set the request_queue max_sectors …

[PATCH 4/4] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-05-20 Thread John Garry via iommu
ATA devices (struct ata_device) have a max_sectors field which is configured internally in libata. This is then used to (re)configure the associated sdev request queue max_sectors value from how it is earlier set in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set according

Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-22 Thread John Garry via iommu
On 21/05/2022 00:30, Damien Le Moal wrote: diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index f69b77cbf538..a3ae6345473b 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -225,6 +225,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, s

Re: [PATCH 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-23 Thread John Garry via iommu
On 21/05/2022 00:33, Damien Le Moal wrote: Hi Damien, +unsigned long iova_rcache_range(void) Why not a size_t return type? The IOVA code generally uses unsigned long for size/range while dma-iommu uses size_t as appropriate, so I'm just sticking to that. +{ + return PAGE_SIZE <<

Re: [PATCH 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-23 Thread John Garry via iommu
On 23/05/2022 12:08, Dan Carpenter wrote: Thanks for the report 50b6cb3516365c Dexuan Cui 2021-10-07 224  /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */ 50b6cb3516365c Dexuan Cui 2021-10-07 225  shost->cmd_per_lun = min_t(int, shost->cmd_per_lun, …

Re: [PATCH 0/4] DMA mapping changes for SCSI core

2022-05-23 Thread John Garry via iommu
On 22/05/2022 23:22, Damien Le Moal wrote: On 2022/05/22 22:13, Christoph Hellwig wrote: The whole series looks fine to me. I'll happily queue it up in the dma-mapping tree if the SCSI and ATA maintainers are ok with that. Fine with me. I sent an acked-by for the libata bit. Thanks, I'm g

[PATCH v2 0/4] DMA mapping changes for SCSI core

2022-05-26 Thread John Garry via iommu
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching limit may see a big performance hit. This series introduces a new DMA mapping API, dma_opt_mapping_size(), so that drivers may know this limit when performance is a factor in the mapping. Robin didn't like using dma_max_mapping_size() …

[PATCH v2 1/4] dma-mapping: Add dma_opt_mapping_size()

2022-05-26 Thread John Garry via iommu
Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for

[PATCH v2 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-26 Thread John Garry via iommu
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may

[PATCH v2 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-26 Thread John Garry via iommu
Streaming DMA mappings may be considerably slower when mappings go through an IOMMU and the total mapping length is somewhat long. This is because the IOMMU IOVA code allocates and frees an IOVA for each mapping, which may affect performance. For performance reasons set the request_queue max_sectors …

[PATCH v2 4/4] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-05-26 Thread John Garry via iommu
ATA devices (struct ata_device) have a max_sectors field which is configured internally in libata. This is then used to (re)configure the associated sdev request queue max_sectors value from how it is earlier set in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set according

[PATCH v3 1/4] dma-mapping: Add dma_opt_mapping_size()

2022-06-06 Thread John Garry via iommu
Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for

[PATCH v3 0/4] DMA mapping changes for SCSI core

2022-06-06 Thread John Garry via iommu
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching limit may see a big performance hit. This series introduces a new DMA mapping API, dma_opt_mapping_size(), so that drivers may know this limit when performance is a factor in the mapping. Robin didn't like using dma_max_mapping_size() …

[PATCH v3 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-06-06 Thread John Garry via iommu
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may

[PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-06 Thread John Garry via iommu
Streaming DMA mappings may be considerably slower when mappings go through an IOMMU and the total mapping length is somewhat long. This is because the IOMMU IOVA code allocates and frees an IOVA for each mapping, which may affect performance. For performance reasons set the request_queue max_sectors …

[PATCH v3 4/4] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-06-06 Thread John Garry via iommu
ATA devices (struct ata_device) have a max_sectors field which is configured internally in libata. This is then used to (re)configure the associated sdev request queue max_sectors value from how it is earlier set in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set according

Re: [PATCH v3 0/4] DMA mapping changes for SCSI core

2022-06-08 Thread John Garry via iommu
On 07/06/2022 23:43, Bart Van Assche wrote: On 6/6/22 02:30, John Garry wrote: As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching limit may see a big performance hit. This series introduces a new DMA mapping API, dma_opt_mapping_size(), so that drivers may know this limit …

Re: [PATCH v3 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-06-08 Thread John Garry via iommu
On 08/06/2022 18:26, Bart Van Assche wrote: On 6/6/22 02:30, John Garry via iommu wrote: +unsigned long iova_rcache_range(void) +{ +    return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1); +} My understanding is that iova cache entries may be smaller than IOVA_RANGE_CACHE_MAX_SIZE and

Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-08 Thread John Garry via iommu
On 08/06/2022 18:33, Bart Van Assche wrote: On 6/6/22 02:30, John Garry wrote: +    if (dma_dev->dma_mask) { +    shost->max_sectors = min_t(unsigned int, shost->max_sectors, +    dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT); +    } Since IOVA_RANGE_CACHE_MAX_SIZE = 6 this limit …

Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-09 Thread John Garry via iommu
On 08/06/2022 22:07, Bart Van Assche wrote: On 6/8/22 10:50, John Garry wrote: Please note that this limit only applies if we have an IOMMU enabled for the scsi host dma device. Otherwise we are limited by dma direct or swiotlb max mapping size, as before. SCSI host bus adapters that support

Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-09 Thread John Garry via iommu
On 09/06/2022 18:18, Bart Van Assche wrote: SCSI host bus adapters that support 64-bit DMA may support much larger transfer sizes than 128 KiB. Indeed, and that is my problem today, as my storage controller is generating DMA mapping lengths which exceed 128K and they slow everything down.

Re: [PATCH v2] iommu/dma: Add config for PCI SAC address trick

2022-06-09 Thread John Garry via iommu
On 09/06/2022 16:12, Robin Murphy wrote: For devices stuck behind a conventional PCI bus, saving extra cycles at 33MHz is probably fairly significant. However since native PCI Express is now the norm for high-performance devices, the optimisation to always prefer 32-bit addresses for the sake of

Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-10 Thread John Garry via iommu
On 09/06/2022 21:34, Bart Van Assche wrote: On 6/9/22 10:54, John Garry wrote: ok, but do you have a system where the UFS host controller is behind an IOMMU? I had the impression that UFS controllers would be mostly found in embedded systems and IOMMUs are not as common there. Modern phones …

Re: [PATCH v3 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-06-14 Thread John Garry via iommu
On 06/06/2022 10:30, John Garry wrote: Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit

Re: [PATCH v3 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-06-23 Thread John Garry via iommu
On 10/06/2022 16:37, John Garry via iommu wrote: On 6/9/22 10:54, John Garry wrote: ok, but do you have a system where the UFS host controller is behind an IOMMU? I had the impression that UFS controllers would be mostly found in embedded systems and IOMMUs are not as common on there

Re: [PATCH v3 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-06-23 Thread John Garry via iommu
On 14/06/2022 14:12, John Garry wrote: On 06/06/2022 10:30, John Garry wrote: Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit …

[PATCH v4 1/5] dma-mapping: Add dma_opt_mapping_size()

2022-06-27 Thread John Garry via iommu
Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for

[PATCH v4 0/5] DMA mapping changes for SCSI core

2022-06-27 Thread John Garry via iommu
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching limit may see a big performance hit. This series introduces a new DMA mapping API, dma_opt_mapping_size(), so that drivers may know this limit when performance is a factor in the mapping. The SCSI SAS transport code is modified …

[PATCH v4 3/5] scsi: core: Cap shost max_sectors according to DMA mapping limits only once

2022-06-27 Thread John Garry via iommu
The shost->max_sectors is repeatedly capped according to the host DMA mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so set only once when adding the host. Signed-off-by: John Garry --- drivers/scsi/hosts.c| 5 + drivers/scsi/scsi_lib.c | 4 2 files changed

[PATCH v4 2/5] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-06-27 Thread John Garry via iommu
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may

[PATCH v4 4/5] scsi: scsi_transport_sas: Cap shost max_sectors according to DMA optimal mapping limit

2022-06-27 Thread John Garry via iommu
Streaming DMA mappings may be considerably slower when mappings go through an IOMMU and the total mapping length is somewhat long. This is because the IOMMU IOVA code allocates and frees an IOVA for each mapping, which may affect performance. For performance reasons set the request queue max_sectors …

[PATCH v4 5/5] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-06-27 Thread John Garry via iommu
ATA devices (struct ata_device) have a max_sectors field which is configured internally in libata. This is then used to (re)configure the associated sdev request queue max_sectors value from how it is earlier set in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set according

Re: [PATCH v4 5/5] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-06-28 Thread John Garry via iommu
On 28/06/2022 00:24, Damien Le Moal wrote: On 6/28/22 00:25, John Garry wrote: ATA devices (struct ata_device) have a max_sectors field which is configured internally in libata. This is then used to (re)configure the associated sdev request queue max_sectors value from how it is earlier set in __scsi_init_queue() …

Re: [PATCH v4 1/5] dma-mapping: Add dma_opt_mapping_size()

2022-06-28 Thread John Garry via iommu
On 28/06/2022 12:23, Robin Murphy wrote: + +    size_t +    dma_opt_mapping_size(struct device *dev); + +Returns the maximum optimal size of a mapping for the device. Mapping large +buffers may take longer so device drivers are advised to limit total DMA +streaming mappings length to the return

Re: [PATCH v4 5/5] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-06-28 Thread John Garry via iommu
On 28/06/2022 10:14, Damien Le Moal wrote: BTW, this patch has no real dependency on the rest of the series, so could be taken separately if you prefer. Sure, you can send it separately. Adding it through the scsi tree is fine too. Well Christoph originally offered to take this series via the dma-mapping tree …

Re: [PATCH v4 5/5] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-06-29 Thread John Garry via iommu
On 29/06/2022 06:58, Damien Le Moal wrote: On 6/29/22 14:40, Christoph Hellwig wrote: On Tue, Jun 28, 2022 at 12:33:58PM +0100, John Garry wrote: Well Christoph originally offered to take this series via the dma-mapping tree. @Christoph, is that still ok with you? If so, would you rather I send …

Re: [PATCH v4 1/5] dma-mapping: Add dma_opt_mapping_size()

2022-06-29 Thread John Garry via iommu
On 28/06/2022 12:27, John Garry via iommu wrote: On 28/06/2022 12:23, Robin Murphy wrote: + +    size_t +    dma_opt_mapping_size(struct device *dev); + +Returns the maximum optimal size of a mapping for the device. Mapping large +buffers may take longer so device drivers are advised to limit

Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread John Garry via iommu
On 30/06/2022 10:02, Robin Murphy wrote: On 2022-06-30 08:33, Feng Tang wrote: kmalloc will round up the request size to power of 2, and current iova_magazine's size is 1032 (1024+8) bytes, so each instance allocated will get 2048 bytes from kmalloc, causing around 1KB waste. And in some extreme …

Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread John Garry via iommu
[    4.319253] iommu: Adding device :06:00.2 to group 5
[    4.325869] iommu: Adding device :20:01.0 to group 15
[    4.332648] iommu: Adding device :20:02.0 to group 16
[    4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null) …

[PATCH v5 0/5] DMA mapping changes for SCSI core

2022-06-30 Thread John Garry via iommu
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching limit may see a big performance hit. This series introduces a new DMA mapping API, dma_opt_mapping_size(), so that drivers may know this limit when performance is a factor in the mapping. The SCSI SAS transport code is modified …

[PATCH v5 1/5] dma-mapping: Add dma_opt_mapping_size()

2022-06-30 Thread John Garry via iommu
Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for

[PATCH v5 2/5] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-06-30 Thread John Garry via iommu
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may

[PATCH v5 3/5] scsi: core: Cap shost max_sectors according to DMA limits only once

2022-06-30 Thread John Garry via iommu
The shost->max_sectors is repeatedly capped according to the host DMA mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so set only once when adding the host. Signed-off-by: John Garry --- drivers/scsi/hosts.c| 5 + drivers/scsi/scsi_lib.c | 4 2 files changed

[PATCH v5 4/5] scsi: scsi_transport_sas: Cap shost max_sectors according to DMA optimal limit

2022-06-30 Thread John Garry via iommu
Streaming DMA mappings may be considerably slower when mappings go through an IOMMU and the total mapping length is somewhat long. This is because the IOMMU IOVA code allocates and frees an IOVA for each mapping, which may affect performance. For performance reasons set the request queue max_sectors …

[PATCH v5 5/5] ata: libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-06-30 Thread John Garry via iommu
ATA devices (struct ata_device) have a max_sectors field which is configured internally in libata. This is then used to (re)configure the associated sdev request queue max_sectors value from how it is earlier set in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set according

Re: [PATCH v5 3/5] scsi: core: Cap shost max_sectors according to DMA limits only once

2022-07-01 Thread John Garry via iommu
On 01/07/2022 00:41, Damien Le Moal wrote: shost->dma_dev = dma_dev; + if (dma_dev->dma_mask) { + shost->max_sectors = min_t(unsigned int, shost->max_sectors, + dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT); + } Nit: you could remove the …

Re: [PATCH v5 4/5] scsi: scsi_transport_sas: Cap shost max_sectors according to DMA optimal limit

2022-07-01 Thread John Garry via iommu
On 01/07/2022 00:49, Damien Le Moal wrote: + if (dma_dev) { + shost->max_sectors = min_t(unsigned int, shost->max_sectors, + dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT); + } Hi Damien, > Hmm... shost->max_sectors becomes the max_hw_sectors …

Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-07-01 Thread John Garry via iommu
On 01/07/2022 04:56, Feng Tang wrote: inclination. ok, what you are saying sounds reasonable. I just remember that when we analyzed the longterm aging issue we concluded that the FQ size and its relation to the magazine size was a factor, and this change makes me a little worried about new

Re: [PATCH 0/9] iommu: Refactor flush queues into iommu-dma

2021-11-24 Thread John Garry via iommu
On 23/11/2021 14:10, Robin Murphy wrote: As promised, this series cleans up the flush queue code and streamlines it directly into iommu-dma. Since we no longer have per-driver DMA ops implementations, a lot of the abstraction is now no longer necessary, so there's a nice degree of simplification

Re: [PATCH 9/9] iommu: Move flush queue data into iommu_dma_cookie

2021-11-24 Thread John Garry via iommu
On 23/11/2021 14:10, Robin Murphy wrote: struct iommu_dma_msi_page { struct list_headlist; @@ -41,7 +43,19 @@ struct iommu_dma_cookie { enum iommu_dma_cookie_type type; union { /* Full allocator for IOMMU_DMA_IOVA_COOKIE */ - struct

Re: [PATCH v2 3/3] perf/smmuv3: Synthesize IIDR from CoreSight ID registers

2021-12-07 Thread John Garry via iommu
On 17/11/2021 14:48, Jean-Philippe Brucker wrote: From: Robin Murphy The SMMU_PMCG_IIDR register was not present in older revisions of the Arm SMMUv3 spec. On Arm Ltd. implementations, the IIDR value consists of fields from several PIDR registers, allowing us to present a standardized identifier

[PATCH v2] iova: Move fast alloc size roundup into alloc_iova_fast()

2021-12-07 Thread John Garry via iommu
It really is a property of the IOVA rcache code that we need to alloc a power-of-2 size, so relocate the functionality to resize into alloc_iova_fast(), rather than the callsites. Signed-off-by: John Garry Acked-by: Will Deacon Reviewed-by: Xie Yongji Acked-by: Jason Wang Acked-by: Michael S.

Re: [PATCH v2 3/3] perf/smmuv3: Synthesize IIDR from CoreSight ID registers

2021-12-07 Thread John Garry via iommu
On 07/12/2021 12:04, Robin Murphy wrote: So is there some userspace part to go with this now? FWIW I've not looked into it - is it just a case of someone knocking out some JSON from the MMU-600/700 TRMs, or is there still mroe to do? That should just be it. I had the impression that *so

Re: [PATCH v2 3/3] perf/smmuv3: Synthesize IIDR from CoreSight ID registers

2021-12-07 Thread John Garry via iommu
On 07/12/2021 13:59, Leo Yan wrote: Whether other implementers might retroactively define "equivalent" IIDR values for their existing implementations in a way we could potentially quirk in the driver is an orthogonal question. Agreed, it makes sense that supports the standard IP modules in the m

Re: [PATCH v3 1/1] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd()

2021-12-07 Thread John Garry via iommu
On 07/12/2021 09:41, Zhen Lei via iommu wrote: Although the parameter 'cmd' is always passed by a local array variable, and only this function modifies it, the compiler does not know this. Every time the 'cmd' variable is updated, a memory write operation is generated. This generates many useless
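The shape of the optimisation under discussion can be sketched as follows. This is an illustrative example only: the field encodings and the `build_cmd()` signature are made up, not the real `arm_smmu_cmdq_build_cmd()` or SMMUv3 command layout. The idea is that building the command in a local array lets the compiler keep intermediate updates in registers, with a single `memcpy()` as the only memory write.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMDQ_ENT_DWORDS 2

/* Build the command in a local array so the compiler can keep the
 * intermediate |= updates in registers; only the final memcpy()
 * writes to memory. Field positions below are illustrative. */
static void build_cmd(uint64_t *out, uint8_t opcode, uint32_t sid)
{
	uint64_t cmd[CMDQ_ENT_DWORDS] = { 0 };

	cmd[0] |= opcode;		/* opcode in dword 0 (illustrative) */
	cmd[0] |= (uint64_t)sid << 32;	/* stream ID field (illustrative) */

	memcpy(out, cmd, sizeof(cmd));	/* single write-out to the queue slot */
}
```

Whether this helps in practice depends on the compiler and CPU, which matches the mixed benchmark results reported in the follow-up message.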

Re: [PATCH v3 1/1] iommu/arm-smmu-v3: Simplify useless instructions in arm_smmu_cmdq_build_cmd()

2021-12-08 Thread John Garry via iommu
Did you notice any performance change with this change? Hi John: Thanks for the tip. I wrote a test case today, and I found that the performance did not go up but down. I very quickly tested on a DMA mapping benchmark very similar to the kernel DMA benchmark module - I got mixed results. F

Re: [PATCH v2 01/11] iommu/iova: Fix race between FQ timeout and teardown

2021-12-10 Thread John Garry via iommu
On 10/12/2021 17:54, Robin Murphy wrote: From: Xiongfeng Wang It turns out to be possible for hotplugging out a device to reach the stage of tearing down the device's group and default domain before the domain's flush queue has drained naturally. At this point, it is then possible for the timeou

Re: [PATCH v2 01/11] iommu/iova: Fix race between FQ timeout and teardown

2021-12-10 Thread John Garry via iommu
On 10/12/2021 18:13, Robin Murphy wrote: possible for the timeout to expire just*before*  the del_timer() call super nit: "just*before*  the" - needs a whitespace before "before" :) Weird... the original patch file here and the copy received by lore via linux-iommu look fine, gremlins in you

Re: [PATCH v2 04/11] iommu/iova: Squash entry_dtor abstraction

2021-12-14 Thread John Garry via iommu
On 10/12/2021 17:54, Robin Murphy wrote: All flush queues are driven by iommu-dma now, so there is no need to abstract entry_dtor or its data any more. Squash the now-canonical implementation directly into the IOVA code to get it out of the way. Signed-off-by: Robin Murphy Seems pretty straigh

Re: [PATCH v2 05/11] iommu/iova: Squash flush_cb abstraction

2021-12-14 Thread John Garry via iommu
On 10/12/2021 17:54, Robin Murphy wrote: Once again, with iommu-dma now being the only flush queue user, we no longer need the extra level of indirection through flush_cb. Squash that and let the flush queue code call the domain method directly. Signed-off-by: Robin Murphy Again seems pretty s

Re: [PATCH v2 09/11] iommu/iova: Consolidate flush queue code

2021-12-14 Thread John Garry via iommu
On 10/12/2021 17:54, Robin Murphy wrote: Squash and simplify some of the freeing code, and move the init and free routines down into the rest of the flush queue code to obviate the forward declarations. It would be good to get rid of all of these eventually... Signed-off-by: Robin Murphy

Re: [PATCH v2 10/11] iommu/iova: Move flush queue code to iommu-dma

2021-12-14 Thread John Garry via iommu
On 10/12/2021 17:54, Robin Murphy wrote: Flush queues are specific to DMA ops, which are now handled exclusively by iommu-dma. As such, now that the historical artefacts from being shared directly with drivers have been cleaned up, move the flush queue code into iommu-dma itself to get it out of

Re: [PATCH v2 11/11] iommu: Move flush queue data into iommu_dma_cookie

2021-12-14 Thread John Garry via iommu
On 10/12/2021 17:54, Robin Murphy wrote: Complete the move into iommu-dma by refactoring the flush queues themselves to belong to the DMA cookie rather than the IOVA domain. The refactoring may as well extend to some minor cosmetic aspects too, to help us stay one step ahead of the style police.

Re: [PATCH v2 10/11] iommu/iova: Move flush queue code to iommu-dma

2021-12-14 Thread John Garry via iommu
On 10/12/2021 17:54, Robin Murphy wrote: + iovad->fq_domain = fq_domain; + iovad->fq = queue; + + timer_setup(&iovad->fq_timer, fq_flush_timeout, 0); + atomic_set(&iovad->fq_timer_on, 0); + + return 0; +} + + nit: a single blank line is standard, I think Cheers

Re: [PATCH 4/5] iommu: Separate IOVA rcache memories from iova_domain structure

2021-12-20 Thread John Garry via iommu
On 24/09/2021 11:01, John Garry wrote: Only dma-iommu.c and vdpa actually use the "fast" mode of IOVA alloc and free. As such, it's wasteful that all other IOVA domains hold the rcache memories. In addition, the current IOVA domain init implementation is poor (init_iova_domain()), in that errors

Re: [PATCH 4/5] iommu: Separate IOVA rcache memories from iova_domain structure

2021-12-22 Thread John Garry via iommu
On 20/12/2021 13:57, Robin Murphy wrote: Do you have any thoughts on this patch? The decision is whether we stick with a single iova domain structure or support this super structure for iova domains which support the rcache. I did not try the former - it would be do-able but I am not sure on ho

[PATCH] iommu/iova: Separate out rcache init

2022-01-26 Thread John Garry via iommu
Currently the rcache structures are allocated for all IOVA domains, even if they do not use "fast" alloc+free interface. This is wasteful of memory. In addition, fails in init_iova_rcaches() are not handled safely, which is less than ideal. Make "fast" users call a separate rcache init explicitly
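The shape of this change can be sketched in simplified C. The struct layout, allocation size, and function body below are stand-ins, not the kernel's real `iova_domain` or rcache code; the point is only that "fast" users opt in with a separate init call whose failure is reported to the caller instead of being swallowed inside `init_iova_domain()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's iova_domain: rcaches stay
 * NULL unless the "fast" user explicitly opts in. */
struct iova_domain {
	void *rcaches;
};

/* Separate, explicit rcache init: allocation failure is now
 * returned to the caller rather than handled (or not) internally. */
static int iova_domain_init_rcaches(struct iova_domain *iovad)
{
	iovad->rcaches = calloc(1, 128);	/* placeholder allocation */
	if (!iovad->rcaches)
		return -ENOMEM;
	return 0;
}
```

Domains that never use the fast alloc+free interface simply skip the call and pay no rcache memory cost.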

Re: [PATCH] iommu/iova: Separate out rcache init

2022-01-26 Thread John Garry via iommu
Hi Robin, Signed-off-by: John Garry Mangled patch? (no "---" separator here) hmm... not sure. As an experiment, I just downloaded this patch from lore.kernel.org and it applies ok. Overall this looks great, just a few comments further down... ... +} +EXPORT_SYMBOL_GPL(iova_domain_i

[PATCH] iommu: Fix some W=1 warnings

2022-01-28 Thread John Garry via iommu
The code is mostly free of W=1 warnings, so fix the following: drivers/iommu/iommu.c:996: warning: expecting prototype for iommu_group_for_each_dev(). Prototype was for __iommu_group_for_each_dev() instead drivers/iommu/iommu.c:3048: warning: Function parameter or member 'drvdata' not described

Re: [PATCH] iommu/iova: Separate out rcache init

2022-01-28 Thread John Garry via iommu
On 26/01/2022 17:00, Robin Murphy wrote: As above, I vote for just forward-declaring the free routine in iova.c and keeping it entirely private. BTW, speaking of forward declarations, it's possible to remove all the forward declarations in iova.c now that the FQ code is gone - but with a good

Re: [PATCH] iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()

2022-01-31 Thread John Garry via iommu
On 31/01/2022 16:17, Joerg Roedel wrote: From: Joerg Roedel The polling loop for the register change in iommu_ga_log_enable() needs to have a udelay() in it. Otherwise the CPU might be faster than the IOMMU hardware and wrongly trigger the WARN_ON() further down the code stream. Fixes: 8bda0c
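The fixed polling pattern can be sketched in userspace C. `read_status()` and `udelay10()` are simulated stand-ins for the real register read and the kernel's `udelay()`; the timeout constant is illustrative. The fix is the delay between reads, so a fast CPU cannot burn through the whole iteration budget before the hardware has had time to update the register.

```c
#include <assert.h>
#include <stdbool.h>

#define LOOP_TIMEOUT 100

/* Simulated status register: reads as "not ready" for the first two
 * polls, standing in for IOMMU hardware that lags the CPU. */
static int sim_reads;

static bool read_status(void)
{
	return ++sim_reads >= 3;	/* bit appears set on the third read */
}

static void udelay10(void)
{
	/* in the kernel this would be udelay(10); a no-op suffices here */
}

/* Poll with a delay per iteration; without udelay10() the loop could
 * exhaust LOOP_TIMEOUT before the hardware responds and wrongly
 * trigger the caller's WARN_ON(). */
static bool poll_until_set(void)
{
	int i;

	for (i = 0; i < LOOP_TIMEOUT; i++) {
		if (read_status())
			return true;
		udelay10();
	}
	return false;	/* caller would WARN_ON() this */
}
```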

[PATCH v2] iommu/iova: Separate out rcache init

2022-02-03 Thread John Garry via iommu
Currently the rcache structures are allocated for all IOVA domains, even if they do not use "fast" alloc+free interface. This is wasteful of memory. In addition, fails in init_iova_rcaches() are not handled safely, which is less than ideal. Make "fast" users call a separate rcache init explicitly

Re: [PATCH v1 04/10] iommu/vt-d: Remove iova_cache_get/put()

2022-02-07 Thread John Garry via iommu
On 07/02/2022 06:41, Lu Baolu wrote: diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 583ec0fa4ac1..e8d58654361c 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -3348,9 +3348,6 @@ static inline int iommu_devinfo_cache_init(void) static i

Re: [PATCH v2] iommu/core: Remove comment reference to iommu_dev_has_feature

2022-02-07 Thread John Garry via iommu
On 07/02/2022 03:23, Akeem G Abodunrin wrote: iommu_dev_has_feature() api has been removed by the commit 262948f8ba573 ("iommu: Delete iommu_dev_has_feature()") - So this patch removes comment about the api to avoid any confusion. Signed-off-by: Akeem G Abodunrin Cc: Lu Baolu Reviewed-by: Chri

Re: [PATCH v3 1/8] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-02-07 Thread John Garry via iommu
On 24/01/2022 13:11, Yicong Yang wrote: HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex integrated Endpoint(RCiEP) device, providing the capability to dynamically monitor and tune the PCIe traffic, and trace the TLP headers. Add the driver for the device to enable the trace func
