For streaming DMA mappings that involve an IOMMU and whose IOVA lengths
regularly exceed the IOVA rcache upper limit (meaning that they are not
cached), performance can be reduced.
This may be much more pronounced since commit 4e89dce72521 ("iommu/iova:
Retry from last rb tree node if iova search fails").
Function iommu_group_store_type() supports changing the default domain
of an IOMMU group.
Many conditions need to be satisfied and steps taken for this action to be
successful.
Satisfying these conditions and taking these steps will also be required
for setting other IOMMU group attributes, so factor this handling into a
common function.
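A hypothetical sketch of the shape such a common function could take,
inferred from the cover text; the real name and signature in the series
may differ:

#include <linux/iommu.h>

/* Hypothetical helper: shared validation and default-domain rework
 * steps would live here, while the attribute-specific work happens
 * in @cb. */
static ssize_t iommu_group_store_common(struct iommu_group *group,
					const char *buf, size_t count,
					int (*cb)(struct iommu_group *group,
						  const char *buf))
{
	int ret;

	ret = cb(group, buf);

	return ret ?: count;
}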
Some low-level drivers may request DMA mappings whose IOVA length exceeds
the current rcache upper limit.
This means that allocations for those IOVAs will never be cached, and
must always be allocated and freed from the RB tree per DMA mapping cycle.
This has a significant effect on performance.
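As a conceptual sketch, assuming the IOVA_RANGE_CACHE_MAX_SIZE convention
discussed later in this thread (not the exact kernel code), the
cacheability check amounts to:

#include <linux/iova.h>

/* Sketch: only sizes of at most 2^(IOVA_RANGE_CACHE_MAX_SIZE - 1) pages
 * can be satisfied from the per-CPU rcaches; anything larger hits the
 * RB tree on every mapping cycle. Assumes IOVA_RANGE_CACHE_MAX_SIZE is
 * visible here. */
static bool iova_size_is_cacheable(unsigned long size_in_pages)
{
	return size_in_pages <= (1UL << (IOVA_RANGE_CACHE_MAX_SIZE - 1));
}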
Allow iommu_change_dev_def_domain() to create a new default domain, keeping
the same type as the current domain.
Also remove the comment about the function's purpose, which would otherwise
become stale.
Signed-off-by: John Garry
---
drivers/iommu/iommu.c | 49 ++-
include/linux/iommu.h |
Add support to allow the maximum optimised DMA len to be set for an IOMMU
group via sysfs.
This is much the same as the method to change the default domain type
for a group.
Signed-off-by: John Garry
---
.../ABI/testing/sysfs-kernel-iommu_groups | 16 +
drivers/iommu/iommu.c
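A hypothetical userspace usage sketch; the attribute path and name below
are assumptions based on the cover text and the ABI file touched above:

#include <stdio.h>

int main(void)
{
	/* Hypothetical sysfs attribute, for illustration only. */
	FILE *f = fopen("/sys/kernel/iommu_groups/0/max_opt_dma_size", "w");

	if (!f)
		return 1;
	/* Request a 128 KiB optimised DMA length for this group. */
	fprintf(f, "%u\n", 131072U);
	fclose(f);
	return 0;
}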
Add max opt argument to iova_domain_init_rcaches(), and use it to set the
rcaches range.
Also fix up all users to set this value (at 0, meaning use default),
including a wrapper for that, iova_domain_init_rcaches_default().
For dma-iommu.c we derive the iova_len argument from the IOMMU group
max opt DMA len.
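A sketch of the interface this implies; the exact signature is an
assumption from the commit message, though the wrapper name is taken from
the text above:

#include <linux/iova.h>

int iova_domain_init_rcaches(struct iova_domain *iovad,
			     unsigned long max_opt_iova_len);

/* Wrapper for callers that keep the default rcache range (max opt == 0). */
static inline int iova_domain_init_rcaches_default(struct iova_domain *iovad)
{
	return iova_domain_init_rcaches(iovad, 0);
}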
On 07/04/2022 09:27, Leizhen (ThunderTown) wrote:
Thanks for having a look
On 2022/4/4 19:27, John Garry wrote:
Add max opt argument to iova_domain_init_rcaches(), and use it to set the
rcaches range.
Also fix up all users to set this value (at 0, meaning use default),
including a wrapper for that, iova_domain_init_rcaches_default().
On 07/04/2022 09:21, Leizhen (ThunderTown) wrote:
On 2022/4/4 19:27, John Garry wrote:
Add support to allow the maximum optimised DMA len to be set for an IOMMU
group via sysfs.
This is much the same as the method to change the default domain type
for a group.
Signed-off-by: John Garry
---
On 07/04/2022 13:58, Yicong Yang wrote:
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex integrated
Endpoint (RCiEP) device, providing the capability to dynamically monitor and
tune the PCIe traffic, and trace the TLP headers.
Add the driver for the device to enable the trace function.
On 07/04/2022 13:58, Yicong Yang wrote:
From: Qi Liu
'perf record' and 'perf report --dump-raw-trace' are supported in this
patch.
Example usage:
Output will contain raw PTT data and its textual representation, such
as:
0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x40 offset: 0
ref: 0xa5d
+static int hisi_ptt_alloc_trace_buf(struct hisi_ptt *hisi_ptt)
+{
+ struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl;
+ struct device *dev = &hisi_ptt->pdev->dev;
+ int i;
+
+ hisi_ptt->trace_ctrl.buf_index = 0;
+
+ /* If the trace buffer has already been allocated, zero it */
On 12/04/2022 08:41, Yicong Yang wrote:
+ hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts));
+ if (!hisi_ptt_pmus) {
+ pr_err("hisi_ptt alloc failed\n");
+ *err = -ENOMEM;
using PTR_ERR seems better, if possible
ok will change to that. *err = -ENOMEM is used here
On 07/04/2022 13:58, Yicong Yang wrote:
The DMA operations of HiSilicon PTT device can only work properly with
identical mappings. So add a quirk for the device to force the domain
as passthrough.
Signed-off-by: Yicong Yang
I'm not sure if you meant to write "identity mappings".
FWIW,
Re
For streaming DMA mappings that involve an IOMMU and whose IOVA lengths
regularly exceed the IOVA rcache upper limit (meaning that they are not
cached), performance can be reduced.
Add the IOMMU callback for DMA mapping API dma_max_mapping_size(), which
allows the drivers to know the mapping limit and thus limit the requested
IOVA lengths.
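A sketch of what such a callback could look like, following the existing
dma_map_ops::max_mapping_size hook; the body is an assumption based on the
rcache limit discussed in this thread:

#include <linux/dma-map-ops.h>

/* Sketch: report the IOVA rcache upper limit as the max mapping size.
 * iova_rcache_range() is the helper introduced later in this thread. */
static size_t iommu_dma_max_mapping_size(struct device *dev)
{
	return iova_rcache_range();
}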
On 16/05/2022 13:52, Yicong Yang wrote:
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex integrated
Endpoint (RCiEP) device, providing the capability to dynamically monitor and
tune the PCIe traffic and trace the TLP headers.
Add the driver for the device to enable the trace function.
On 16/05/2022 13:52, Yicong Yang wrote:
Add tune function for the HiSilicon Tune and Trace device. The interface
of tune is exposed through sysfs attributes of PTT PMU device.
Signed-off-by: Yicong Yang
Reviewed-by: Jonathan Cameron
Apart from a comment on preferential style:
Reviewed-by: J
On 16/05/2022 13:52, Yicong Yang wrote:
As requested before, please mention "perf tool" in the commit subject
From: Qi Liu
Use find_pmu_for_event() to simplify logic in auxtrace_record__init().
Signed-off-by: Qi Liu
Signed-off-by: Yicong Yang
---
tools/perf/arch/arm/util/auxtrace.c | 53
On 17/05/2022 09:09, Yicong Yang wrote:
+ target = cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)));
+ if (target < nr_cpumask_bits) {
the comment for cpumask_any() hints to check against nr_cpu_ids - any specific
reason to check against nr_cpumask_bits?
here should be:
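Presumably the elided correction follows that hint; a sketch, assuming the
cpumask_any() convention of comparing against nr_cpu_ids:

	target = cpumask_any(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)));
	if (target >= nr_cpu_ids) {
		/* Error handling here is illustrative only. */
		return -ENODEV;
	}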
On 17/05/2022 09:38, Christoph Hellwig wrote:
On Mon, May 16, 2022 at 09:06:01PM +0800, John Garry wrote:
For streaming DMA mappings that involve an IOMMU and whose IOVA lengths
regularly exceed the IOVA rcache upper limit (meaning that they are not
cached), performance can be reduced.
Add the IOMMU
On 17/05/2022 11:40, Robin Murphy wrote:
On 2022-05-16 14:06, John Garry wrote:
For streaming DMA mappings that involve an IOMMU and whose IOVA lengths
regularly exceed the IOVA rcache upper limit (meaning that they are not
cached), performance can be reduced.
Add the IOMMU callback for DMA mapping
On 17/05/2022 13:02, Robin Murphy wrote:
Indeed, sorry but NAK for this being nonsense. As I've said at least
once before, if the unnecessary SAC address allocation attempt slows
down your workload, make it not do that in the first place. If you
don't like the existing command-line parameter
On 18/05/2022 18:36, Robin Murphy wrote:
For devices stuck behind a conventional PCI bus, saving extra cycles at
33MHz is probably fairly significant. However since native PCI Express
is now the norm for high-performance devices, the optimisation to always
prefer 32-bit addresses for the sake of
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.
This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.
Robin didn't like using dma_max_mapping_size() for this.
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
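A driver-side usage sketch; the foo_* names are hypothetical:

#include <linux/dma-mapping.h>
#include <linux/minmax.h>

struct foo_host {
	struct device *dma_dev;
	size_t max_transfer_bytes;
};

/* Cap streaming DMA request sizes to the optimal mapping size. */
static void foo_apply_dma_opt_limit(struct foo_host *host)
{
	size_t opt = dma_opt_mapping_size(host->dma_dev);

	host->max_transfer_bytes = min(host->max_transfer_bytes, opt);
}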
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.
Provide an API for drivers to know this optimal limit.
Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and frees an IOVA for each mapping, which may
affect performance.
For performance reasons set the request_queue max_sectors
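For concreteness, a worked example assuming the 128 KiB rcache limit
discussed in this thread:

/* With a 128 KiB optimal mapping size and 512-byte sectors:
 *   131072 >> SECTOR_SHIFT (9) == 256
 * so max_sectors would be capped at 256 for such a host. */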
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
according
On 21/05/2022 00:30, Damien Le Moal wrote:
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f69b77cbf538..a3ae6345473b 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -225,6 +225,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct
device *dev,
s
On 21/05/2022 00:33, Damien Le Moal wrote:
Hi Damien,
+unsigned long iova_rcache_range(void)
Why not a size_t return type?
The IOVA code generally uses unsigned long for size/range while
dma-iommu uses size_t as appropriate, so I'm just sticking to that.
+{
+ return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
On 23/05/2022 12:08, Dan Carpenter wrote:
Thanks for the report
50b6cb3516365c Dexuan Cui 2021-10-07 224  /* Use min_t(int, ...) in case shost->can_queue exceeds SHRT_MAX */
50b6cb3516365c Dexuan Cui 2021-10-07 225  shost->cmd_per_lun = min_t(int, shost->cmd_per_lun
On 22/05/2022 23:22, Damien Le Moal wrote:
On 2022/05/22 22:13, Christoph Hellwig wrote:
The whole series looks fine to me. I'll happily queue it up in the
dma-mapping tree if the SCSI and ATA maintainers are ok with that.
Fine with me. I sent an acked-by for the libata bit.
Thanks, I'm g
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.
This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.
Robin didn't like using dma_max_mapping_size() for this.
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.
Provide an API for drivers to know this optimal limit.
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and frees an IOVA for each mapping, which may
affect performance.
For performance reasons set the request_queue max_sectors
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
according
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.
Provide an API for drivers to know this optimal limit.
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.
This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.
Robin didn't like using dma_max_mapping_size() for this.
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and frees an IOVA for each mapping, which may
affect performance.
For performance reasons set the request_queue max_sectors
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
according
On 07/06/2022 23:43, Bart Van Assche wrote:
On 6/6/22 02:30, John Garry wrote:
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.
This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.
On 08/06/2022 18:26, Bart Van Assche wrote:
On 6/6/22 02:30, John Garry via iommu wrote:
+unsigned long iova_rcache_range(void)
+{
+ return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
+}
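For concreteness, assuming 4 KiB pages and IOVA_RANGE_CACHE_MAX_SIZE = 6
(the values discussed below):

/* PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1)
 *   = 4096 << 5 = 131072 bytes = 128 KiB,
 * so IOVAs above 128 KiB bypass the rcaches. */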
My understanding is that iova cache entries may be smaller than
IOVA_RANGE_CACHE_MAX_SIZE and
On 08/06/2022 18:33, Bart Van Assche wrote:
On 6/6/22 02:30, John Garry wrote:
+ if (dma_dev->dma_mask) {
+ shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+ dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+ }
Since IOVA_RANGE_CACHE_MAX_SIZE = 6 this limit is 128 KiB.
On 08/06/2022 22:07, Bart Van Assche wrote:
On 6/8/22 10:50, John Garry wrote:
Please note that this limit only applies if we have an IOMMU enabled
for the scsi host dma device. Otherwise we are limited by dma direct
or swiotlb max mapping size, as before.
SCSI host bus adapters that support 64-bit DMA may support much larger
transfer sizes than 128 KiB.
On 09/06/2022 18:18, Bart Van Assche wrote:
SCSI host bus adapters that support 64-bit DMA may support much
larger transfer sizes than 128 KiB.
Indeed, and that is my problem today, as my storage controller is
generating DMA mapping lengths which exceed 128K and they slow
everything down.
On 09/06/2022 16:12, Robin Murphy wrote:
For devices stuck behind a conventional PCI bus, saving extra cycles at
33MHz is probably fairly significant. However since native PCI Express
is now the norm for high-performance devices, the optimisation to always
prefer 32-bit addresses for the sake of
On 09/06/2022 21:34, Bart Van Assche wrote:
On 6/9/22 10:54, John Garry wrote:
ok, but do you have a system where the UFS host controller is behind
an IOMMU? I had the impression that UFS controllers would be mostly
found in embedded systems and IOMMUs are not as common there.
Modern phone
On 06/06/2022 10:30, John Garry wrote:
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
On 10/06/2022 16:37, John Garry via iommu wrote:
On 6/9/22 10:54, John Garry wrote:
ok, but do you have a system where the UFS host controller is behind
an IOMMU? I had the impression that UFS controllers would be mostly
found in embedded systems and IOMMUs are not as common there
On 14/06/2022 14:12, John Garry wrote:
On 06/06/2022 10:30, John Garry wrote:
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit.
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.
Provide an API for drivers to know this optimal limit.
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.
This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.
The SCSI SAS transport code is modified to use this limit.
The shost->max_sectors is repeatedly capped according to the host DMA
mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so
set only once when adding the host.
Signed-off-by: John Garry
---
drivers/scsi/hosts.c    | 5 +
drivers/scsi/scsi_lib.c | 4
2 files changed
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and frees an IOVA for each mapping, which may
affect performance.
For performance reasons set the request queue max_sectors
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
according
On 28/06/2022 00:24, Damien Le Moal wrote:
On 6/28/22 00:25, John Garry wrote:
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in _
On 28/06/2022 12:23, Robin Murphy wrote:
+
+ size_t
+ dma_opt_mapping_size(struct device *dev);
+
+Returns the maximum optimal size of a mapping for the device. Mapping large
+buffers may take longer so device drivers are advised to limit total DMA
+streaming mappings length to the returned value.
On 28/06/2022 10:14, Damien Le Moal wrote:
BTW, this patch has no real dependency on the rest of the series, so
could be taken separately if you prefer.
Sure, you can send it separately. Adding it through the scsi tree is fine too.
Well Christoph originally offered to take this series via the
On 29/06/2022 06:58, Damien Le Moal wrote:
On 6/29/22 14:40, Christoph Hellwig wrote:
On Tue, Jun 28, 2022 at 12:33:58PM +0100, John Garry wrote:
Well Christoph originally offered to take this series via the dma-mapping
tree.
@Christoph, is that still ok with you? If so, would you rather I sen
On 28/06/2022 12:27, John Garry via iommu wrote:
On 28/06/2022 12:23, Robin Murphy wrote:
+
+ size_t
+ dma_opt_mapping_size(struct device *dev);
+
+Returns the maximum optimal size of a mapping for the device. Mapping large
+buffers may take longer so device drivers are advised to limit
On 30/06/2022 10:02, Robin Murphy wrote:
On 2022-06-30 08:33, Feng Tang wrote:
kmalloc will round up the request size to power of 2, and current
iova_magazine's size is 1032 (1024+8) bytes, so each instance
allocated will get 2048 bytes from kmalloc, causing around 1KB
waste.
And in some extreme cases
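The arithmetic behind the 1032-byte figure, assuming the iova_magazine
layout discussed here with IOVA_MAG_SIZE == 128 on a 64-bit build:

/* sizeof(unsigned long) + IOVA_MAG_SIZE * sizeof(unsigned long)
 *   = 8 + 128 * 8 = 1032 bytes.
 * kmalloc rounds up to the next power of two, so each magazine occupies
 * a 2048-byte slab object, wasting roughly 1 KiB per magazine. */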
[ 4.319253] iommu: Adding device 0000:06:00.2 to group 5
[ 4.325869] iommu: Adding device 0000:20:01.0 to group 15
[ 4.332648] iommu: Adding device 0000:20:02.0 to group 16
[ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.
This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.
The SCSI SAS transport code is modified to use this limit.
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.
Provide an API for drivers to know this optimal limit.
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.
This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
The shost->max_sectors is repeatedly capped according to the host DMA
mapping limit for each sdev in __scsi_init_queue(). This is unnecessary, so
set only once when adding the host.
Signed-off-by: John Garry
---
drivers/scsi/hosts.c    | 5 +
drivers/scsi/scsi_lib.c | 4
2 files changed
Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and frees an IOVA for each mapping, which may
affect performance.
For performance reasons set the request queue max_sectors
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
according
On 01/07/2022 00:41, Damien Le Moal wrote:
shost->dma_dev = dma_dev;
+ if (dma_dev->dma_mask) {
+ shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+ dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT);
+ }
Nit: you could remove th
On 01/07/2022 00:49, Damien Le Moal wrote:
+ if (dma_dev) {
+ shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+ dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+ }
Hi Damien,
> Hmm... shost->max_sectors becomes the max_hw_sectors
On 01/07/2022 04:56, Feng Tang wrote:
inclination.
ok, what you are saying sounds reasonable. I just remember that when we
analyzed the longterm aging issue we concluded that the FQ size and its
relation to the magazine size was a factor, and this change makes me a
little worried about new
On 23/11/2021 14:10, Robin Murphy wrote:
As promised, this series cleans up the flush queue code and streamlines
it directly into iommu-dma. Since we no longer have per-driver DMA ops
implementations, a lot of the abstraction is now no longer necessary, so
there's a nice degree of simplification
On 23/11/2021 14:10, Robin Murphy wrote:
struct iommu_dma_msi_page {
struct list_head list;
@@ -41,7 +43,19 @@ struct iommu_dma_cookie {
enum iommu_dma_cookie_type type;
union {
/* Full allocator for IOMMU_DMA_IOVA_COOKIE */
- struct
On 17/11/2021 14:48, Jean-Philippe Brucker wrote:
From: Robin Murphy
The SMMU_PMCG_IIDR register was not present in older revisions of the
Arm SMMUv3 spec. On Arm Ltd. implementations, the IIDR value consists of
fields from several PIDR registers, allowing us to present a
standardized identifier
It really is a property of the IOVA rcache code that we need to alloc a
power-of-2 size, so relocate the roundup functionality into
alloc_iova_fast(), rather than the callsites.
Signed-off-by: John Garry
Acked-by: Will Deacon
Reviewed-by: Xie Yongji
Acked-by: Jason Wang
Acked-by: Michael S.
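A call-site sketch of the change; the demo function and its arguments are
hypothetical:

#include <linux/iova.h>

/* Callers no longer round up themselves; alloc_iova_fast() now does the
 * power-of-two roundup internally, since the rcaches require it. */
static unsigned long demo_alloc(struct iova_domain *iovad, unsigned long size,
				unsigned long limit_pfn)
{
	/* previously: alloc_iova_fast(iovad, roundup_pow_of_two(size), ...) */
	return alloc_iova_fast(iovad, size, limit_pfn, true);
}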
On 07/12/2021 12:04, Robin Murphy wrote:
So is there some userspace part to go with this now?
FWIW I've not looked into it - is it just a case of someone knocking out
some JSON from the MMU-600/700 TRMs, or is there still more to do?
That should just be it.
I had the impression that *so
On 07/12/2021 13:59, Leo Yan wrote:
Whether other implementers might retroactively define "equivalent" IIDR
values for their existing implementations in a way we could potentially
quirk in the driver is an orthogonal question.
Agreed, it makes sense to support the standard IP modules in
the m
On 07/12/2021 09:41, Zhen Lei via iommu wrote:
Although the parameter 'cmd' is always passed by a local array variable,
and only this function modifies it, the compiler does not know this. Every
time the 'cmd' variable is updated, a memory write operation is generated.
This generates many useless
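A conceptual sketch of the optimisation being described; names and field
layout are illustrative, not the actual SMMUv3 code:

#include <linux/bitfield.h>

/* Build the command words in locals so the compiler can keep them in
 * registers, then store each 64-bit word exactly once. */
static void demo_build_cmd(u64 *out, u8 opcode, u64 addr)
{
	u64 cmd0 = FIELD_PREP(GENMASK_ULL(7, 0), opcode);
	u64 cmd1 = addr;

	out[0] = cmd0;
	out[1] = cmd1;
}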
Did you notice any performance change with this change?
Hi John:
Thanks for the tip. I wrote a test case today, and I found that the
performance did not go up but down.
I very quickly tested on a DMA mapping benchmark very similar to the
kernel DMA benchmark module - I got mixed results. F
On 10/12/2021 17:54, Robin Murphy wrote:
From: Xiongfeng Wang
It turns out to be possible for hotplugging out a device to reach the
stage of tearing down the device's group and default domain before the
domain's flush queue has drained naturally. At this point, it is then
possible for the timeout
On 10/12/2021 18:13, Robin Murphy wrote:
possible for the timeout to expire just*before* the del_timer() call
super nit: "just*before* the" - needs a whitespace before "before" :)
Weird... the original patch file here and the copy received by lore via
linux-iommu look fine, gremlins in you
On 10/12/2021 17:54, Robin Murphy wrote:
All flush queues are driven by iommu-dma now, so there is no need to
abstract entry_dtor or its data any more. Squash the now-canonical
implementation directly into the IOVA code to get it out of the way.
Signed-off-by: Robin Murphy
Seems pretty straightforward
On 10/12/2021 17:54, Robin Murphy wrote:
Once again, with iommu-dma now being the only flush queue user, we no
longer need the extra level of indirection through flush_cb. Squash that
and let the flush queue code call the domain method directly.
Signed-off-by: Robin Murphy
Again seems pretty s
On 10/12/2021 17:54, Robin Murphy wrote:
Squash and simplify some of the freeing code, and move the init
and free routines down into the rest of the flush queue code to
obviate the forward declarations.
It would be good to get rid of all of these eventually...
Signed-off-by: Robin Murphy
On 10/12/2021 17:54, Robin Murphy wrote:
Flush queues are specific to DMA ops, which are now handled exclusively
by iommu-dma. As such, now that the historical artefacts from being
shared directly with drivers have been cleaned up, move the flush queue
code into iommu-dma itself to get it out of
On 10/12/2021 17:54, Robin Murphy wrote:
Complete the move into iommu-dma by refactoring the flush queues
themselves to belong to the DMA cookie rather than the IOVA domain.
The refactoring may as well extend to some minor cosmetic aspects
too, to help us stay one step ahead of the style police.
On 10/12/2021 17:54, Robin Murphy wrote:
+ iovad->fq_domain = fq_domain;
+ iovad->fq = queue;
+
+ timer_setup(&iovad->fq_timer, fq_flush_timeout, 0);
+ atomic_set(&iovad->fq_timer_on, 0);
+
+ return 0;
+}
+
+
nit: a single blank line is standard, I think
Cheers
On 24/09/2021 11:01, John Garry wrote:
Only dma-iommu.c and vdpa actually use the "fast" mode of IOVA alloc and
free. As such, it's wasteful that all other IOVA domains hold the rcache
memories.
In addition, the current IOVA domain init implementation is poor
(init_iova_domain()), in that errors
On 20/12/2021 13:57, Robin Murphy wrote:
Do you have any thoughts on this patch? The decision is whether we
stick with a single iova domain structure or support this super
structure for iova domains which support the rcache. I did not try the
former - it would be do-able but I am not sure on ho
Currently the rcache structures are allocated for all IOVA domains, even if
they do not use "fast" alloc+free interface. This is wasteful of memory.
In addition, failures in init_iova_rcaches() are not handled safely, which is
less than ideal.
Make "fast" users call a separate rcache init explicitly
Hi Robin,
Signed-off-by: John Garry
Mangled patch? (no "---" separator here)
hmm... not sure. As an experiment, I just downloaded this patch from
lore.kernel.org and it applies ok.
Overall this looks great, just a few comments further down...
...
+}
+EXPORT_SYMBOL_GPL(iova_domain_init_rcaches);
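A usage sketch of the split initialisation, per the cover text; the error
handling shown is an assumption:

#include <linux/iova.h>
#include <linux/sizes.h>

static int demo_domain_setup(struct iova_domain *iovad, unsigned long base_pfn)
{
	int ret;

	init_iova_domain(iovad, SZ_4K, base_pfn);

	/* "Fast" users now opt in to the rcaches explicitly. */
	ret = iova_domain_init_rcaches(iovad);
	if (ret)
		put_iova_domain(iovad);
	return ret;
}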
The code is mostly free of W=1 warnings, so fix the following:
drivers/iommu/iommu.c:996: warning: expecting prototype for
iommu_group_for_each_dev(). Prototype was for __iommu_group_for_each_dev()
instead
drivers/iommu/iommu.c:3048: warning: Function parameter or member 'drvdata' not
described
On 26/01/2022 17:00, Robin Murphy wrote:
As above, I vote for just forward-declaring the free routine in iova.c
and keeping it entirely private.
BTW, speaking of forward declarations, it's possible to remove all the
forward declarations in iova.c now that the FQ code is gone - but with a
good
On 31/01/2022 16:17, Joerg Roedel wrote:
From: Joerg Roedel
The polling loop for the register change in iommu_ga_log_enable() needs
to have a udelay() in it. Otherwise the CPU might be faster than the
IOMMU hardware and wrongly trigger the WARN_ON() further down the code
stream.
Fixes: 8bda0c
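A sketch of the fixed polling pattern being described; the register offset,
mask and timeout are illustrative, not the actual AMD IOMMU values:

#include <linux/bits.h>
#include <linux/delay.h>
#include <linux/io.h>

#define DEMO_GA_LOG_TIMEOUT	100	/* iterations; illustrative */

/* The udelay() makes the loop bound a real time bound rather than a
 * CPU-speed bound, so the WARN_ON() cannot fire spuriously on fast CPUs. */
static bool demo_ga_log_wait_running(void __iomem *base)
{
	int i;

	for (i = 0; i < DEMO_GA_LOG_TIMEOUT; i++) {
		if (readl(base + 0x2020) & BIT(8))	/* hypothetical status bit */
			return true;
		udelay(10);
	}
	return false;
}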
Currently the rcache structures are allocated for all IOVA domains, even if
they do not use "fast" alloc+free interface. This is wasteful of memory.
In addition, failures in init_iova_rcaches() are not handled safely, which is
less than ideal.
Make "fast" users call a separate rcache init explicitly
On 07/02/2022 06:41, Lu Baolu wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 583ec0fa4ac1..e8d58654361c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3348,9 +3348,6 @@ static inline int iommu_devinfo_cache_init(void)
static i
On 07/02/2022 03:23, Akeem G Abodunrin wrote:
iommu_dev_has_feature() API has been removed by commit 262948f8ba573
("iommu: Delete iommu_dev_has_feature()"), so this patch removes the comment
about the API to avoid any confusion.
Signed-off-by: Akeem G Abodunrin
Cc: Lu Baolu
Reviewed-by: Chri
On 24/01/2022 13:11, Yicong Yang wrote:
HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
integrated Endpoint (RCiEP) device, providing the capability
to dynamically monitor and tune the PCIe traffic, and trace
the TLP headers.
Add the driver for the device to enable the trace function.