[PATCH 1/3] iommu/ipmmu-vmsa: Document R-Car M3-W IPMMU DT bindings

2017-09-21 Magnus Damm
From: Magnus Damm Update the IPMMU DT binding documentation to include the r8a7796 compat string for the IPMMU devices included in the R-Car M3-W SoC. Signed-off-by: Magnus Damm Acked-by: Laurent Pinchart Acked-by: Rob Herring Acked-by: Simon Horman Acked-by: Geert Uytterhoeven --- Docume

[PATCH 2/3] iommu/ipmmu-vmsa: Document R-Car V3M IPMMU DT bindings

2017-09-21 Magnus Damm
From: Magnus Damm Update the IPMMU DT binding documentation to include the r8a77970 compat string for the IPMMU devices included in the R-Car V3M SoC. Signed-off-by: Magnus Damm --- Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt |1 + 1 file changed, 1 insertion(+) --- 00

[PATCH 3/3] iommu/ipmmu-vmsa: Document R-Car D3 IPMMU DT bindings

2017-09-21 Magnus Damm
From: Magnus Damm Update the IPMMU DT binding documentation to include the r8a77995 compat string for the IPMMU devices included in the R-Car D3 SoC. Signed-off-by: Magnus Damm --- Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt |1 + 1 file changed, 1 insertion(+) --- 000

[PATCH 0/3] iommu/ipmmu-vmsa: R-Car Gen3 IPMMU DT binding update

2017-09-21 Magnus Damm
iommu/ipmmu-vmsa: R-Car Gen3 IPMMU DT binding update [PATCH 1/3] iommu/ipmmu-vmsa: Document R-Car M3-W IPMMU DT bindings [PATCH 2/3] iommu/ipmmu-vmsa: Document R-Car V3M IPMMU DT bindings [PATCH 3/3] iommu/ipmmu-vmsa: Document R-Car D3 IPMMU DT bindings This series documents IPMMU DT bindings for

[PATCH v5 6/6] iommu/iova: Simplify cached node logic

2017-09-21 Robin Murphy
The logic of __get_cached_rbnode() is a little obtuse, but then __get_prev_node_of_cached_rbnode_or_last_node_and_update_limit_pfn() wouldn't exactly roll off the tongue... Now that we have the invariant that there is always a valid node to start searching downwards from, everything gets a bit eas

[PATCH v5 4/6] iommu/iova: Make dma_32bit_pfn implicit

2017-09-21 Robin Murphy
From: Zhen Lei Now that the cached node optimisation can apply to all allocations, the couple of users which were playing tricks with dma_32bit_pfn in order to benefit from it can stop doing so. Conversely, there is also no need for all the other users to explicitly calculate a 'real' 32-bit PFN,

[PATCH v5 5/6] iommu/iova: Add rbtree anchor node

2017-09-21 Robin Murphy
Add a permanent dummy IOVA reservation to the rbtree, such that we can always access the top of the address space instantly. The immediate benefit is that we remove the overhead of the rb_last() traversal when not using the cached node, but it also paves the way for further simplifications. Signed
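The anchor idea can be illustrated with a minimal userspace sketch. It uses a plain sorted array of inclusive ranges instead of the kernel's rbtree. Every name in it (struct range, struct space, space_init(), alloc_topdown()) is hypothetical and not part of the iova API. The permanent top entry plays the role of the dummy reservation: a top-down search can always start from it, so there is no empty-space special case.

```c
#include <stddef.h>
#include <string.h>

#define MAX_RANGES 16

/* Hypothetical reserved range, inclusive [lo, hi]; stands in for struct iova. */
struct range {
	unsigned long lo, hi;
};

/* Reserved ranges kept sorted by address; r[n - 1] is always the anchor. */
struct space {
	struct range r[MAX_RANGES];
	int n;
};

/* Reserve a permanent dummy entry at 'top' so a highest node always exists. */
static void space_init(struct space *s, unsigned long top)
{
	s->r[0].lo = top;
	s->r[0].hi = top;
	s->n = 1;
}

/*
 * Top-down first fit (size must be non-zero): walk the gap below each
 * reserved range, highest first. Because of the anchor there is always a
 * starting node. Returns the new range's low PFN, or -1 on failure.
 */
static long alloc_topdown(struct space *s, unsigned long size,
			  unsigned long start_pfn)
{
	for (int i = s->n - 1; i >= 0; i--) {
		unsigned long gap_end = s->r[i].lo;	/* exclusive */
		unsigned long gap_start = i ? s->r[i - 1].hi + 1 : start_pfn;

		if (gap_end >= size && gap_end - size >= gap_start) {
			if (s->n == MAX_RANGES)
				return -1;
			/* Insert the new range just below r[i], keeping order. */
			memmove(&s->r[i + 1], &s->r[i],
				(size_t)(s->n - i) * sizeof(s->r[0]));
			s->r[i].lo = gap_end - size;
			s->r[i].hi = gap_end - 1;
			s->n++;
			return (long)s->r[i].lo;
		}
	}
	return -1;
}
```

With an anchor at 0x100 and start_pfn 0x10, successive allocations of 0x10 and 0x20 PFNs land at 0xf0 and 0xd0. The anchor stays as the last entry throughout.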

[PATCH v5 1/6] iommu/iova: Optimise rbtree searching

2017-09-21 Robin Murphy
From: Zhen Lei Checking the IOVA bounds separately before deciding which direction to continue the search (if necessary) results in redundantly comparing both pfns twice each. GCC can already determine that the final comparison op is redundant and optimise it down to 3 in total, but we can go one
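The comparison pattern described above can be sketched as a search over non-overlapping, inclusive [pfn_lo, pfn_hi] ranges. Each node costs at most two comparisons: a pfn below the range goes left, one above it goes right, and anything else is a hit. This is a hypothetical userspace sketch using a plain binary tree (struct range_node and find_range() are illustrative names), not the patch's actual rbtree code.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct iova: a binary search tree node holding
 * an inclusive [pfn_lo, pfn_hi] range; ranges never overlap. */
struct range_node {
	unsigned long pfn_lo, pfn_hi;
	struct range_node *left, *right;
};

/* Return the node whose range contains pfn, or NULL if none does. */
static struct range_node *find_range(struct range_node *node,
				     unsigned long pfn)
{
	while (node) {
		if (pfn < node->pfn_lo)
			node = node->left;	/* entirely below this range */
		else if (pfn > node->pfn_hi)
			node = node->right;	/* entirely above this range */
		else
			return node;		/* pfn_lo <= pfn <= pfn_hi */
	}
	return NULL;
}
```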

[PATCH v5 3/6] iommu/iova: Extend rbtree node caching

2017-09-21 Robin Murphy
The cached node mechanism provides a significant performance benefit for allocations using a 32-bit DMA mask, but in the case of non-PCI devices or where the 32-bit space is full, the loss of this benefit can be significant - on large systems there can be many thousands of entries in the tree, such

[PATCH v5 2/6] iommu/iova: Optimise the padding calculation

2017-09-21 Robin Murphy
From: Zhen Lei The mask for calculating the padding size doesn't change, so there's no need to recalculate it every loop iteration. Furthermore, once we've done that, it becomes clear that we don't actually need to calculate a padding size at all - by flipping the arithmetic around, we can just c
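Here is a hedged illustration of why no separate padding size is needed. For top-down allocations aligned to the next power of two of size, subtracting the padding is the same as masking off the low bits, because x - (x & (a - 1)) == x & ~(a - 1) for any power of two a. The sketch assumes limit is an exclusive upper PFN bound with limit >= size. The helper names are hypothetical, and this is not the patch's code.

```c
/*
 * Hypothetical helper standing in for the kernel's __roundup_pow_of_two():
 * smallest power of two >= x (x > 0).
 */
static unsigned long roundup_pow2(unsigned long x)
{
	unsigned long p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/* Padding style: compute how far to pad down, then subtract it. */
static unsigned long start_via_padding(unsigned long limit, unsigned long size)
{
	unsigned long pad = (limit - size) & (roundup_pow2(size) - 1);

	return limit - size - pad;
}

/*
 * Mask style: the mask depends only on size, so a caller could compute it
 * once outside its search loop; the start is then a single AND.
 */
static unsigned long start_via_mask(unsigned long limit, unsigned long size)
{
	unsigned long align_mask = ~(roundup_pow2(size) - 1);

	return (limit - size) & align_mask;
}
```

For example, with limit 0x100 and size 3 the alignment is 4. Both functions yield 0xfc: the padding is 0xfd & 3 = 1, and 0xfd & ~3 = 0xfc.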

[PATCH v5 0/6] Optimise 64-bit IOVA allocations

2017-09-21 Robin Murphy
v4: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1493704.html Right, this is hopefully the last version - I've put things back in a sensible order with the new additions at the end, so if they prove contentious the first 4 previously-tested patches can still get their time in -next

[PATCH v2] iommu/iova: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->fq

2017-09-21 Sebastian Andrzej Siewior
get_cpu_ptr() disables preemption and returns the ->fq object of the current CPU. raw_cpu_ptr() does the same except that it does not disable preemption, which means the scheduler can move the task to another CPU after it has obtained the per-CPU object. In this case this is not bad because the data structure itse

Re: [PATCH] iommu/amd: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->flush_queue

2017-09-21 Sebastian Andrzej Siewior
On 2017-09-11 22:22:11 [-0400], Vinod Adhikary wrote: > Dear all, Hi, > Thank you for the great community support and support from Sebastian to > provide me this patch. I wanted to send this email to inform you and > perhaps get some information on how I could keep myself updated on updates > in r

Re: [PATCH 3/4] iommu/arm-smmu-v3: Use NUMA memory allocations for stream tables and command queues

2017-09-21 Christoph Hellwig
On Thu, Sep 21, 2017 at 12:58:04PM +0100, Robin Murphy wrote: > Christoph, Marek; how reasonable do you think it is to expect > dma_alloc_coherent() to be inherently NUMA-aware on NUMA-capable > systems? SWIOTLB looks fairly straightforward to fix up (for the simple > allocation case; I'm not sure

[PATCH v2] iommu/arm-smmu-v3: Avoid ILLEGAL setting of STE.S1STALLD and CD.S

2017-09-21 Yisheng Xie
According to the spec, it is ILLEGAL to set STE.S1STALLD if STALL_MODEL is not 0b00, which means we should not disable stall mode if stall or terminate mode is not configurable. Meanwhile, it is also ILLEGAL when STALL_MODEL==0b10 && CD.S==0, which means that if stall mode is forced we should always set CD.S.
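The two rules above can be expressed as a small validity check. This is a hypothetical sketch. The function name and the plain integer encoding of STALL_MODEL are illustrative assumptions, and only the two constraints quoted in the patch description are modeled, not the full SMMUv3 rule set.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if the combination of the SMMU's
 * STALL_MODEL field, STE.S1STALLD and CD.S is ILLEGAL according to the
 * two rules quoted in the patch description.
 */
static bool stall_config_is_illegal(unsigned int stall_model,
				    bool ste_s1stalld, bool cd_s)
{
	/* S1STALLD may only be set when STALL_MODEL == 0b00. */
	if (stall_model != 0x0 && ste_s1stalld)
		return true;
	/* STALL_MODEL == 0b10 (stall forced) requires CD.S == 1. */
	if (stall_model == 0x2 && !cd_s)
		return true;
	return false;
}
```

A driver following these rules would leave S1STALLD clear unless STALL_MODEL is 0b00, and would force CD.S on whenever STALL_MODEL is 0b10.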

Re: [PATCH 3/4] iommu/arm-smmu-v3: Use NUMA memory allocations for stream tables and command queues

2017-09-21 Robin Murphy
[+Christoph and Marek] On 21/09/17 09:59, Ganapatrao Kulkarni wrote: > Introduce smmu_alloc_coherent and smmu_free_coherent functions to > allocate/free dma coherent memory from NUMA node associated with SMMU. > Replace all calls of dmam_alloc_coherent with smmu_alloc_coherent > for SMMU stream ta

Re: [PATCH 4/4] iommu/dma, numa: Use NUMA aware memory allocations in __iommu_dma_alloc_pages

2017-09-21 Robin Murphy
On 21/09/17 09:59, Ganapatrao Kulkarni wrote: > Change function __iommu_dma_alloc_pages to allocate memory/pages > for dma from respective device numa node. > > Signed-off-by: Ganapatrao Kulkarni > --- > drivers/iommu/dma-iommu.c | 17 ++--- > 1 file changed, 10 insertions(+), 7 dele

Re: [PATCH 2/4] numa, iommu/io-pgtable-arm: Use NUMA aware memory allocation for smmu translation tables

2017-09-21 Robin Murphy
On 21/09/17 09:59, Ganapatrao Kulkarni wrote: > function __arm_lpae_alloc_pages is used to allcoated memory for smmu > translation tables. updating function to allocate memory/pages > from the proximity domain of SMMU device. AFAICS, data->pgd_size always works out to a power-of-two number of page

Re: [PATCH v2] iommu/of: Remove PCI host bridge node check

2017-09-21 Jean-Philippe Brucker
On 21/09/17 11:20, Robin Murphy wrote: > of_pci_iommu_init() tries to be clever and stop its alias walk at the > device represented by master_np, in case of weird PCI topologies where > the bridge to the IOMMU and the rest of the system is not at the root. > It turns out this is a bit short-sighted

Re: [RFC] virtio-iommu version 0.4

2017-09-21 Jean-Philippe Brucker
On 20/09/17 10:37, Auger Eric wrote: > Hi Jean, > On 19/09/2017 12:47, Jean-Philippe Brucker wrote: >> Hi Eric, >> >> On 12/09/17 18:13, Auger Eric wrote: >>> 2.6.7 >>> - As I am currently integrating v0.4 in QEMU here are some other comments: >>> At the moment struct virtio_iommu_req_probe flags i

[PATCH v2] iommu/of: Remove PCI host bridge node check

2017-09-21 Robin Murphy
of_pci_iommu_init() tries to be clever and stop its alias walk at the device represented by master_np, in case of weird PCI topologies where the bridge to the IOMMU and the rest of the system is not at the root. It turns out this is a bit short-sighted, since there are plenty of other callers of pc

[PATCH] iommu/of: Remove PCI host bridge node check

2017-09-21 Robin Murphy
of_pci_iommu_init() tries to be clever and stop its alias walk at the device represented by master_np, in case of weird PCI topologies where the bridge to the IOMMU and the rest of the system is not at the root. It turns out this is a bit short-sighted, since there are plenty of other callers of pc

[PATCH 3/4] iommu/arm-smmu-v3: Use NUMA memory allocations for stream tables and command queues

2017-09-21 Ganapatrao Kulkarni
Introduce smmu_alloc_coherent and smmu_free_coherent functions to allocate/free dma coherent memory from NUMA node associated with SMMU. Replace all calls of dmam_alloc_coherent with smmu_alloc_coherent for SMMU stream tables and command queues. Signed-off-by: Ganapatrao Kulkarni --- drivers/iom

[PATCH 4/4] iommu/dma, numa: Use NUMA aware memory allocations in __iommu_dma_alloc_pages

2017-09-21 Ganapatrao Kulkarni
Change function __iommu_dma_alloc_pages to allocate memory/pages for dma from respective device numa node. Signed-off-by: Ganapatrao Kulkarni --- drivers/iommu/dma-iommu.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/i

[PATCH 0/4] numa, iommu/smmu: IOMMU/SMMU driver optimization for NUMA systems

2017-09-21 Ganapatrao Kulkarni
Add NUMA-aware memory allocation for IOMMU DMA allocations and for the memory allocated for SMMU stream tables, page-walk tables and command queues. With these patches, iperf testing on ThunderX2 with a 40G NIC on NODE 1 PCI showed the same performance as NODE 0 (around a 30% improvement). Ganapatrao Ku

[PATCH 2/4] numa, iommu/io-pgtable-arm: Use NUMA aware memory allocation for smmu translation tables

2017-09-21 Ganapatrao Kulkarni
The function __arm_lpae_alloc_pages is used to allocate memory for SMMU translation tables. Update it to allocate memory/pages from the proximity domain of the SMMU device. Signed-off-by: Ganapatrao Kulkarni --- drivers/iommu/io-pgtable-arm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletio

[PATCH 1/4] mm: move function alloc_pages_exact_nid out of __meminit

2017-09-21 Ganapatrao Kulkarni
This function can be used on NUMA systems in place of alloc_pages_exact. Export it and remove its __meminit section tagging. Signed-off-by: Ganapatrao Kulkarni --- include/linux/gfp.h | 2 +- mm/page_alloc.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/in