Re: [RFC] arm64: Ensure proper addressing for ldnp/stnp

2016-09-19 Thread Robin Murphy
On 19/09/16 18:36, Brent DeGraaf wrote:
> According to section 6.3.8 of the ARM Programmer's Guide, non-temporal
> loads and stores do not verify that address dependency is met between a
> load of an address to a register and a subsequent non-temporal load or
> store using that address on the executing PE. Therefore, context switch
> code and subroutine calls that use non-temporally accessed addresses as
> parameters that might depend on a load of an address into an argument
> register must ensure that ordering requirements are met by introducing
> a barrier prior to the successive non-temporal access.  Add appropriate
> barriers wherever this specific situation comes into play.
> 
> Signed-off-by: Brent DeGraaf 
> ---
>  arch/arm64/kernel/entry.S  | 1 +
>  arch/arm64/lib/copy_page.S | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 441420c..982c4d3 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -679,6 +679,7 @@ ENTRY(cpu_switch_to)
>   ldp x27, x28, [x8], #16
>   ldp x29, x9, [x8], #16
>   ldr lr, [x8]
> +	dmb	nshld	// Existence of instructions with loose load-use dependencies (e.g. ldnp/stnp) make this barrier necessary
>   mov sp, x9
>   and x9, x9, #~(THREAD_SIZE - 1)
>   msr sp_el0, x9
> diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
> index 4c1e700..21c6892 100644
> --- a/arch/arm64/lib/copy_page.S
> +++ b/arch/arm64/lib/copy_page.S
> @@ -47,6 +47,8 @@ alternative_endif
>   ldp x14, x15, [x1, #96]
>   ldp x16, x17, [x1, #112]
>  
> + dmb nshld // In case x0 (for stnp) is dependent on a load

The ARMv8 ARM (B2.7.2 in issue j) says that when an address dependency
exists between a load and a subsequent LDNP, *other* observers may
observe those accesses in any order. How's that related to an STNP on
the same CPU?

Robin.

> +
>   mov x18, #(PAGE_SIZE - 128)
>   add x1, x1, #128
>  1:
> 



Re: [RFC] arm64: Ensure proper addressing for ldnp/stnp

2016-09-20 Thread Robin Murphy
On 19/09/16 19:25, bdegr...@codeaurora.org wrote:
> On 2016-09-19 14:01, Robin Murphy wrote:
>> On 19/09/16 18:36, Brent DeGraaf wrote:
>>> According to section 6.3.8 of the ARM Programmer's Guide, non-temporal
>>> loads and stores do not verify that address dependency is met between a
>>> load of an address to a register and a subsequent non-temporal load or
>>> store using that address on the executing PE. Therefore, context switch
>>> code and subroutine calls that use non-temporally accessed addresses as
>>> parameters that might depend on a load of an address into an argument
>>> register must ensure that ordering requirements are met by introducing
>>> a barrier prior to the successive non-temporal access.  Add appropriate
>>> barriers wherever this specific situation comes into play.
>>>
>>> Signed-off-by: Brent DeGraaf 
>>> ---
>>>  arch/arm64/kernel/entry.S  | 1 +
>>>  arch/arm64/lib/copy_page.S | 2 ++
>>>  2 files changed, 3 insertions(+)
>>>
>>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>>> index 441420c..982c4d3 100644
>>> --- a/arch/arm64/kernel/entry.S
>>> +++ b/arch/arm64/kernel/entry.S
>>> @@ -679,6 +679,7 @@ ENTRY(cpu_switch_to)
>>>  	ldp	x27, x28, [x8], #16
>>>  	ldp	x29, x9, [x8], #16
>>>  	ldr	lr, [x8]
>>> +	dmb	nshld	// Existence of instructions with loose load-use dependencies (e.g. ldnp/stnp) make this barrier necessary
>>>  	mov	sp, x9
>>>  	and	x9, x9, #~(THREAD_SIZE - 1)
>>>  	msr	sp_el0, x9
>>> diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
>>> index 4c1e700..21c6892 100644
>>> --- a/arch/arm64/lib/copy_page.S
>>> +++ b/arch/arm64/lib/copy_page.S
>>> @@ -47,6 +47,8 @@ alternative_endif
>>>  	ldp	x14, x15, [x1, #96]
>>>  	ldp	x16, x17, [x1, #112]
>>>
>>> +	dmb	nshld // In case x0 (for stnp) is dependent on a load
>>
>> The ARMv8 ARM (B2.7.2 in issue j) says that when an address dependency
>> exists between a load and a subsequent LDNP, *other* observers may
>> observe those accesses in any order. How's that related to an STNP on
>> the same CPU?
>>
>> Robin.
>>
>>> +
>>>  	mov	x18, #(PAGE_SIZE - 128)
>>>  	add	x1, x1, #128
>>>  1:
>>>
> 
> Yes, I have seen the section in the ARM ARM about this. But the
> Programmer's Guide goes further, even providing a concrete example:
> 
> "Non-temporal loads and stores relax the memory ordering
> requirements...the LDNP instruction might be observed before the
> preceding LDR instruction, which can result in reading from an
> unpredictable address in X0.
> 
> For example:
> LDR X0, [X3]
> LDNP X2, X1, [X0]
> To correct the above, you need an explicit load barrier:
> LDR X0, [X3]
> DMB NSHLD
> LDNP X2, X1, [X0]"
> 
> Did the ARM ARM leave this out?  Or is the Programmer's Guide section
> incorrect?

If the ARM ARM and the Programmer's Guide don't agree, then the
Programmer's Guide is wrong (I'll raise a bug against it).

All the ARM ARM says is that in this situation:

 P1                      P2
 STP x0, x1, [x2]        1: LDR x0, ...
 DMB ISH                    CBZ x0, 1b
 STR x2, ...                LDNP x1, x2, [x0]

P2's address dependency still very much exists from the point of view of
P2's execution; it just may not guarantee order against the DMB on P1,
so P2's LDNP isn't guaranteed to see the data from P1's STP (as opposed
to how a regular LDP *is*), and may still load older stale data instead.

Robin.

> 
> Thanks for your comments,
> Brent
> 



Re: [PATCH] dma-mapping.h: Preserve unmap info for CONFIG_DMA_API_DEBUG

2016-09-21 Thread Robin Murphy
On 20/09/16 16:58, Andrey Smirnov wrote:
> When CONFIG_DMA_API_DEBUG is enabled we need to preserve unmapping
> address even if "unmap" is a no-op for our architecture because we need
> debug_dma_unmap_page() to correctly clean up all of the debug
> bookkeeping. Failing to do so results in false positive warnings about
> previously mapped areas never being unmapped.

Makes sense, although I guess it might be even clearer to simply have
DMA_API_DEBUG select NEED_DMA_MAP_STATE. Either way, though,

Reviewed-by: Robin Murphy 

> 
> Signed-off-by: Andrey Smirnov 
> ---
>  include/linux/dma-mapping.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 71c1b21..41c9875 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -678,7 +678,7 @@ static inline int dma_mmap_wc(struct device *dev,
>  #define dma_mmap_writecombine dma_mmap_wc
>  #endif
>  
> -#ifdef CONFIG_NEED_DMA_MAP_STATE
> +#if defined(CONFIG_NEED_DMA_MAP_STATE) || defined(CONFIG_DMA_API_DEBUG)
> +#define DEFINE_DMA_UNMAP_ADDR(ADDR_NAME)	dma_addr_t ADDR_NAME
> +#define DEFINE_DMA_UNMAP_LEN(LEN_NAME)	__u32 LEN_NAME
>  #define dma_unmap_addr(PTR, ADDR_NAME)	((PTR)->ADDR_NAME)
> 



Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape platforms

2016-09-23 Thread Robin Murphy
Hi Alison,

On 23/09/16 03:19, Alison Wang wrote:
> The ARMv8 architecture supports:
> 1. 64-bit execution state, AArch64.
> 2. 32-bit execution state, AArch32, that is compatible with previous
> versions of the ARM architecture.
> 
> LayerScape platforms are compliant with ARMv8 architecture. This patch
> is to support running 32-bit Linux kernel for LayerScape platforms.
> 
> Verified on LayerScape LS1043ARDB, LS1012ARDB, LS1046ARDB boards.
> 
> Signed-off-by: Ebony Zhu 
> Signed-off-by: Alison Wang 
> ---
>  arch/arm/Kconfig|  9 +
>  arch/arm/mach-imx/Kconfig   | 14 ++
>  arch/arm/mach-imx/Makefile  |  4 +++-
>  arch/arm/mach-imx/mach-layerscape.c | 23 +++
>  4 files changed, 49 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/mach-imx/mach-layerscape.c
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index f0c8068..e8d470e 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -294,6 +294,15 @@ config PGTABLE_LEVELS
>   default 3 if ARM_LPAE
>   default 2
>  
> +config ARCH_AARCH32_ES_SUPPORT
> + def_bool n
> + help
> +  The ARMv8 architecture supports 64-bit execution state, AArch64
> +  and 32-bit execution state, AArch32, that is compatible with
> +  previous versions of the ARM architecture.
> +
> +  Enable AArch32 execution state support for ARMv8 architecture.

What's this supposed to do, exactly? I've been running 32-bit kernels on
my Juno with very little issue (beyond a couple of DT tweaks, and some
firmware hacks with a corresponding bit of A64 assembly tacked on the
front of the zImage to switch into AArch32 state).

Robin.

> +
>  source "init/Kconfig"
>  
>  source "kernel/Kconfig.freezer"
> diff --git a/arch/arm/mach-imx/Kconfig b/arch/arm/mach-imx/Kconfig
> index 0ac05a0..fda4f5f 100644
> --- a/arch/arm/mach-imx/Kconfig
> +++ b/arch/arm/mach-imx/Kconfig
> @@ -549,6 +549,20 @@ config SOC_LS1021A
>   help
> This enables support for Freescale LS1021A processor.
>  
> +config ARCH_LAYERSCAPE_AARCH32
> + bool "Freescale Layerscape SoC AArch32 ES support"
> + select ARCH_AARCH32_ES_SUPPORT
> + select ARM_AMBA
> + select ARM_GIC
> + select ARM_ARCH_TIMER
> + select ARCH_DMA_ADDR_T_64BIT if ARM_LPAE
> + select PCI_LAYERSCAPE if PCI
> + select PCI_DOMAINS if PCI
> +
> + help
> +	   This enables support for the Freescale Layerscape SoC family
> +	   in AArch32 execution state.
> +
>  endif
>  
>  comment "Cortex-A/Cortex-M asymmetric multiprocessing platforms"
> diff --git a/arch/arm/mach-imx/Makefile b/arch/arm/mach-imx/Makefile
> index 737450f..7ded4fa 100644
> --- a/arch/arm/mach-imx/Makefile
> +++ b/arch/arm/mach-imx/Makefile
> @@ -69,7 +69,7 @@ obj-$(CONFIG_HAVE_IMX_ANATOP) += anatop.o
>  obj-$(CONFIG_HAVE_IMX_GPC) += gpc.o
>  obj-$(CONFIG_HAVE_IMX_MMDC) += mmdc.o
>  obj-$(CONFIG_HAVE_IMX_SRC) += src.o
> -ifneq ($(CONFIG_SOC_IMX6)$(CONFIG_SOC_LS1021A),)
> +ifneq ($(CONFIG_SOC_IMX6)$(CONFIG_SOC_LS1021A)$(CONFIG_ARCH_LAYERSCAPE_AARCH32),)
>  AFLAGS_headsmp.o :=-Wa,-march=armv7-a
>  obj-$(CONFIG_SMP) += headsmp.o platsmp.o
>  obj-$(CONFIG_HOTPLUG_CPU) += hotplug.o
> @@ -96,4 +96,6 @@ obj-$(CONFIG_SOC_VF610) += mach-vf610.o
>  
>  obj-$(CONFIG_SOC_LS1021A) += mach-ls1021a.o
>  
> +obj-$(CONFIG_ARCH_LAYERSCAPE_AARCH32) += mach-layerscape.o
> +
>  obj-y += devices/
> diff --git a/arch/arm/mach-imx/mach-layerscape.c b/arch/arm/mach-imx/mach-layerscape.c
> new file mode 100644
> index 000..acfb2a2
> --- /dev/null
> +++ b/arch/arm/mach-imx/mach-layerscape.c
> @@ -0,0 +1,23 @@
> +/*
> + * Copyright 2015-2016 Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include 
> +
> +#include "common.h"
> +
> +static const char * const layerscape_dt_compat[] __initconst = {
> + "fsl,ls1012a",
> + "fsl,ls1043a",
> + "fsl,ls1046a",
> + NULL,
> +};
> +
> +DT_MACHINE_START(LAYERSCAPE_AARCH32, "Freescale LAYERSCAPE")
> + .dt_compat  = layerscape_dt_compat,
> +MACHINE_END
> 



Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape platforms

2016-09-23 Thread Robin Murphy
On 23/09/16 14:13, Stuart Yoder wrote:
> 
> 
>> -Original Message-----
>> From: Robin Murphy [mailto:robin.mur...@arm.com]
>> Sent: Friday, September 23, 2016 7:17 AM
>> To: Alison Wang ; shawn...@kernel.org; 
>> ker...@pengutronix.de; Fabio Estevam
>> Estevam ; li...@armlinux.org.uk; 
>> linux-arm-ker...@lists.infradead.org; linux-
>> ker...@vger.kernel.org; Scott Wood ; Stuart Yoder 
>> ; Leo Li
>> 
>> Cc: Jason Jin 
>> Subject: Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape 
>> platforms
>>
>> Hi Alison,
>>
>> On 23/09/16 03:19, Alison Wang wrote:
>>> The ARMv8 architecture supports:
>>> 1. 64-bit execution state, AArch64.
>>> 2. 32-bit execution state, AArch32, that is compatible with previous
>>> versions of the ARM architecture.
>>>
>>> LayerScape platforms are compliant with ARMv8 architecture. This patch
>>> is to support running 32-bit Linux kernel for LayerScape platforms.
>>>
>>> Verified on LayerScape LS1043ARDB, LS1012ARDB, LS1046ARDB boards.
>>>
>>> Signed-off-by: Ebony Zhu 
>>> Signed-off-by: Alison Wang 
>>> ---
>>>  arch/arm/Kconfig|  9 +
>>>  arch/arm/mach-imx/Kconfig   | 14 ++
>>>  arch/arm/mach-imx/Makefile  |  4 +++-
>>>  arch/arm/mach-imx/mach-layerscape.c | 23 +++
>>>  4 files changed, 49 insertions(+), 1 deletion(-)
>>>  create mode 100644 arch/arm/mach-imx/mach-layerscape.c
>>>
>>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>>> index f0c8068..e8d470e 100644
>>> --- a/arch/arm/Kconfig
>>> +++ b/arch/arm/Kconfig
>>> @@ -294,6 +294,15 @@ config PGTABLE_LEVELS
>>> default 3 if ARM_LPAE
>>> default 2
>>>
>>> +config ARCH_AARCH32_ES_SUPPORT
>>> +   def_bool n
>>> +   help
>>> +The ARMv8 architecture supports 64-bit execution state, AArch64
>>> +and 32-bit execution state, AArch32, that is compatible with
>>> +previous versions of the ARM architecture.
>>> +
>>> +Enable AArch32 execution state support for ARMv8 architecture.
>>
>> What's this supposed to do, exactly? I've been running 32-bit kernels on
>> my Juno with very little issue (beyond a couple of DT tweaks, and some
>> firmware hacks with a corresponding bit of A64 assembly tacked on the
>> front of the zImage to switch into AArch32 state).
> 
> Which arch/arm/mach-* platform are you using for Juno?

I don't even know! :) I just start with a multi_v7_defconfig plus a few
extra bits (LPAE, KVM, sil24, sky2, etc.) and it works. I guess it's the
combination of mach-vexpress and mach-virt.

Robin.

> 
> Thanks,
> Stuart
> 



Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape platforms

2016-09-23 Thread Robin Murphy
On 23/09/16 15:01, Stuart Yoder wrote:
> 
> 
>> -Original Message-----
>> From: Robin Murphy [mailto:robin.mur...@arm.com]
>> Sent: Friday, September 23, 2016 8:19 AM
>> To: Stuart Yoder ; Alison Wang ; 
>> shawn...@kernel.org;
>> ker...@pengutronix.de; Fabio Estevam Estevam ; 
>> li...@armlinux.org.uk; linux-arm-
>> ker...@lists.infradead.org; linux-kernel@vger.kernel.org; Scott Wood 
>> ; Leo Li
>> 
>> Cc: Jason Jin 
>> Subject: Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape 
>> platforms
>>
>> On 23/09/16 14:13, Stuart Yoder wrote:
>>>
>>>
>>>> -Original Message-
>>>> From: Robin Murphy [mailto:robin.mur...@arm.com]
>>>> Sent: Friday, September 23, 2016 7:17 AM
>>>> To: Alison Wang ; shawn...@kernel.org; 
>>>> ker...@pengutronix.de; Fabio Estevam
>>>> Estevam ; li...@armlinux.org.uk; 
>>>> linux-arm-ker...@lists.infradead.org; linux-
>>>> ker...@vger.kernel.org; Scott Wood ; Stuart Yoder 
>>>> ; Leo Li
>>>> 
>>>> Cc: Jason Jin 
>>>> Subject: Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for 
>>>> LayerScape platforms
>>>>
>>>> Hi Alison,
>>>>
>>>> On 23/09/16 03:19, Alison Wang wrote:
>>>>> The ARMv8 architecture supports:
>>>>> 1. 64-bit execution state, AArch64.
>>>>> 2. 32-bit execution state, AArch32, that is compatible with previous
>>>>> versions of the ARM architecture.
>>>>>
>>>>> LayerScape platforms are compliant with ARMv8 architecture. This patch
>>>>> is to support running 32-bit Linux kernel for LayerScape platforms.
>>>>>
>>>>> Verified on LayerScape LS1043ARDB, LS1012ARDB, LS1046ARDB boards.
>>>>>
>>>>> Signed-off-by: Ebony Zhu 
>>>>> Signed-off-by: Alison Wang 
>>>>> ---
>>>>>  arch/arm/Kconfig|  9 +
>>>>>  arch/arm/mach-imx/Kconfig   | 14 ++
>>>>>  arch/arm/mach-imx/Makefile  |  4 +++-
>>>>>  arch/arm/mach-imx/mach-layerscape.c | 23 +++
>>>>>  4 files changed, 49 insertions(+), 1 deletion(-)
>>>>>  create mode 100644 arch/arm/mach-imx/mach-layerscape.c
>>>>>
>>>>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>>>>> index f0c8068..e8d470e 100644
>>>>> --- a/arch/arm/Kconfig
>>>>> +++ b/arch/arm/Kconfig
>>>>> @@ -294,6 +294,15 @@ config PGTABLE_LEVELS
>>>>>   default 3 if ARM_LPAE
>>>>>   default 2
>>>>>
>>>>> +config ARCH_AARCH32_ES_SUPPORT
>>>>> + def_bool n
>>>>> + help
>>>>> +  The ARMv8 architecture supports 64-bit execution state, AArch64
>>>>> +  and 32-bit execution state, AArch32, that is compatible with
>>>>> +  previous versions of the ARM architecture.
>>>>> +
>>>>> +  Enable AArch32 execution state support for ARMv8 architecture.
>>>>
>>>> What's this supposed to do, exactly? I've been running 32-bit kernels on
>>>> my Juno with very little issue (beyond a couple of DT tweaks, and some
>>>> firmware hacks with a corresponding bit of A64 assembly tacked on the
>>>> front of the zImage to switch into AArch32 state).
>>>
>>> Which arch/arm/mach-* platform are you using for Juno?
>>
>> I don't even know! :) I just start with a multi_v7_defconfig plus a few
>> extra bits (LPAE, KVM, sil24, sky2, etc.) and it works. I guess it's the
>> combination of mach-vexpress and mach-virt.
> 
> Thanks.  A question about the switch into aarch32 state... our assumption
> was that the kernel starts at EL2.  In this proof of concept we're doing the
> switch to aarch32/EL2 in firmware.  And what I'm being told is that the
> firmware aarch64 EL2 code cannot switch to aarch32 EL2 without some 
> assistance from EL3 firmware.  This is leading us to invent a new
> SMC call to do this.
> 
> Did you face this?

Yes, the only way to enter in Hyp is to have the firmware twiddle
SCR_EL3.RW (I simply stuck a disgusting hack directly in ATF's exception
handler, which my dodgy 64-bit header then calls). Otherwise you can
always simply run your own shim at EL2 to drive an AArch32 EL1 (it'll
need to trap and translate subsequent SMC calls for e.g. PSCI).
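[Editorial note: the EL3-side "twiddle" being described might look something like the following hypothetical sketch — not the actual ATF hack. Clear SCR_EL3.RW so lower exception levels become AArch32, set up an AArch32 Hyp return state, and ERET to the 32-bit entry point held in x0.]

```asm
	/* Hypothetical EL3 fragment: enter the AArch32 kernel in Hyp */
	mrs	x1, scr_el3
	bic	x1, x1, #(1 << 10)	// SCR_EL3.RW = 0: lower ELs are AArch32
	msr	scr_el3, x1
	mov	x1, #0xda		// I/F masked, M[4:0] = AArch32 Hyp
	msr	spsr_el3, x1
	msr	elr_el3, x0		// x0 = 32-bit kernel entry point
	eret
```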

> If there is such a requirement, it's something begging for standardization.
> Doesn't make sense for multiple divergent approaches for switching from
> aarch64/EL2 to aarch32/EL2.

Perhaps - I did briefly look into how hard it would be to write a proper
SMC service handler to do this (since ATF does have a framework for such
things), but concluded it would be more than 10 minutes' work and just
cheated instead. It's certainly something which could be raised with the
firmware folks.

Robin.

> 
> Thanks,
> Stuart
> 



Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape platforms

2016-09-23 Thread Robin Murphy
On 23/09/16 15:44, Arnd Bergmann wrote:
> On Friday, September 23, 2016 3:24:12 PM CEST Robin Murphy wrote:
>> On 23/09/16 15:01, Stuart Yoder wrote:
>> Otherwise you can
>> always simply run your own shim at EL2 to drive an AArch32 EL1 (it'll
>> need to trap and translate subsequent SMC calls for e.g. PSCI).
>>
>>> If there is such a requirement, it's something begging for standardization.
>>> Doesn't make sense for multiple divergent approaches for switching from
>>> aarch64/EL2 to aarch32/EL2.
>>
>> Perhaps - I did briefly look into how hard it would be to write a proper
>> SMC service handler to do this (since ATF does have a framework for such
>> things), but concluded it would be more than 10 minutes' work and just
>> cheated instead. It's certainly something which could be raised with the
>> firmware folks.
> 
> If we end up allowing all arm64 platforms to be enabled in arch/arm,
> we could perhaps create a generic implementation that does both of
> those things, i.e.
> 
> - Take the arm32 kernel Image or zImage file, wrap it inside of a binary
>   that implements the arm64 boot protocol.
> - When that starts up, try to use the new PSCI call to jump into
>   the arm32 kernel
> - If PSCI failed and we are running in EL2, implement the shim
>   and start the arm32 kernel in EL1 mode

Really, though, the firmware call thing is an incredibly niche use-case.
Beyond development, the only real benefit of starting an AArch32 kernel
in Hyp is that you can run AArch32 KVM guests, which you can do equally
well (if not better) under an AArch64 kernel.

The standalone EL2 shim probably has legs, though, if the one currently
found in arch/arm64 is too heavyweight ;) My first version was
essentially that, albeit implemented by a load of .inst directives in
head.S.

Robin.

> 
>   Arnd
> 



Re: [PATCH] ARM: decompressor: reset ttbcr fields to use TTBR0 on ARMv7

2016-09-27 Thread Robin Murphy
On 27/09/16 13:16, Srinivas Ramana wrote:
> Hi Robin,

Sorry! This one had slipped my mind already...

> On 09/13/2016 08:22 PM, Srinivas Ramana wrote:
>> On 09/12/2016 11:21 PM, Robin Murphy wrote:
>>> On 12/09/16 07:57, Srinivas Ramana wrote:
>>>> If the bootloader uses the long descriptor format and jumps to
>>>> kernel decompressor code, TTBCR may not be in a right state.
>>>> Before enabling the MMU, it is required to clear the TTBCR.PD0
>>>> field to use TTBR0 for translation table walks.
>>>>
>>>> The 'commit dbece45894d3a ("ARM: 7501/1: decompressor:
>>>> reset ttbcr for VMSA ARMv7 cores")' does the reset of TTBCR.N, but
>>>> doesn't consider all the bits for the size of TTBCR.N.
>>>>
>>>> Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to
>>>> indicate the use of TTBR0 and the correct base address width.
>>>>
>>>> Signed-off-by: Srinivas Ramana 
>>>> ---
>>>>   arch/arm/boot/compressed/head.S | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm/boot/compressed/head.S
>>>> b/arch/arm/boot/compressed/head.S
>>>> index af11c2f8f3b7..fc6d541549a2 100644
>>>> --- a/arch/arm/boot/compressed/head.S
>>>> +++ b/arch/arm/boot/compressed/head.S
>>>> @@ -779,7 +779,7 @@ __armv7_mmu_cache_on:
>>>>   	orrne	r0, r0, #1		@ MMU enabled
>>>>   	movne	r1, #0xfffffffd		@ domain 0 = client
>>>>   	bic	r6, r6, #1 << 31	@ 32-bit translation system
>>>
>>> Hmm, if TTBCR.EAE _was_ actually set...
>>>
>>>> -	bic	r6, r6, #3 << 0		@ use only ttbr0
>>>> +	bic	r6, r6, #(7 << 0) | (1 << 4)	@ use only ttbr0
>>>>   	mcrne	p15, 0, r3, c2, c0, 0	@ load page table pointer
>>>>   	mcrne	p15, 0, r1, c3, c0, 0	@ load domain access control
>>>>   	mcrne	p15, 0, r6, c2, c0, 2	@ load ttb control
>>>
>>> ...then strictly the TLBIALL needs to happen after the ISB following
>>> this update. Otherwise per B3.10.2 of DDI406C.c I think we might be into
>>> unpredictable territory - i.e. if the TLB happens to treat long- and
>>> short-descriptor entries differently then the TLBI beforehand (with EAE
>>> set) may be at liberty to only discard long-descriptor entries and leave
>>> bogus short-descriptor entries sitting around.
>> Yes, it seems this has to be taken care of, along with resetting
>> TTBCR.PD0 and TTBCR.N. Do you say that this needs to be done in the same
>> patch or a different one?
>>>
>>> In other words, something like (completely untested):
>>>
>>> ---8<---
>>> diff --git a/arch/arm/boot/compressed/head.S
>>> b/arch/arm/boot/compressed/head.S
>>> index af11c2f8f3b7..536b7781024a 100644
>>> --- a/arch/arm/boot/compressed/head.S
>>> +++ b/arch/arm/boot/compressed/head.S
>>> @@ -764,7 +764,6 @@ __armv7_mmu_cache_on:
>>>  mov r0, #0
>>>  mcr p15, 0, r0, c7, c10, 4  @ drain write buffer
>>>  tst r11, #0xf   @ VMSA
>>> -   mcrne   p15, 0, r0, c8, c7, 0   @ flush I,D TLBs
>>
>> Shouldn't this be still there for the same reason you explained above? I
>> mean to discard the long descriptor entries when EAE was 1 (before we
>> reset it).
>>>   #endif
>>>  mrc p15, 0, r0, c1, c0, 0   @ read control reg
>>>  bic r0, r0, #1 << 28@ clear SCTLR.TRE
>>> @@ -783,8 +782,11 @@ __armv7_mmu_cache_on:
>>>  	mcrne	p15, 0, r3, c2, c0, 0	@ load page table pointer
>>>  	mcrne	p15, 0, r1, c3, c0, 0	@ load domain access control
>>>  	mcrne	p15, 0, r6, c2, c0, 2	@ load ttb control
>>> -#endif
>>>  	mcr	p15, 0, r0, c7, c5, 4	@ ISB
>>> +	mcrne	p15, 0, r0, c8, c7, 0	@ flush I,D TLBs
>>> +#else
>>> +	mcr	p15, 0, r0, c7, c5, 4	@ ISB
>>> +#endif
>>>  	mcr	p15, 0, r0, c1, c0, 0	@ load control register
>>>  	mrc	p15, 0, r0, c1, c0, 0	@ and read it back
>>> ---8<---
>>>
>>> Robin.
>>>
>> i have tested this change (flush I, D, TLBs after TTB control is
>> written) and don't see any issue. But on my setup decompression is
>> successful even without this (probably not hitting the case in
>> discussion).
>>
>>
>> Thanks,
>> -- Srinivas R
>>
> 
> Would like your feedback on the above. Can we get the TTBCR fix merged
> first?(will send final patch with Russell Kings comments fixed)
> 
> For testing the TLB flush change we may have to check if we can create a
> failure case.

Yeah, the TLBI being in the wrong place is a separate, pre-existing
problem; as far as this patch goes, it does what it claims to do, and
matches what the ARMv7 (and ARMv6) docs say, so:

Acked-by: Robin Murphy 

> 
> Thanks,
> -- Srinivas R
> 



Re: [PATCH v2] arm64: mm: convert __dma_* routines to use start, size

2016-07-29 Thread Robin Murphy
On 28/07/16 01:08, kwangwoo@sk.com wrote:
>> -Original Message-
>> From: Robin Murphy [mailto:robin.mur...@arm.com]
>> Sent: Wednesday, July 27, 2016 6:56 PM
>> To: 이광우(LEE KWANGWOO) MS SW; Russell King - ARM Linux; Catalin Marinas; Will 
>> Deacon; Mark Rutland;
>> linux-arm-ker...@lists.infradead.org
>> Cc: 김현철(KIM HYUNCHUL) MS SW; linux-kernel@vger.kernel.org; 정우석(CHUNG WOO 
>> SUK) MS SW
>> Subject: Re: [PATCH v2] arm64: mm: convert __dma_* routines to use start, 
>> size
>>
>> On 27/07/16 02:55, kwangwoo@sk.com wrote:
>> [...]
>>>>>  /*
>>>>> - *   __dma_clean_range(start, end)
>>>>> + *   __dma_clean_area(start, size)
>>>>>   *   - start   - virtual start address of region
>>>>> - *   - end - virtual end address of region
>>>>> + *   - size- size in question
>>>>>   */
>>>>> -__dma_clean_range:
>>>>> - dcache_line_size x2, x3
>>>>> - sub x3, x2, #1
>>>>> - bic x0, x0, x3
>>>>> -1:
>>>>> +__dma_clean_area:
>>>>>  alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
>>>>> - dc  cvac, x0
>>>>> + dcache_by_line_op cvac, sy, x0, x1, x2, x3
>>>>>  alternative_else
>>>>> - dc  civac, x0
>>>>> + dcache_by_line_op civac, sy, x0, x1, x2, x3
>>>>
>>>> dcache_by_line_op is a relatively large macro - is there any way we can
>>>> still apply the alternative to just the one instruction which needs it,
>>>> as opposed to having to patch the entire mostly-identical routine?
>>>
>>> I agree with your opinion. Then, how do you think about using CONFIG_* 
>>> options
>>> like below? I think that alternative_* macros seems to keep the space for
>>> unused instruction. Is it necessary? Please, share your thought about the
>>> space. Thanks!
>>>
>>> +__dma_clean_area:
>>> +#if	defined(CONFIG_ARM64_ERRATUM_826319) || \
>>> +   defined(CONFIG_ARM64_ERRATUM_827319) || \
>>> +   defined(CONFIG_ARM64_ERRATUM_824069) || \
>>> +   defined(CONFIG_ARM64_ERRATUM_819472)
>>> +   dcache_by_line_op civac, sy, x0, x1, x2, x3
>>> +#else
>>> +   dcache_by_line_op cvac, sy, x0, x1, x2, x3
>>> +#endif
>>
>> That's not ideal, because we still only really want to use the
>> workaround if we detect a CPU which needs it, rather than baking it in
>> at compile time. I was thinking more along the lines of pushing the
>> alternative down into dcache_by_line_op, something like the idea below
>> (compile-tested only, may not actually be viable).
> 
> OK. Using the capability of CPU features seems to be preferred.
> 
>> Robin.
>>
>> -8<-
>> diff --git a/arch/arm64/include/asm/assembler.h
>> b/arch/arm64/include/asm/assembler.h
>> index 10b017c4bdd8..1c005c90387e 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -261,7 +261,16 @@ lr	.req	x30		// link register
>>  add \size, \kaddr, \size
>>  sub \tmp2, \tmp1, #1
>>  bic \kaddr, \kaddr, \tmp2
>> -9998:   dc  \op, \kaddr
>> +9998:
>> +.ifeqs "\op", "cvac"
>> +alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
>> +dc  cvac, \kaddr
>> +alternative_else
>> +dc  civac, \kaddr
>> +alternative_endif
>> +.else
>> +dc  \op, \kaddr
>> +.endif
>>  add \kaddr, \kaddr, \tmp1
>>  cmp \kaddr, \size
>>  b.lo9998b
> 
> I agree that it looks not viable because it makes the macro bigger and
> conditional specifically with CVAC op.

Actually, having had a poke around in the resulting disassembly, it
looks like this does work correctly. I can't think of a viable reason
for the whole dcache_by_line_op to ever be wrapped in yet another
alternative (which almost certainly would go horribly wrong), and it
would mean that any other future users are automatically covered for
free. It's just horrible to look at at the source level.

Robin.

> 
> Then.. if the number of the usage of alternative_* macros for erratum is
> few (just one in this case for cache clean), I think only small change like
> below seems to be optimal and there is no need to create a variant macro of
> dcache_cache_by_line_op. How do you think about it?
> 
> /*
> - * __dma_clean_range(start,

Re: [PATCH v2] arm64: mm: convert __dma_* routines to use start, size

2016-08-01 Thread Robin Murphy
On 01/08/16 00:45, kwangwoo@sk.com wrote:
[...]
 -8<-
 diff --git a/arch/arm64/include/asm/assembler.h
 b/arch/arm64/include/asm/assembler.h
 index 10b017c4bdd8..1c005c90387e 100644
 --- a/arch/arm64/include/asm/assembler.h
 +++ b/arch/arm64/include/asm/assembler.h
 @@ -261,7 +261,16 @@ lr	.req	x30		// link register
add \size, \kaddr, \size
sub \tmp2, \tmp1, #1
bic \kaddr, \kaddr, \tmp2
 -9998: dc  \op, \kaddr
 +9998:
 +  .ifeqs "\op", "cvac"
 +alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
 +  dc  cvac, \kaddr
 +alternative_else
 +  dc  civac, \kaddr
 +alternative_endif
 +  .else
 +  dc  \op, \kaddr
 +  .endif
add \kaddr, \kaddr, \tmp1
cmp \kaddr, \size
b.lo9998b
>>>
>>> I agree that it looks not viable because it makes the macro bigger and
>>> conditional specifically with CVAC op.
>>
>> Actually, having had a poke around in the resulting disassembly, it
>> looks like this does work correctly. I can't think of a viable reason
>> for the whole dcache_by_line_op to ever be wrapped in yet another
>> alternative (which almost certainly would go horribly wrong), and it
>> would mean that any other future users are automatically covered for
>> free. It's just horrible to look at at the source level.
> 
> Then, are you going to send a patch for this? Or should I include this change?

I'll do a bit more testing just to make sure, then spin a separate patch
(and try to remember to keep you on CC..)

Robin.



Re: [PATCH v2] arm64: mm: convert __dma_* routines to use start, size

2016-08-01 Thread Robin Murphy
On 01/08/16 14:36, Robin Murphy wrote:
> On 01/08/16 00:45, kwangwoo@sk.com wrote:
> [...]
>>>>> -8<-
>>>>> diff --git a/arch/arm64/include/asm/assembler.h
>>>>> b/arch/arm64/include/asm/assembler.h
>>>>> index 10b017c4bdd8..1c005c90387e 100644
>>>>> --- a/arch/arm64/include/asm/assembler.h
>>>>> +++ b/arch/arm64/include/asm/assembler.h
>>>>> @@ -261,7 +261,16 @@ lr	.req	x30		// link register
>>>>>   add \size, \kaddr, \size
>>>>>   sub \tmp2, \tmp1, #1
>>>>>   bic \kaddr, \kaddr, \tmp2
>>>>> -9998:dc  \op, \kaddr
>>>>> +9998:
>>>>> + .ifeqs "\op", "cvac"
>>>>> +alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
>>>>> + dc  cvac, \kaddr
>>>>> +alternative_else
>>>>> + dc  civac, \kaddr
>>>>> +alternative_endif
>>>>> + .else
>>>>> + dc  \op, \kaddr
>>>>> + .endif
>>>>>   add \kaddr, \kaddr, \tmp1
>>>>>   cmp \kaddr, \size
>>>>>   b.lo9998b
>>>>
>>>> I agree that it looks not viable because it makes the macro bigger and
>>>> conditional specifically with CVAC op.
>>>
>>> Actually, having had a poke around in the resulting disassembly, it
>>> looks like this does work correctly. I can't think of a viable reason
>>> for the whole dcache_by_line_op to ever be wrapped in yet another
>>> alternative (which almost certainly would go horribly wrong), and it
>>> would mean that any other future users are automatically covered for
>>> free. It's just horrible to look at at the source level.
>>
>> Then, are you going to send a patch for this? Or should I include this change?
> 
> I'll do a bit more testing just to make sure, then spin a separate patch
> (and try to remember to keep you on CC..)

...and said patch turns out to conflict with 823066d9edcd, since I
hadn't realised it's already been fixed! So you can go ahead with the
dcache_by_line_op cleanup as well, just rebase onto arm64/for-next/core
(or linux/master, since it's been pulled already).

Robin.


[PATCH v6.1] iommu/dma: Add support for mapping MSIs

2016-09-07 Thread Robin Murphy
When an MSI doorbell is located downstream of an IOMMU, attaching
devices to a DMA ops domain and switching on translation leads to a rude
shock when their attempt to write to the physical address returned by
the irqchip driver faults (or worse, writes into some already-mapped
buffer) and no interrupt is forthcoming.

Address this by adding a hook for relevant irqchip drivers to call from
their compose_msi_msg() callback, to swizzle the physical address with
an appropriately-mapped IOVA for any device attached to one of our DMA
ops domains.

CC: Thomas Gleixner 
CC: Jason Cooper 
CC: Marc Zyngier 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Robin Murphy 

---

- Rework map_page() helper function plus insane lookup logic into
  straightforward get_page() helper
- Use phys_addr_t to further simplify address matching
- Fix the bit where I neglected to actually round the doorbell
  address to a page boundary (oops!)
- Make the locking hardirq-safe to satisfy lockdep
---
 drivers/iommu/dma-iommu.c| 136 ++-
 drivers/irqchip/irq-gic-v2m.c|   3 +
 drivers/irqchip/irq-gic-v3-its.c |   3 +
 include/linux/dma-iommu.h|   9 +++
 4 files changed, 136 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 00c8a08d56e7..4329d18080cf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -25,10 +25,28 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
+struct iommu_dma_msi_page {
+   struct list_headlist;
+   dma_addr_t  iova;
+   phys_addr_t phys;
+};
+
+struct iommu_dma_cookie {
+   struct iova_domain  iovad;
+   struct list_headmsi_page_list;
+   spinlock_t  msi_lock;
+};
+
+static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
+{
+   return &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
+}
+
 int iommu_dma_init(void)
 {
return iova_cache_get();
@@ -43,15 +61,19 @@ int iommu_dma_init(void)
  */
 int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
-   struct iova_domain *iovad;
+   struct iommu_dma_cookie *cookie;
 
if (domain->iova_cookie)
return -EEXIST;
 
-   iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
-   domain->iova_cookie = iovad;
+   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+   if (!cookie)
+   return -ENOMEM;
 
-   return iovad ? 0 : -ENOMEM;
+   spin_lock_init(&cookie->msi_lock);
+   INIT_LIST_HEAD(&cookie->msi_page_list);
+   domain->iova_cookie = cookie;
+   return 0;
 }
 EXPORT_SYMBOL(iommu_get_dma_cookie);
 
@@ -63,14 +85,20 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
  */
 void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+   struct iommu_dma_msi_page *msi, *tmp;
 
-   if (!iovad)
+   if (!cookie)
return;
 
-   if (iovad->granule)
-   put_iova_domain(iovad);
-   kfree(iovad);
+   if (cookie->iovad.granule)
+   put_iova_domain(&cookie->iovad);
+
+   list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
+   list_del(&msi->list);
+   kfree(msi);
+   }
+   kfree(cookie);
domain->iova_cookie = NULL;
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
@@ -88,7 +116,7 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
  */
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
 {
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iova_domain *iovad = cookie_iovad(domain);
unsigned long order, base_pfn, end_pfn;
 
if (!iovad)
@@ -155,7 +183,7 @@ int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
 static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size,
dma_addr_t dma_limit)
 {
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iova_domain *iovad = cookie_iovad(domain);
unsigned long shift = iova_shift(iovad);
unsigned long length = iova_align(iovad, size) >> shift;
 
@@ -171,7 +199,7 @@ static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size,
 /* The IOVA allocator knows what we mapped, so just unmap whatever that was */
 static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
 {
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iova_domain *iovad = cookie_iovad(domain);
unsigned long shift = iova_shift(iovad);
unsigned long pfn = dma_addr >> shift;
struct iova *iova = find_iova(iovad, pfn);
@@ -294,7 +322,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
   

Re: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev

2016-09-07 Thread Robin Murphy
On 07/09/16 10:55, Peter Chen wrote:
[...]
>> Regarding the DMA configuration that you mention in ci_hdrc_add_device(),
>> I think we should replace 
>>
>> pdev->dev.dma_mask = dev->dma_mask;
>> pdev->dev.dma_parms = dev->dma_parms;
>> dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask);
>>
>> with of_dma_configure(), which has the chance to configure more than
>> just those three, as the dma API might look into different aspects:
>>
>> - iommu specific configuration
>> - cache coherency information
>> - bus type
>> - dma offset
>> - dma_map_ops pointer
>>
>> We try to handle everything in of_dma_configure() at configuration
>> time, and that would be the place to add anything else that we might
>> need in the future.
>>
> 
> Yes, I agree with you, but just like Felipe mentioned, we also need to
> consider the PCI device case; can we do something like what gpiod_get_index
> does? Are there any similar APIs to of_dma_configure for ACPI?

Not yet, but Lorenzo has one in progress[1], primarily for the sake of
abstracting away the IOMMU configuration.

Robin.

[1]:http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1209911.html



Re: [PATCH] pcie: qcom: add support to msm8996 PCIE controller

2016-09-07 Thread Robin Murphy
On 07/09/16 12:06, Srinivas Kandagatla wrote:
> This patch adds support for the msm8996/apq8096 PCIe controller. MSM8996
> supports Gen 1/2, one lane, and 3 PCIe root complexes with MSI and
> legacy interrupts, and conforms to the PCI Express Base 2.1 specification.
> 
> This patch adds a post_init callback to qcom_pcie_ops, as the PCIe pipe
> clocks and SMMU configuration can only be set up after the PHY is
> powered on. It also adds an ltssm_enable callback, as it differs
> considerably from the other SOCs supported in the driver.
> 
> Signed-off-by: Srinivas Kandagatla 
> ---
> 
> I tested this patch along with phy driver patch on DB820c based on
> APQ8096 on port A and port B using sata and ethernet controllers.
> 
> Thanks
> srini
> 
>  .../devicetree/bindings/pci/qcom,pcie.txt  |  88 +++-
>  drivers/pci/host/pcie-qcom.c   | 237 -
>  2 files changed, 318 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/pci/qcom,pcie.txt b/Documentation/devicetree/bindings/pci/qcom,pcie.txt
> index 4059a6f..1ddfcb4 100644
> --- a/Documentation/devicetree/bindings/pci/qcom,pcie.txt
> +++ b/Documentation/devicetree/bindings/pci/qcom,pcie.txt
> @@ -92,6 +92,18 @@
>   - "aux" Auxiliary (AUX) clock
>   - "bus_master"  Master AXI clock
>   - "bus_slave"   Slave AXI clock
> +
> +- clock-names:
> + Usage: required for msm8996/apq8096
> + Value type: 
> + Definition: Should contain the following entries
> + - "aux" Auxiliary (AUX) clock
> + - "bus_master"  Master AXI clock
> + - "bus_slave"   Slave AXI clock
> + - "pipe"Pipe Clock driving internal logic.
> + - "cfg" Configuration clk.
> + - "snoc_axi"SNOC AXI clk
> + - "cnoc_ahb"CNOC AHB clk
>  - resets:
>   Usage: required
>   Value type: 
> @@ -114,8 +126,15 @@
>   Definition: Should contain the following entries
>   - "core" Core reset
>  
> +- iommus:
> + Usage: required for msm8996/apq8096
> + Value type: 
> + Definition: Should point to the respective IOMMU block with master port
> + as argument, see Documentation/devicetree/bindings/iommu/
> + mediatek,iommu.txt for details.

Qualcomm have cross-licensed the M4U!? :P

In seriousness, though, it seems a bit odd to list "iommus" as a
required property - unless the IOMMU is actually an integral part of the
PCIe root complex being described (in which case it wants its own
binding documenting as well), then the relationship is merely a
circumstance of integration of those particular SoCs.

> +
>  - power-domains:
> - Usage: required for apq8084
> + Usage: required for apq8084 and msm8996/apq8096
>   Value type: 
>   Definition: A phandle and power domain specifier pair to the
>   power domain which is responsible for collapsing
> @@ -137,6 +156,11 @@
>   Definition: A phandle to the analog power supply for IC which generates
>   reference clock
>  
> +- vdda_1p8-supply:
> + Usage: required for msm8996/apq8096
> + Value type: 
> + Definition: A phandle to the analog power supply for PCIE_1P8
> +
>  - phys:
>   Usage: required for apq8084
>   Value type: 
> @@ -231,3 +255,65 @@
>   pinctrl-0 = <&pcie0_pins_default>;
>   pinctrl-names = "default";
>   };
> +
> +* Example for apq8096:
> + pcie1: qcom,pcie@00608000 {
> + compatible = "qcom,pcie-msm8996", "snps,dw-pcie";
> + power-domains = <&gcc PCIE1_GDSC>;
> + bus-range = <0x00 0xff>;
> + num-lanes = <1>;
> +
> + status  = "disabled";
> +
> + reg = <0x00608000 0x2000>,
> +   <0x0d00 0xf1d>,
> +   <0x0d000f20 0xa8>,
> +   <0x0d10 0x10>;
> +
> + reg-names = "parf", "dbi", "elbi","config";
> +
> + phys = <&pcie_phy 1>;
> + phy-names = "pciephy";
> +
> + #address-cells = <3>;
> + #size-cells = <2>;
> + ranges = <0x0100 0x0 0x0d20 0x0d20 0x0 0x10>,
> + <0x0200 0x0 0x0d30 0x0d30 0x0 0xd0>;
> +
> + interrupts = ;
> + interrupt-names = "msi";
> + #interrupt-cells = <1>;
> + interrupt-map-mask = <0 0 0 0x7>;
> + interrupt-map = <0 0 0 1 &intc 0 272 IRQ_TYPE_LEVEL_HIGH>, /* 
> int_a */
> + <0 0 0 2 &intc 0 273 IRQ_TYPE_LEVEL_HIGH>, /* 
> int_b */
> + <0 0 0 3 &intc 0 274 IRQ_TYPE_LEVEL_HIGH>, /* 
> int_c */
> + <0 0 0 4 &intc 0 275 IRQ_TYPE_LEVEL_HIGH>; /* 
> int_d */
> +
> + pinctrl-names = "d

Re: [PATCH v5 05/14] drivers: iommu: make iommu_fwspec OF agnostic

2016-09-13 Thread Robin Murphy
Hi Lorenzo,

On 09/09/16 15:23, Lorenzo Pieralisi wrote:
> The iommu_fwspec structure, used to hold per device iommu configuration
> data is not OF specific and therefore can be moved to a generic
> and OF independent compilation unit.
> 
> In particular, the iommu_fwspec handling hinges on the device_node
> pointer to identify the IOMMU device associated with the iommu_fwspec
> structure, that is easily converted to a more generic fwnode_handle
> pointer that can cater for OF and non-OF (ie ACPI) systems.
> 
> Create the files and related Kconfig entry to decouple iommu_fwspec
> structure from the OF iommu kernel layer.
> 
> Given that the current iommu_fwspec implementation relies on
> the arch specific struct device.archdata.iommu field in its
> implementation, by making the code standalone and independent
> of the OF layer this patch makes sure that the iommu_fwspec
> kernel code can be selected only on arches implementing the
> struct device.archdata.iommu field by adding an explicit
> arch dependency in its config entry.
> 
> Current drivers using the iommu_fwspec for streamid translation
> are converted to the new iommu_fwspec API by simply converting
> the device_node to its fwnode_handle pointer.
> 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Will Deacon 
> Cc: Hanjun Guo 
> Cc: Robin Murphy 
> Cc: Joerg Roedel 
> ---
>  drivers/iommu/Kconfig|   4 ++
>  drivers/iommu/Makefile   |   1 +
>  drivers/iommu/arm-smmu-v3.c  |  16 --
>  drivers/iommu/arm-smmu.c |  17 +++---
>  drivers/iommu/iommu-fwspec.c | 126 +++
>  drivers/iommu/of_iommu.c |  93 
>  include/linux/iommu-fwspec.h |  70 
>  include/linux/of_iommu.h |  38 -
>  8 files changed, 234 insertions(+), 131 deletions(-)
>  create mode 100644 drivers/iommu/iommu-fwspec.c
>  create mode 100644 include/linux/iommu-fwspec.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 101cb17..873bd41 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -70,6 +70,10 @@ config OF_IOMMU
>  config HAVE_IOMMU_FWSPEC
>   bool
>  
> +config IOMMU_FWSPEC
> +   def_bool y
> +   depends on IOMMU_API
> +
>  # IOMMU-agnostic DMA-mapping layer
>  config IOMMU_DMA
>   bool
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 195f7b9..bbbc6d6 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -6,6 +6,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
> +obj-$(CONFIG_IOMMU_FWSPEC) += iommu-fwspec.o
>  obj-$(CONFIG_OF_IOMMU)   += of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
>  obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index be293b5..a7e9de9 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1720,13 +1721,18 @@ static struct platform_driver arm_smmu_driver;
>  
>  static int arm_smmu_match_node(struct device *dev, void *data)
>  {
> - return dev->of_node == data;
> + struct fwnode_handle *fwnode;
> +
> + fwnode = dev->of_node ? &dev->of_node->fwnode : dev->fwnode;
> +
> + return fwnode == data;
>  }

Maybe we should hoist the dev_fwnode() helper from property.c up to
property.h so we can just have "return dev_fwnode(dev) == data;" here?

>  
> -static struct arm_smmu_device *arm_smmu_get_by_node(struct device_node *np)
> +static struct arm_smmu_device *
> +arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
>  {
>   struct device *dev = driver_find_device(&arm_smmu_driver.driver, NULL,
> - np, arm_smmu_match_node);
> + fwnode, arm_smmu_match_node);
>   put_device(dev);
>   return dev ? dev_get_drvdata(dev) : NULL;
>  }
> @@ -1762,7 +1768,7 @@ static int arm_smmu_add_device(struct device *dev)
>   master = fwspec->iommu_priv;
>   smmu = master->smmu;
>   } else {
> - smmu = arm_smmu_get_by_node(fwspec->iommu_np);
> + smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
>   if (!smmu)
>   return -ENODEV;
>   master = kzalloc(sizeof(*master), GFP_KERNEL);
> @@ 

Re: [PATCH v5 06/14] drivers: acpi: implement acpi_dma_configure

2016-09-13 Thread Robin Murphy
On 09/09/16 15:23, Lorenzo Pieralisi wrote:
> On DT based systems, the of_dma_configure() API implements DMA
> configuration for a given device. On ACPI systems an API equivalent to
> of_dma_configure() is missing which implies that it is currently not
> possible to set-up DMA operations for devices through the ACPI generic
> kernel layer.
> 
> This patch fills the gap by introducing acpi_dma_configure/deconfigure()
> calls that for now are just wrappers around arch_setup_dma_ops() and
> arch_teardown_dma_ops() and also updates ACPI and PCI core code to use
> the newly introduced acpi_dma_configure/acpi_dma_deconfigure functions.
> 
> The DMA range size passed to arch_setup_dma_ops() is sized according
> to the device coherent_dma_mask (starting at address 0x0), mirroring the
> DT probing path behaviour when a dma-ranges property is not provided
> for the device being probed; this changes the current arch_setup_dma_ops()
> call parameters in the ACPI probing case, but since arch_setup_dma_ops()
> is a NOP on all architectures but ARM/ARM64 this patch does not change
> the current kernel behaviour on them.
> 
> Signed-off-by: Lorenzo Pieralisi 
> Acked-by: Bjorn Helgaas  [pci]
> Cc: Bjorn Helgaas 
> Cc: Robin Murphy 
> Cc: Tomasz Nowicki 
> Cc: Joerg Roedel 
> Cc: "Rafael J. Wysocki" 
> ---
>  drivers/acpi/glue.c |  4 ++--
>  drivers/acpi/scan.c | 24 
>  drivers/pci/probe.c |  3 +--
>  include/acpi/acpi_bus.h |  2 ++
>  include/linux/acpi.h|  5 +
>  5 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
> index 5ea5dc2..f8d6564 100644
> --- a/drivers/acpi/glue.c
> +++ b/drivers/acpi/glue.c
> @@ -227,8 +227,7 @@ int acpi_bind_one(struct device *dev, struct acpi_device *acpi_dev)
>  
>   attr = acpi_get_dma_attr(acpi_dev);
>   if (attr != DEV_DMA_NOT_SUPPORTED)
> - arch_setup_dma_ops(dev, 0, 0, NULL,
> -attr == DEV_DMA_COHERENT);
> + acpi_dma_configure(dev, attr);
>  
>   acpi_physnode_link_name(physical_node_name, node_id);
>   retval = sysfs_create_link(&acpi_dev->dev.kobj, &dev->kobj,
> @@ -251,6 +250,7 @@ int acpi_bind_one(struct device *dev, struct acpi_device *acpi_dev)
>   return 0;
>  
>   err:
> + acpi_dma_deconfigure(dev);
>   ACPI_COMPANION_SET(dev, NULL);
>   put_device(dev);
>   put_device(&acpi_dev->dev);
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index e878fc7..9614232 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1370,6 +1370,30 @@ enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev)
>   return DEV_DMA_NON_COHERENT;
>  }
>  
> +/**
> + * acpi_dma_configure - Set-up DMA configuration for the device.
> + * @dev: The pointer to the device
> + * @attr: device dma attributes
> + */
> +void acpi_dma_configure(struct device *dev, enum dev_dma_attr attr)
> +{
> + /*
> +  * Assume dma valid range starts at 0 and covers the whole
> +  * coherent_dma_mask.
> +  */
> + arch_setup_dma_ops(dev, 0, dev->coherent_dma_mask + 1, NULL,
> +attr == DEV_DMA_COHERENT);

This looks a bit hairy - if we're setting up the DMA configuration at
this point, can we really always rely on the device already having a
valid DMA mask? I think it would make sense to at least check, and apply
the standard default 32-bit mask if not.

> +}
> +
> +/**
> + * acpi_dma_deconfigure - Tear-down DMA configuration for the device.
> + * @dev: The pointer to the device
> + */
> +void acpi_dma_deconfigure(struct device *dev)
> +{
> + arch_teardown_dma_ops(dev);
> +}

As touched upon in the dwc-usb3 thread, is it worth exporting these for
the benefit of modular bus/glue code, to match of_dma_configure()?

> +
>  static void acpi_init_coherency(struct acpi_device *adev)
>  {
>   unsigned long long cca = 0;
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 93f280d..e96d482 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1734,8 +1734,7 @@ static void pci_dma_configure(struct pci_dev *dev)
>   if (attr == DEV_DMA_NOT_SUPPORTED)
>   dev_warn(&dev->dev, "DMA not supported.\n");
>   else
> - arch_setup_dma_ops(&dev->dev, 0, 0, NULL,
> -attr == DEV_DMA_COHERENT);
> + acpi_dma_configure(&dev->dev, attr);
>   }

What about non-PCI stuff? I see there's at least an
acpi_create_platform_device() wh

Re: [PATCH v5 07/14] drivers: acpi: iort: add support for ARM SMMU platform devices creation

2016-09-13 Thread Robin Murphy
On 09/09/16 15:23, Lorenzo Pieralisi wrote:
> In ARM ACPI systems, IOMMU components are specified through static
> IORT table entries. In order to create platform devices for the
> corresponding ARM SMMU components, IORT kernel code should be made
> able to parse IORT table entries and create platform devices
> dynamically.
> 
> This patch adds the generic IORT infrastructure required to create
> platform devices for ARM SMMUs.
> 
> ARM SMMU versions have different resources requirement therefore this
> patch also introduces an IORT specific structure (ie iort_iommu_config)
> that contains hooks (to be defined when the corresponding ARM SMMU
> driver support is added to the kernel) to be used to define the
> platform devices names, init the IOMMUs, count their resources and
> finally initialize them.
> 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Hanjun Guo 
> Cc: Tomasz Nowicki 
> Cc: "Rafael J. Wysocki" 
> ---
>  drivers/acpi/arm64/iort.c | 131 ++
>  1 file changed, 131 insertions(+)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index b89b3d3..e0a9b16 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  struct iort_its_msi_chip {
> @@ -424,6 +425,135 @@ struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id)
>   return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI);
>  }
>  
> +struct iort_iommu_config {
> + const char *name;
> + int (*iommu_init)(struct acpi_iort_node *node);
> + bool (*iommu_is_coherent)(struct acpi_iort_node *node);
> + int (*iommu_count_resources)(struct acpi_iort_node *node);
> + void (*iommu_init_resources)(struct resource *res,
> +  struct acpi_iort_node *node);
> +};
> +
> +static __init
> +const struct iort_iommu_config *iort_get_iommu_cfg(struct acpi_iort_node *node)
> +{
> + return NULL;
> +}
> +
> +/**
> + * iort_add_smmu_platform_device() - Allocate a platform device for SMMU
> + * @fwnode: IORT node associated fwnode handle
> + * @node: Pointer to SMMU ACPI IORT node
> + *
> + * Returns: 0 on success, <0 failure
> + */
> +static int __init iort_add_smmu_platform_device(struct fwnode_handle *fwnode,
> + struct acpi_iort_node *node)
> +{
> + struct platform_device *pdev;
> + struct resource *r;
> + enum dev_dma_attr attr;
> + int ret, count;
> + const struct iort_iommu_config *ops = iort_get_iommu_cfg(node);
> +
> + if (!ops)
> + return -ENODEV;
> +
> + pdev = platform_device_alloc(ops->name, PLATFORM_DEVID_AUTO);
> + if (!pdev)
> + return PTR_ERR(pdev);
> +
> + count = ops->iommu_count_resources(node);
> +
> + r = kcalloc(count, sizeof(*r), GFP_KERNEL);
> + if (!r) {
> + ret = -ENOMEM;
> + goto dev_put;
> + }
> +
> + ops->iommu_init_resources(r, node);
> +
> + ret = platform_device_add_resources(pdev, r, count);
> + /*
> +  * Resources are duplicated in platform_device_add_resources,
> +  * free their allocated memory
> +  */
> + kfree(r);
> +
> + if (ret)
> + goto dev_put;
> +
> + /*
> +  * Add a copy of IORT node pointer to platform_data to
> +  * be used to retrieve IORT data information.
> +  */
> + ret = platform_device_add_data(pdev, &node, sizeof(node));
> + if (ret)
> + goto dev_put;
> +
> + pdev->dev.dma_mask = kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL);
> + if (!pdev->dev.dma_mask) {
> + ret = -ENOMEM;
> + goto dev_put;
> + }

Since this is exclusively for creating SMMUs, and we know they should
never have weird shenanigans going on requiring different masks, I'd be
inclined to just point dev.dma_mask at dev.coherent_dma_mask and be done
with it.

> +
> + pdev->dev.fwnode = fwnode;
> +
> + /*
> +  * Set default dma mask value for the table walker,
> +  * to be overridden on probing with correct value.
> +  */
> + *pdev->dev.dma_mask = DMA_BIT_MASK(32);
> + pdev->dev.coherent_dma_mask = *pdev->dev.dma_mask;
> +
> + attr = ops->iommu_is_coherent(node) ?
> +  DEV_DMA_COHERENT : DEV_DMA_NON_COHERENT;
> +
> + /* Configure DMA for the page table walker */
> + acpi_dma_configure(&pdev->dev, attr);

Oh look, some more code which would be simpler if acpi_dma_configure()
set the default mask itself ;)

Robin.

> +
> + ret = platform_device_add(pdev);
> + if (ret)
> + goto dma_deconfigure;
> +
> + return 0;
> +
> +dma_deconfigure:
> + acpi_dma_deconfigure(&pdev->dev);
> + kfree(pdev->dev.dma_mask);
> +
> +dev_put:
> + platform_device_put(pdev);
> +
> + return ret;
> +}
> +
> +static acpi_status __init iort_match_iommu_callback(struct acpi_iort_n

Re: [PATCH v5 09/14] drivers: iommu: arm-smmu-v3: add IORT configuration

2016-09-13 Thread Robin Murphy
On 09/09/16 15:23, Lorenzo Pieralisi wrote:
> In ACPI-based systems, in order to be able to create platform
> devices and initialize them for ARM SMMU v3 components, the IORT
> kernel implementation requires a set of static functions to be
> used by the IORT kernel layer to configure platform devices for
> ARM SMMU v3 components.
> 
> Add static configuration functions to the IORT kernel layer for
> the ARM SMMU v3 components, so that the ARM SMMU v3 driver can
> initialize its respective platform device by relying on the IORT
> kernel infrastructure and by adding a corresponding ACPI device
> early probe section entry.
> 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Cc: Joerg Roedel 
> ---
>  drivers/acpi/arm64/iort.c   | 103 +++-
>  drivers/iommu/arm-smmu-v3.c |  95 +++-
>  2 files changed, 195 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index e0a9b16..a2ad102 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -425,6 +425,95 @@ struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id)
>   return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI);
>  }
>  
> +static void __init acpi_iort_register_irq(int hwirq, const char *name,
> +   int trigger,
> +   struct resource *res)
> +{
> + int irq = acpi_register_gsi(NULL, hwirq, trigger,
> + ACPI_ACTIVE_HIGH);
> +
> + if (irq < 0) {

irq <= 0 ?

> + pr_err("could not register gsi hwirq %d name [%s]\n", hwirq,
> +   name);
> + return;
> + }
> +
> + res->start = irq;
> + res->end = irq;
> + res->flags = IORESOURCE_IRQ;
> + res->name = name;
> +}
> +
> +static int __init arm_smmu_v3_count_resources(struct acpi_iort_node *node)
> +{
> + struct acpi_iort_smmu_v3 *smmu;
> + /* Always present mem resource */
> + int num_res = 1;
> +
> + /* Retrieve SMMUv3 specific data */
> + smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
> +
> + if (smmu->event_gsiv)
> + num_res++;
> +
> + if (smmu->pri_gsiv)
> + num_res++;
> +
> + if (smmu->gerr_gsiv)
> + num_res++;
> +
> + if (smmu->sync_gsiv)
> + num_res++;
> +
> + return num_res;
> +}
> +
> +static void __init arm_smmu_v3_init_resources(struct resource *res,
> +   struct acpi_iort_node *node)
> +{
> + struct acpi_iort_smmu_v3 *smmu;
> + int num_res = 0;
> +
> + /* Retrieve SMMUv3 specific data */
> + smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
> +
> + res[num_res].start = smmu->base_address;
> + res[num_res].end = smmu->base_address + SZ_128K - 1;
> + res[num_res].flags = IORESOURCE_MEM;
> +
> + num_res++;
> +
> + if (smmu->event_gsiv)
> + acpi_iort_register_irq(smmu->event_gsiv, "eventq",
> +ACPI_EDGE_SENSITIVE,
> +&res[num_res++]);
> +
> + if (smmu->pri_gsiv)
> + acpi_iort_register_irq(smmu->pri_gsiv, "priq",
> +ACPI_EDGE_SENSITIVE,
> +&res[num_res++]);
> +
> + if (smmu->gerr_gsiv)
> + acpi_iort_register_irq(smmu->gerr_gsiv, "gerror",
> +ACPI_EDGE_SENSITIVE,
> +&res[num_res++]);
> +
> + if (smmu->sync_gsiv)
> + acpi_iort_register_irq(smmu->sync_gsiv, "cmdq-sync",
> +ACPI_EDGE_SENSITIVE,
> +&res[num_res++]);
> +}
> +
> +static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
> +{
> + struct acpi_iort_smmu_v3 *smmu;
> +
> + /* Retrieve SMMUv3 specific data */
> + smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
> +
> + return smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE;
> +}
> +
>  struct iort_iommu_config {
>   const char *name;
>   int (*iommu_init)(struct acpi_iort_node *node);
> @@ -434,10 +523,22 @@ struct iort_iommu_config {
>struct acpi_iort_node *node);
>  };
>  
> +stati

[PATCH] of/platform: Initialise dev->fwnode appropriately

2016-09-14 Thread Robin Murphy
Whilst we're some of the way towards a universal firmware property
interface, drivers which deal with both OF and ACPI probing end up
having to do things like this:

dev->of_node ? &dev->of_node->fwnode : dev->fwnode

This seems unnecessary, when the OF code could instead simply fill in
the device's fwnode when binding the of_node, and let the drivers use
dev->fwnode either way. Let's give it a go and see what falls out.

Signed-off-by: Robin Murphy 
---
 drivers/of/platform.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index f39ccd5aa701..f811d2796437 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -142,6 +142,7 @@ struct platform_device *of_device_alloc(struct device_node *np,
}
 
dev->dev.of_node = of_node_get(np);
+   dev->dev.fwnode = &np->fwnode;
dev->dev.parent = parent ? : &platform_bus;
 
if (bus_id)
@@ -241,6 +242,7 @@ static struct amba_device *of_amba_device_create(struct device_node *node,
 
/* setup generic device info */
dev->dev.of_node = of_node_get(node);
+   dev->dev.fwnode = &node->fwnode;
dev->dev.parent = parent ? : &platform_bus;
dev->dev.platform_data = platform_data;
if (bus_id)
-- 
2.8.1.dirty



Re: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev

2016-09-02 Thread Robin Murphy
On 02/09/16 11:53, Felipe Balbi wrote:
> 
> Hi,
> 
> Arnd Bergmann  writes:
>> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote:
>>>
>>> Hi Felipe and Arnd,
>>>
>>> It has been a while since the last response to this discussion, but we
>>> haven't reached an agreement yet!  Can we get to a conclusion on if it
>>> is valid to create child platform device for abstraction purpose?  If
>>> yes, can this child device do DMA by itself?
>>
>> I'd say it's no problem for a driver to create child devices in order
>> to represent different aspects of a device, but you should not rely on
>> those devices working when used with the dma-mapping interfaces.
> 
> heh, that looks like an excuse to me :-)
> 
> This will always be a problem for e.g. MFD, for example. Are you saying
> MFD child-devices shouldn't be allowed to do DMA? It becomes silly when
> you read it that way, right?
> 
>> This used to be simpler back when we could configure the kernel for
>> only one SoC platform at a time, and the platforms could provide their
>> own overrides for the dma-mapping interfaces. These days, we rely on
> 
> right, so we have a very old regression that just took a complex driver
> such as dwc3 to trigger ;-)
> 
>> firmware or bootloader to describe various aspects of how DMA is done,
> 
> there's no DMA description in DT. Every OF device gets the same 32-bit
> DMA mask and that is, itself, wrong for several devices.

Huh? There's only no DMA description in DT if the device can be assumed
to be happy with the defaults. Anything else should be using
"dma-ranges", "dma-coherent", etc. to describe non-default integration
aspects. For devices with an inherent fixed addressing capability !=32
bits, then it's down to the driver to call dma_set_mask() appropriately
to override the default 32-bit mask (which is not unique to OF-probed
devices either).

Sure, it's by no means a perfect API, but you're railing against
untruths here.

Robin.

>> so you can't assume that passing a device without an of_node pointer
>> or ACPI data into those functions will do the right thing.
> 
> That's not the problem, however. We can very easily pass along
> ACPI_COMPANION() to any platform_device we want, but that's not enough
> because DMA-related bits are passed along with archdata; but archdata
> isn't generic in any way. Some arches (like x86) _do_ use it for DMA,
> but some don't.
> 
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 



Re: [PATCH 10/11] soc: ti: knav_qmss_queue: use of_property_read_bool

2016-08-05 Thread Robin Murphy
Hi Julia,

On 05/08/16 09:56, Julia Lawall wrote:
> Use of_property_read_bool to check for the existence of a property.

This caught my eye since Rob told me off for doing the same recently[1].

> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // 
> @@
> expression e1,e2;
> statement S2,S1;
> @@
> -   if (of_get_property(e1,e2,NULL))
> +   if (of_property_read_bool(e1,e2))
> S1 else S2
> // 
> 
> Signed-off-by: Julia Lawall 
> 
> ---
>  drivers/soc/ti/knav_qmss_queue.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/soc/ti/knav_qmss_queue.c b/drivers/soc/ti/knav_qmss_queue.c
> index b73e353..56b5d7c 100644
> --- a/drivers/soc/ti/knav_qmss_queue.c
> +++ b/drivers/soc/ti/knav_qmss_queue.c
> @@ -1240,7 +1240,7 @@ static int knav_setup_queue_range(struct knav_device 
> *kdev,
>   if (of_get_property(node, "qalloc-by-id", NULL))

According to the binding, "qalloc-by-id" _is_ a boolean property, so
this one really does deserve to be of_property_read_bool()...

>   range->flags |= RANGE_RESERVED;
>  
> - if (of_get_property(node, "accumulator", NULL)) {
> + if (of_property_read_bool(node, "accumulator")) {

...whereas "accumulator" must have a value, so this isn't technically
appropriate. In general, most of these "if the property exists, read the
property and do stuff" checks are probably a sign of code that could be
simplified by refactoring the "do stuff" step to just specifically
handle the "read the property" step returning -EINVAL when it's not present.

Robin.

[1]:https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg13375.html

>   ret = knav_init_acc_range(kdev, node, range);
>   if (ret < 0) {
>   devm_kfree(dev, range);
> 
> 
> 



[PATCH v5 19/19] iommu/dma: Add support for mapping MSIs

2016-08-23 Thread Robin Murphy
When an MSI doorbell is located downstream of an IOMMU, attaching
devices to a DMA ops domain and switching on translation leads to a rude
shock when their attempt to write to the physical address returned by
the irqchip driver faults (or worse, writes into some already-mapped
buffer) and no interrupt is forthcoming.

Address this by adding a hook for relevant irqchip drivers to call from
their compose_msi_msg() callback, to swizzle the physical address with
an appropriately-mapped IOVA for any device attached to one of our DMA
ops domains.

CC: Thomas Gleixner 
CC: Jason Cooper 
CC: Marc Zyngier 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c| 141 ++-
 drivers/irqchip/irq-gic-v2m.c|   3 +
 drivers/irqchip/irq-gic-v3-its.c |   3 +
 include/linux/dma-iommu.h|   9 +++
 4 files changed, 141 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 00c8a08d56e7..330cce60cad9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -25,10 +25,29 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
+struct iommu_dma_msi_page {
+   struct list_headlist;
+   dma_addr_t  iova;
+   u32 phys_lo;
+   u32 phys_hi;
+};
+
+struct iommu_dma_cookie {
+   struct iova_domain  iovad;
+   struct list_headmsi_page_list;
+   spinlock_t  msi_lock;
+};
+
+static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
+{
+   return &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
+}
+
 int iommu_dma_init(void)
 {
return iova_cache_get();
@@ -43,15 +62,19 @@ int iommu_dma_init(void)
  */
 int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
-   struct iova_domain *iovad;
+   struct iommu_dma_cookie *cookie;
 
if (domain->iova_cookie)
return -EEXIST;
 
-   iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
-   domain->iova_cookie = iovad;
+   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+   if (!cookie)
+   return -ENOMEM;
 
-   return iovad ? 0 : -ENOMEM;
+   spin_lock_init(&cookie->msi_lock);
+   INIT_LIST_HEAD(&cookie->msi_page_list);
+   domain->iova_cookie = cookie;
+   return 0;
 }
 EXPORT_SYMBOL(iommu_get_dma_cookie);
 
@@ -63,14 +86,20 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
  */
 void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+   struct iommu_dma_msi_page *msi, *tmp;
 
-   if (!iovad)
+   if (!cookie)
return;
 
-   if (iovad->granule)
-   put_iova_domain(iovad);
-   kfree(iovad);
+   if (cookie->iovad.granule)
+   put_iova_domain(&cookie->iovad);
+
+   list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
+   list_del(&msi->list);
+   kfree(msi);
+   }
+   kfree(cookie);
domain->iova_cookie = NULL;
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
@@ -88,7 +117,7 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
  */
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 
size)
 {
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iova_domain *iovad = cookie_iovad(domain);
unsigned long order, base_pfn, end_pfn;
 
if (!iovad)
@@ -155,7 +184,7 @@ int dma_direction_to_prot(enum dma_data_direction dir, bool 
coherent)
 static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size,
dma_addr_t dma_limit)
 {
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iova_domain *iovad = cookie_iovad(domain);
unsigned long shift = iova_shift(iovad);
unsigned long length = iova_align(iovad, size) >> shift;
 
@@ -171,7 +200,7 @@ static struct iova *__alloc_iova(struct iommu_domain 
*domain, size_t size,
 /* The IOVA allocator knows what we mapped, so just unmap whatever that was */
 static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
 {
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iova_domain *iovad = cookie_iovad(domain);
unsigned long shift = iova_shift(iovad);
unsigned long pfn = dma_addr >> shift;
struct iova *iova = find_iova(iovad, pfn);
@@ -294,7 +323,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
void (*flush_page)(struct device *, const void *, phys_addr_t))
 {
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-   struct iova_domain *iovad = domain->iova_cookie;
+   struct iova_domain *iovad = cookie_iovad(domain);

Re: [PATCH v5 19/19] iommu/dma: Add support for mapping MSIs

2016-08-24 Thread Robin Murphy
On 24/08/16 09:16, Thomas Gleixner wrote:
> On Tue, 23 Aug 2016, Robin Murphy wrote:
>> +cookie = domain->iova_cookie;
>> +iovad = &cookie->iovad;
>> +
>> +spin_lock(&cookie->msi_lock);
>> +list_for_each_entry(msi_page, &cookie->msi_page_list, list)
>> +if (msi_page->phys_hi == msg->address_hi &&
>> +msi_page->phys_lo - msg->address_lo < iovad->granule)
>> +goto unlock;
>> +
>> +ret = __iommu_dma_map_msi_page(dev, msg, domain, &msi_page);
>> +unlock:
>> +spin_unlock(&cookie->msi_lock);
>> +
>> +if (!ret) {
>> +msg->address_hi = upper_32_bits(msi_page->iova);
>> +msg->address_lo &= iova_mask(iovad);
>> +msg->address_lo += lower_32_bits(msi_page->iova);
>> +} else {
>> +/*
>> + * We're called from a void callback, so the best we can do is
>> + * 'fail' by filling the message with obviously bogus values.
>> + * Since we got this far due to an IOMMU being present, it's
>> + * not like the existing address would have worked anyway...
>> + */
>> +msg->address_hi = ~0U;
>> +msg->address_lo = ~0U;
>> +msg->data = ~0U;
>> +}
> 
> The above is really horrible to parse. I had to read it five times to
> understand the logic.

Yeah, on reflection it is needlessly hideous. I think we should take
this as a clear lesson that whenever you find yourself thinking "Man, I
wish I had Python's for...else construct here", you're doing it wrong ;)

> static struct iommu_dma_msi_page *
> find_or_map_msi_page(struct iommu_dma_cookie *cookie, struct msi_msg *msg)
> {
>   struct iova_domain *iovad = &cookie->iovad;
>   struct iommu_dma_msi_page *page;
> 
>   list_for_each_entry(page, &cookie->msi_page_list, list) {
>   if (page->phys_hi == msg->address_hi &&
>   page->phys_lo - msg->address_lo < iovad->granule)
>   return page;
>   }
> 
>   /*
>* FIXME: __iommu_dma_map_msi_page() should return a page or NULL.
>* The integer return value is pretty pointless. If separate error
>* codes are required that's what ERR_PTR() is for 
>*/
>   ret = __iommu_dma_map_msi_page(dev, msg, domain, &page);
>   return ret ? ERR_PTR(ret) : page;
> }
> 
> So now the code in iommu_dma_map_msi_msg() becomes:
> 
>   spin_lock(&cookie->msi_lock);
>   msi_page = find_or_map_msi_page(cookie, msg);
>   spin_unlock(&cookie->msi_lock);
> 
>   if (!IS_ERR_OR_NULL(msi_page)) {
>   msg->address_hi = upper_32_bits(msi_page->iova);
>   msg->address_lo &= iova_mask(iovad);
>   msg->address_lo += lower_32_bits(msi_page->iova);
>   } else {
>   /*
>* We're called from a void callback, so the best we can do is
>* 'fail' by filling the message with obviously bogus values.
>* Since we got this far due to an IOMMU being present, it's
>* not like the existing address would have worked anyway...
>*/
>   msg->address_hi = ~0U;
>   msg->address_lo = ~0U;
>   msg->data = ~0U;
>   }
> 
> Hmm? 

OK, I've turned map_msi_page into get_msi_page (returning a page) and
just hoisted the list lookup into that, which leads to knock-on
simplifications throughout and is _much_ nicer. I now can't imagine why
I didn't get that far in the first place - thanks for the reality check!

Robin.

> 
> Thanks,
> 
>   tglx
> 



Re: [PATCH v5 19/19] iommu/dma: Add support for mapping MSIs

2016-08-25 Thread Robin Murphy
Hi Eric,

On Fri, 26 Aug 2016 00:25:34 +0200
Auger Eric  wrote:

> Hi Robin,
> 
> On 23/08/2016 21:05, Robin Murphy wrote:
> > When an MSI doorbell is located downstream of an IOMMU, attaching
> > devices to a DMA ops domain and switching on translation leads to a
> > rude shock when their attempt to write to the physical address
> > returned by the irqchip driver faults (or worse, writes into some
> > already-mapped buffer) and no interrupt is forthcoming.
> > 
> > Address this by adding a hook for relevant irqchip drivers to call
> > from their compose_msi_msg() callback, to swizzle the physical
> > address with an appropriately-mapped IOVA for any device attached to
> > one of our DMA ops domains.
> > 
> > CC: Thomas Gleixner 
> > CC: Jason Cooper 
> > CC: Marc Zyngier 
> > CC: linux-kernel@vger.kernel.org
> > Signed-off-by: Robin Murphy 
> > ---
> >  drivers/iommu/dma-iommu.c| 141
> > ++-
> > drivers/irqchip/irq-gic-v2m.c|   3 +
> > drivers/irqchip/irq-gic-v3-its.c |   3 +
> > include/linux/dma-iommu.h|   9 +++ 4 files changed, 141
> > insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 00c8a08d56e7..330cce60cad9 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -25,10 +25,29 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> >  
> > +struct iommu_dma_msi_page {
> > +   struct list_headlist;
> > +   dma_addr_t  iova;
> > +   u32 phys_lo;
> > +   u32 phys_hi;
> > +};
> > +
> > +struct iommu_dma_cookie {
> > +   struct iova_domain  iovad;
> > +   struct list_headmsi_page_list;
> > +   spinlock_t  msi_lock;
> > +};
> > +
> > +static inline struct iova_domain *cookie_iovad(struct iommu_domain
> > *domain) +{
> > +   return &((struct iommu_dma_cookie
> > *)domain->iova_cookie)->iovad; +}
> > +
> >  int iommu_dma_init(void)
> >  {
> > return iova_cache_get();
> > @@ -43,15 +62,19 @@ int iommu_dma_init(void)
> >   */
> >  int iommu_get_dma_cookie(struct iommu_domain *domain)
> >  {
> > -   struct iova_domain *iovad;
> > +   struct iommu_dma_cookie *cookie;
> >  
> > if (domain->iova_cookie)
> > return -EEXIST;
> >  
> > -   iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
> > -   domain->iova_cookie = iovad;
> > +   cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
> > +   if (!cookie)
> > +   return -ENOMEM;
> >  
> > -   return iovad ? 0 : -ENOMEM;
> > +   spin_lock_init(&cookie->msi_lock);
> > +   INIT_LIST_HEAD(&cookie->msi_page_list);
> > +   domain->iova_cookie = cookie;
> > +   return 0;
> >  }
> >  EXPORT_SYMBOL(iommu_get_dma_cookie);
> >  
> > @@ -63,14 +86,20 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
> >   */
> >  void iommu_put_dma_cookie(struct iommu_domain *domain)
> >  {
> > -   struct iova_domain *iovad = domain->iova_cookie;
> > +   struct iommu_dma_cookie *cookie = domain->iova_cookie;
> > +   struct iommu_dma_msi_page *msi, *tmp;
> >  
> > -   if (!iovad)
> > +   if (!cookie)
> > return;
> >  
> > -   if (iovad->granule)
> > -   put_iova_domain(iovad);
> > -   kfree(iovad);
> > +   if (cookie->iovad.granule)
> > +   put_iova_domain(&cookie->iovad);
> > +
> > +   list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list,
> > list) {
> > +   list_del(&msi->list);
> > +   kfree(msi);
> > +   }
> > +   kfree(cookie);
> > domain->iova_cookie = NULL;
> >  }
> >  EXPORT_SYMBOL(iommu_put_dma_cookie);
> > @@ -88,7 +117,7 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
> >   */
> >  int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t
> > base, u64 size) {
> > -   struct iova_domain *iovad = domain->iova_cookie;
> > +   struct iova_domain *iovad = cookie_iovad(domain);
> > unsigned long order, base_pfn, end_pfn;
> >  
> > if (!iovad)
> > @@ -155,7 +184,7 @@ int dma_direction_to_prot(enum
> > dma_data_direction dir, bool coherent) static struct iova
> > *__alloc_iova(struct iommu_domain *domain, size_t size, dma_addr_t
> > dma_limi

Re: [PATCH] irqchip/gic-{v2m, v3-its}: check iommu capable before doing iommu map

2017-08-17 Thread Robin Murphy
On 17/08/17 10:22, Marc Zyngier wrote:
> On 17/08/17 10:01, Shawn Lin wrote:
>> Hi Marc
>>
>> On 2017/8/17 16:52, Marc Zyngier wrote:
>>> On 17/08/17 09:28, Shawn Lin wrote:
 If a PCIe RC uses gic-v2m or gic-v3-its as an MSI domain but doesn't
 have IOMMU support, we don't need iommu_dma_map_msi_msg() to obtain a
 mapped IOMMU address, as all we need is the physical address.
 Otherwise we see the oops shown below, because we can't get an
 iommu_group for a device without IOMMU capability.

 Unable to handle kernel NULL pointer dereference at virtual address
 00d0
 [00d0] user address but active_mm is swapper
 Internal error: Oops: 9604 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 3 PID: 1 Comm: swapper/0 Not tainted
 4.13.0-rc5-next-20170815-1-g0744890-dirty #53
 Hardware name: Firefly-RK3399 Board (DT)
 task: 80007bc7 task.stack: 80007bc6c000
 PC is at iommu_get_domain_for_dev+0x38/0x50
 LR is at iommu_dma_map_msi_msg+0x3c/0x1b8
 pc : [] lr : [] pstate:
 a045

 ...

 [] iommu_get_domain_for_dev+0x38/0x50
 [] iommu_dma_map_msi_msg+0x3c/0x1b8
 [] its_irq_compose_msi_msg+0x44/0x50
 [] irq_chip_compose_msi_msg+0x40/0x58
 [] msi_domain_activate+0x1c/0x48
 [] __irq_domain_activate_irq+0x40/0x58
 [] irq_domain_activate_irq+0x24/0x40
 [] msi_domain_alloc_irqs+0x104/0x190
 [] pci_msi_setup_msi_irqs+0x3c/0x48
 [] __pci_enable_msi_range+0x21c/0x408
 [] pci_alloc_irq_vectors_affinity+0x104/0x168
 [] pcie_port_device_register+0x200/0x488
 [] pcie_portdrv_probe+0x34/0xd0
 [] local_pci_probe+0x3c/0xb8
 [] pci_device_probe+0x138/0x170
 [] driver_probe_device+0x21c/0x2d8
 [] __device_attach_driver+0x9c/0xf8
 [] bus_for_each_drv+0x5c/0x98
 [] __device_attach+0xc4/0x138
 [] device_attach+0x10/0x18
 [] pci_bus_add_device+0x4c/0xa8
 [] pci_bus_add_devices+0x44/0x90
 [] rockchip_pcie_probe+0xc70/0xcc8
 [] platform_drv_probe+0x58/0xc0
 [] driver_probe_device+0x21c/0x2d8
 [] __driver_attach+0xac/0xb0
 [] bus_for_each_dev+0x64/0xa0
 [] driver_attach+0x20/0x28
 [] bus_add_driver+0x110/0x230
 [] driver_register+0x60/0xf8
 [] __platform_driver_register+0x40/0x48
 [] rockchip_pcie_driver_init+0x18/0x20
 [] do_one_initcall+0xb0/0x120
 [] kernel_init_freeable+0x184/0x224
 [] kernel_init+0x10/0x100
 [] ret_from_fork+0x10/0x18

 This patch fixes the problem exposed by the commit mentioned below.
 Before that commit, the iommu code had a workaround for this, but it
 no longer does. So we could fix this in the GIC code, though a Fixes
 tag is probably still warranted.

 Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
 Signed-off-by: Shawn Lin 
>>>
>>> That looks pretty horrible. Please see the patch I posted in response to
>>> your initial report:
>>>
>>> http://marc.info/?l=linux-pci&m=150295809430903&w=2
>>
>> Aha, I saw it but I didn't get your point of "I'm not convinced  that
>> playing with the group refcount in an irqchip is the right approach..."
>>
>> So, I'm not sure whether you prefer your patch?
> I really dislike both:
> 
> - yours: because it reinvents the wheel (parsing DT one more time on each
> MSI message creation), and is DT specific while the rest of the code is
> completely firmware agnostic (what about ACPI?).
> 
> - mine: because it relies on an intimate knowledge of the internals of
> the iommu stuff, which is not what was originally intended.
> 
> My comment was an invitation to Joerg to speak his mind.
> 
> The original intent of iommu_dma_map_msi_msg() was that it could silently
> fail without exploding in your face. We can change that, but a minimum of
> coordination wouldn't hurt.
> 
> My personal preference would be to address this in the iommu layer,
> restoring a non-exploding iommu_dma_map_msi_msg():

Indeed, but that still doesn't fix other breakages that simply haven't
been found yet ;)

Thanks for the reports - I see exactly what I've done here. Gimme a
moment to write it up convincingly...

Robin.

> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 44eca1e3243f..5440eae21bea 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -883,12 +883,18 @@ static struct iommu_dma_msi_page 
> *iommu_dma_get_msi_page(struct device *dev,
>  void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
>  {
>   struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq));
> - struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> + struct iommu_domain *domain;
> + struct iommu_group *group;
>   struct iommu_dma_cookie *cookie;
>   struct iommu_dma_msi_page *msi_page;
>   phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo;
>   unsigned long flags;
>  
> + group = iommu_group_get(dev);
> + if (!group)
> + 

[PATCH] iommu: Avoid NULL group dereference

2017-08-17 Thread Robin Murphy
The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
have been a little misleading, since that check is still worthwhile even
when groups *are* universal. We have a few IOMMU-aware drivers which
only care whether their device is already attached to an existing domain
or not, for which the previous behaviour of iommu_get_domain_for_dev()
was ideal, and who now crash if their device does not have an IOMMU.

With IOMMU groups now serving as a reliable indicator of whether a
device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
mode), drivers could arguably do this:

group = iommu_group_get(dev);
if (group) {
domain = iommu_get_domain_for_dev(dev);
iommu_group_put(group);
}

However, rather than duplicate that code across multiple callsites,
particularly when it's still only the domain they care about, let's skip
straight to the next step and factor out the check into the common place
it applies - in iommu_get_domain_for_dev() itself. Sure, it ends up
looking rather familiar, but now it's backed by the reasoning of having
a robust API able to do the expected thing for all devices regardless.

Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
Reported-by: Shawn Lin 
Signed-off-by: Robin Murphy 
---

As well as dma-iommu, there are at least the Cavium ThunderX and
Freescale DPAA2 ethernet drivers expecting this to work too.

 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index af69bf7e035a..5499a0387349 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1352,6 +1352,8 @@ struct iommu_domain *iommu_get_domain_for_dev(struct 
device *dev)
struct iommu_group *group;
 
group = iommu_group_get(dev);
+   if (!group)
+   return NULL;
 
domain = group->domain;
 
-- 
2.13.4.dirty



Re: [PATCH] iommu: Avoid NULL group dereference

2017-08-17 Thread Robin Murphy
On 17/08/17 16:41, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 11:40:08AM +0100, Robin Murphy wrote:
>> The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
>> have been a little misleading, since that check is still worthwhile even
>> when groups *are* universal. We have a few IOMMU-aware drivers which
>> only care whether their device is already attached to an existing domain
>> or not, for which the previous behaviour of iommu_get_domain_for_dev()
>> was ideal, and who now crash if their device does not have an IOMMU.
>>
>> With IOMMU groups now serving as a reliable indicator of whether a
>> device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
>> mode), drivers could arguably do this:
>>
>>  group = iommu_group_get(dev);
>>  if (group) {
>>  domain = iommu_get_domain_for_dev(dev);
>>  iommu_group_put(group);
>>  }
> 
> Okay, so just to check I got it right: Drivers do the above to check
> whether a device is managed by an IOMMU, and that crashes now because
> the 'group == NULL' check was removed?

Indeed - the typical context is network descriptors that don't have
space to store the CPU virtual address of the buffer, so when a packet
arrives the driver has to work backwards from the DMA address, in this
sort of pattern:

addr = desc[idx]->addr;
domain = iommu_get_domain_for_dev(dev);
if (domain)
        addr = iommu_iova_to_phys(domain, addr);
buf = phys_to_virt(addr);

(the GIC thing is similar but in reverse, with a physical address which
may or may not need replacing with an IOVA). Unless we were to change
the interface to be iommu_get_domain_for_group(), I think it makes sense
for it to remain valid to call for any device.

Robin.


Re: [PATCH] PCI/MSI: Improve MSI alias detection

2017-06-20 Thread Robin Murphy
On 19/06/17 22:52, Bjorn Helgaas wrote:
> [+cc David, Alex]
> 
> On Wed, May 31, 2017 at 06:52:28PM +0100, Robin Murphy wrote:
>> Currently, we consider all DMA aliases when calculating MSI requester
>> IDs. This turns out to be the wrong thing to do in the face of pure DMA
>> quirks like those of Marvell SATA cards, where we can end up configuring
>> the MSI doorbell to expect the phantom function alias, such that it then
>> goes on to happily ignore actual MSI writes coming from the card on the
>> real RID.
> 
> I guess you're making the assumption that DMA might use phantom
> function aliases (aliases on the same bus as the actual device), but
> MSI writes will not?

Indeed - that's certainly the case for the Marvell 88SE9128 that I have
here (and it was a report involving a similar Marvell SATA card that
first brought this to our attention). I appreciate that there's not
necessarily a general right answer, but the generic MSI infrastructure
is rather built around the notion of a device having a single ID, so
this is really a case of picking the most likely compromise to cover the
greatest number of cases.

>> Improve the alias walk to only account for the topological aliases that
>> matter, based on the logic from the Intel IRQ remapping code.
> 
> Can you include a specific function reference here?  It sounds like
> it'd be useful to compare this with the Intel IRQ remapping code.

set_msi_sid() in drivers/iommu/intel_irq_remapping.c - one small
difference from VT-d is that here we don't have an equivalent of the
"verify bus" option, only of considering the full ID.

>> Signed-off-by: Robin Murphy 
>> ---
>>  drivers/pci/msi.c | 18 +-
>>  1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index ba44fdfda66b..7b34c434970e 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -1468,13 +1468,21 @@ struct irq_domain *pci_msi_create_irq_domain(struct 
>> fwnode_handle *fwnode,
>>  }
>>  EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain);
>>  
>> +/*
>> + * If the alias device/RID is on a different bus, it's a topological alias
>> + * we should care about; otherwise, it's a DMA quirk and we don't.
>> + */
>>  static int get_msi_id_cb(struct pci_dev *pdev, u16 alias, void *data)
>>  {
>>  u32 *pa = data;
>> +u8 bus = PCI_BUS_NUM(*pa);
>> +
>> +if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus)
>> +*pa = alias;
>>  
>> -*pa = alias;
> 
> Aliases can arise both because of device defects, e.g., the phantom
> function quirks, and because of bridges, e.g., all the stuff in
> pci_for_each_dma_alias().
> 
> Why is it that the users of get_msi_id_cb() can get away with
> collapsing any of these bridge aliases into the last one that
> pci_for_each_dma_alias() finds?  I guess this is another question
> about exactly why DMA and MSI are different here.

As before, it's mostly just a case of trying to pick the least-worst
option to fit the constraints of the generic MSI layer (assuming a
bridge in the PCI->PCIe direction that is always going to introduce an
alias) - based on the assumption that the VT-d code has presumably
worked well enough in practice doing a similar thing - even if that
doesn't quite seem to match my reading of the bridge spec (AFAICS it may
be true for reads, but writes are somewhat murkier).

From the DMA angle[1], we do have the ability to simply consider every
alias for translation at the IOMMU, since the IOMMU layer already has
provisions for devices having multiple IDs. Indeed in the Marvell case
it is certainly necessary to consider both the real and phantom
functions so that both DMA and MSIs even get through in the first place.

Robin.

[1]:https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg17979.html

>>  return 0;
>>  }
>> +
>>  /**
>>   * pci_msi_domain_get_msi_rid - Get the MSI requester id (RID)
>>   * @domain: The interrupt domain
>> @@ -1488,7 +1496,7 @@ static int get_msi_id_cb(struct pci_dev *pdev, u16 
>> alias, void *data)
>>  u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev 
>> *pdev)
>>  {
>>  struct device_node *of_node;
>> -u32 rid = 0;
>> +u32 rid = PCI_DEVID(pdev->bus->number, pdev->devfn);
>>  
>>  pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid);
>>  
>> @@ -1504,14 +1512,14 @@ u32 pci_msi_domain_get_msi_rid(struct irq_domain 
>> *domain, struct pci_dev *pdev)
>>   * @pdev:   The PCI device
>>   *
>>   * Use the firmwa

Re: [PATCH 1/1] iommu/arm-smmu-v3: replace writel with writel_relaxed in queue_inc_prod

2017-06-20 Thread Robin Murphy
On 20/06/17 12:04, Zhen Lei wrote:
> This function is protected by a spinlock, which implies a memory
> barrier, so we can safely use writel_relaxed(). In fact, the dmb
> operation lengthens the time spent holding the lock, which indirectly
> increases lock contention under stress.

If you remove the DSB between writing the commands (to Normal memory)
and writing the pointer (to Device memory), how can you guarantee that
the complete command is visible to the SMMU and it isn't going to try to
consume stale memory contents? The spinlock is irrelevant since it's
taken *before* the command is written.

Robin.

> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/arm-smmu-v3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 380969a..d2fbee3 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -728,7 +728,7 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>   u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
> 
>   q->prod = Q_OVF(q, q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
> - writel(q->prod, q->prod_reg);
> + writel_relaxed(q->prod, q->prod_reg);
>  }
> 
>  /*
> --
> 2.5.0
> 
> 



Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Robin Murphy
Hi Christoph,

On 20/06/17 13:41, Christoph Hellwig wrote:
> On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
>> I plan to create a new dma-mapping tree to collect all this work.
>> Any volunteers for co-maintainers, especially from the iommu gang?
> 
> Ok, I've created the new tree:
> 
>git://git.infradead.org/users/hch/dma-mapping.git for-next
> 
> Gitweb:
> 
>
> http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next
> 
> And below is the patch to add the MAINTAINERS entry, additions welcome.

I'm happy to be a reviewer, since I've been working in this area for
some time, particularly with the dma-iommu code and arm64 DMA ops.

Robin.

> Stephen, can you add this to linux-next?
> 
> ---
> From 335979c41912e6c101a20b719862b2d837370df1 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Tue, 20 Jun 2017 11:17:30 +0200
> Subject: MAINTAINERS: add entry for dma mapping helpers
> 
> This code has been spread between getting in through arch trees, the iommu
> tree, -mm and the drivers tree.  There will be a lot of work in this area,
> including consolidating various arch implementations into more common
> code, so ensure we have a proper git tree that facilitates cooperation with
> the architecture maintainers.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  MAINTAINERS | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 09b5ab6a8a5c..56859d53a424 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2595,6 +2595,19 @@ S: Maintained
>  F:   net/bluetooth/
>  F:   include/net/bluetooth/
>  
> +DMA MAPPING HELPERS
> +M:   Christoph Hellwig 
> +L:   linux-kernel@vger.kernel.org
> +T:   git git://git.infradead.org/users/hch/dma-mapping.git
> +W:   http://git.infradead.org/users/hch/dma-mapping.git
> +S:   Supported
> +F:   lib/dma-debug.c
> +F:   lib/dma-noop.c
> +F:   lib/dma-virt.c
> +F:   drivers/base/dma-mapping.c
> +F:   drivers/base/dma-coherent.c
> +F:   include/linux/dma-mapping.h
> +
>  BONDING DRIVER
>  M:   Jay Vosburgh 
>  M:   Veaceslav Falico 
> 



Re: [PATCH v5 3/7] drivers: dma-coherent: Account dma_pfn_offset when used with device tree

2017-06-20 Thread Robin Murphy
On 20/06/17 14:42, Christoph Hellwig wrote:
> Wouldn't the small patch below solve the same issue in a simple way?

That assumes that all devices accessing a shared pool will have the same
dma_pfn_offset as the first one, which cannot strictly be relied upon
(even if it is highly likely in practice).

Robin.

> diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
> index 640a7e63c453..e8f8447d705b 100644
> --- a/drivers/base/dma-coherent.c
> +++ b/drivers/base/dma-coherent.c
> @@ -290,9 +290,11 @@ EXPORT_SYMBOL(dma_mmap_from_coherent);
>  static int rmem_dma_device_init(struct reserved_mem *rmem, struct device 
> *dev)
>  {
>   struct dma_coherent_mem *mem = rmem->priv;
> + dma_addr_t dev_addr = ((rmem->base >> PAGE_SHIFT) -
> + dev->dma_pfn_offset) << PAGE_SHIFT;
>  
>   if (!mem &&
> - !dma_init_coherent_memory(rmem->base, rmem->base, rmem->size,
> + !dma_init_coherent_memory(rmem->base, dev_addr, rmem->size,
> DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE,
> &mem)) {
>   pr_err("Reserved memory: failed to init DMA memory pool at %pa, 
> size %ld MiB\n",
> 



Re: [PATCH v5 4/7] drivers: dma-coherent: Introduce default DMA pool

2017-06-20 Thread Robin Murphy
On 20/06/17 14:49, Christoph Hellwig wrote:
> On Wed, May 24, 2017 at 11:24:29AM +0100, Vladimir Murzin wrote:
>> This patch introduces a default coherent DMA pool, similar to the
>> default CMA area concept. To keep other users safe, the code is kept
>> under CONFIG_ARM.
> 
> I don't see a CONFIG_ARM in the code, although parts of it are added
> under CONFIG_OF_RESERVED_MEM.

It's in rmem_dma_setup() (line 325) currently enforcing no-map for ARM.

> But overall this code look a bit odd to me.  As far as I can tell
> the dma-coherent.c code is for the case where we have a special
> piece of coherent memory close to a device.

True, but the case here is where we need a special piece of coherent
memory for *all* devices, and it was more complicated *not* to reuse the
existing infrastructure. This would already be achievable by specifying
a separate rmem carveout per device, but the shared pool just makes life
easier, and mirrors the functionality dma-contiguous already supports.

> If you're allocating out of the global allocator the memory should
> come from the normal dma_ops ->alloc allocator - and also take
> the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or
> DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory)

The context here is noMMU but with caches - the problem being that the
normal allocator will give back kernel memory, and there's no way to
make that coherent with devices short of not enabling the caches in the
first place, which is obviously undesirable. The trick is that RAM is
aliased (in hardware) at two addresses, one of which makes CPU accesses
non-cacheable, so by only ever accessing the RAM set aside for the
coherent DMA pool using the non-cacheable alias (represented by the
dma_pfn_offset) we can achieve DMA coherency.

It perhaps seems a bit backwards, but we do actually end up honouring
DMA_ATTR_NON_CONSISTENT to a degree in patch #5, as such requests are
the only ones allowed to fall back to the normal dma_ops allocator.

Robin.


Re: CONFIG_DMA_NOOP_OPS breaks ARM arch

2017-10-16 Thread Robin Murphy
On 16/10/17 09:12, Vladimir Murzin wrote:
> + Robin and Christoph
> 
> On 16/10/17 06:27, Marian Mihailescu wrote:
>> I am using 4.14-rc4 with a patch on top that includes
>> arch/arm/include/asm/dma-mapping.h in a module.
>>
>> I have MMU enabled, so
>> select DMA_NOOP_OPS if !MMU
>> does nothing for me, and I get a compile error because dma_noop_ops is 
>> unknown.
> 
> Can you post an error message here, please?
> 
>>
>> Maybe I should include linux/dma-mapping.h?

That is the *only* dma-mapping header you should be referring to.
Including the arch/*/asm/ version directly is just broken and asking for
trouble.

Robin.

> Where to include? In your driver or what?
> 
>>
>> Thanks for the quick reply.
> 
> with CONFIG_MMU compiler should optimise out dma_noop_ops in:
> 
> return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_noop_ops;
> 
> What toolchain are you using?
> 
> Cheers
> Vladimir
> 
>>
>>
>> On Mon, Oct 16, 2017 at 2:28 PM, Randy Dunlap  wrote:
>>> On 10/15/17 20:29, Randy Dunlap wrote:
 On 10/15/17 20:27, Randy Dunlap wrote:
> On 10/15/17 19:27, Marian Mihailescu wrote:
>> After commit 7844572c633964c864d9f32dc3f2a8ffe5d70371, dma_noop_ops
>> are built only for architectures that use it.
>>
>> For ARM architecture, CONFIG_DMA_NOOP_OPS is not selected, and cannot
>> be selected.
>>>
>>> What kernel version are you looking at?
>>> I see that it is selected:
>>>
>>> --- a/arch/arm/Kconfig
>>> +++ b/arch/arm/Kconfig
>>> @@ -22,6 +22,7 @@ config ARM
>>> select CLONE_BACKWARDS
>>> select CPU_PM if (SUSPEND || CPU_IDLE)
>>> select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS
>>> +   select DMA_NOOP_OPS if !MMU
>>> select EDAC_SUPPORT
>>> select EDAC_ATOMIC_SCRUB
>>> select GENERIC_ALLOCATOR
>>>
>>>
>>> That's in commit ID 1c51c429f30ea10428337f3a33c12059ba59f668 from May 24, 
>>> 2017.
>>>
>> However, arch/arm/include/asm/dma-mapping.h is referencing dma_noop_ops:
>>
>> static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type 
>> *bus)
>> {
>> return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_noop_ops;
>> }
>>
>> I will let a maintainer suggest the best resolution for this :)
>>
>
> add Bart and iommu mailing list.
>

 and add Vladimir.


>>>
>>>
>>> --
>>> ~Randy
> 



Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")

2017-10-16 Thread Robin Murphy
On 16/10/17 14:48, Mark Rutland wrote:
> Hi Leo,
> 
> On Mon, Oct 16, 2017 at 09:17:23AM +0800, Leo Yan wrote:
>> On Tue, Oct 10, 2017 at 05:03:44PM +0100, Robin Murphy wrote:
>>> On 10/10/17 16:45, Mark Rutland wrote:
>>>> On Tue, Oct 10, 2017 at 10:27:25PM +0800, Leo Yan wrote:
>>>>> I'm running a mainline kernel on a Hikey620 board, and I find it's
>>>>> easy to trigger the panic and log below. So I bisected the kernel and
>>>>> finally narrowed it down to commit e3067861ba66 ("arm64: add basic
>>>>> VMAP_STACK support"), which introduced this issue.
>>>>>
>>>>> I tried removing 'select HAVE_ARCH_VMAP_STACK' from
>>>>> arch/arm64/Kconfig, and then the panic goes away. Could you take a
>>>>> look and share any insight into this issue?
>>>>
>>>> Given the stuff in the backtrace, my suspicion is something is trying to
>>>> perform DMA to/from the stack, getting junk addresses from the attempted
>>>> virt<->phys conversions.
>>>>
>>>> Could you try enabling both VMAP_STACK and CONFIG_DEBUG_VIRTUAL?
>>>
>>> CONFIG_DMA_API_DEBUG should scream about drivers trying to use stack
>>> addresses either way, too.
>>
>> Thanks for suggestions, Mark & Robin.
>>
>> I enabled these debugging configs but couldn't get a clue from them;
>> however, I happened to find this issue is quite likely related to the
>> CA53 errata, especially ERRATA_A53_855873. So I changed to mainline
>> ARM-TF code with the erratum fix, and the issue goes away.
> 
> Thanks for the update.
> 
> Just to confirm, with the updated firmware you no longer see the issue?
> 
> I can't immediately see how that would be related.

Cores up to r0p2 have the other errata to which
ARM64_WORKAROUND_CLEAN_CACHE also applies anyway; r3p0+ have an ACTLR
bit to do the CVAC->CIVAC upgrade in hardware, and our policy is that
we expect firmware to enable such hardware workarounds where possible. I
assume that's why we don't explicitly document 855873 anywhere in Linux.

Robin.


Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")

2017-10-16 Thread Robin Murphy
On 16/10/17 15:26, Mark Rutland wrote:
> On Mon, Oct 16, 2017 at 03:12:45PM +0100, Robin Murphy wrote:
>> On 16/10/17 14:48, Mark Rutland wrote:
>>> Hi Leo,
>>>
>>> On Mon, Oct 16, 2017 at 09:17:23AM +0800, Leo Yan wrote:
>>>> On Tue, Oct 10, 2017 at 05:03:44PM +0100, Robin Murphy wrote:
>>>>> On 10/10/17 16:45, Mark Rutland wrote:
>>>>>> On Tue, Oct 10, 2017 at 10:27:25PM +0800, Leo Yan wrote:
>>>>>>> I'm running a mainline kernel on a Hikey620 board, and I find it's
>>>>>>> easy to trigger the panic and log below. So I bisected the kernel and
>>>>>>> finally narrowed it down to commit e3067861ba66 ("arm64: add basic
>>>>>>> VMAP_STACK support"), which introduced this issue.
>>>>>>>
>>>>>>> I tried removing 'select HAVE_ARCH_VMAP_STACK' from
>>>>>>> arch/arm64/Kconfig, and then the panic goes away. Could you take a
>>>>>>> look and share any insight into this issue?
>>>>>>
>>>>>> Given the stuff in the backtrace, my suspicion is something is trying to
>>>>>> perform DMA to/from the stack, getting junk addresses from the attempted
>>>>>> virt<->phys conversions.
>>>>>>
>>>>>> Could you try enabling both VMAP_STACK and CONFIG_DEBUG_VIRTUAL?
>>>>>
>>>>> CONFIG_DMA_API_DEBUG should scream about drivers trying to use stack
>>>>> addresses either way, too.
>>>>
>>>> Thanks for suggestions, Mark & Robin.
>>>>
>>>> I enabled these debugging configs but couldn't get a clue from them;
>>>> however, I happened to find this issue is quite likely related to the
>>>> CA53 errata, especially ERRATA_A53_855873. So I changed to mainline
>>>> ARM-TF code with the erratum fix, and the issue goes away.
>>>
>>> Thanks for the update.
>>>
>>> Just to confirm, with the updated firmware you no longer see the issue?
>>>
>>> I can't immediately see how that would be related.
>>
>> Cores up to r0p2 have the other errata to which
>> ARM64_WORKAROUND_CLEAN_CACHE also applies anyway; r3p0+ have an ACTLR
>> bit to do the CVAC->CIVAC upgrade in hardware, and our policy is that
>> we expect firmware to enable such hardware workarounds where possible. I
>> assume that's why we don't explicitly document 855873 anywhere in Linux.
> 
> Sure, I also looked it up. ;)
> 
> I meant that I couldn't immediately see why VMAP'd stacks were likely to
> tickle issues with that more reliably.

Ah, right - in context, "that" appeared to refer to "updated firmware",
not "VMAP_STACK". Sorry.

I guess the vmap addresses might tickle the "same L2 set" condition
differently to when both stack and DMA buffer are linear map addresses.

Robin.


Re: [PATCH V2] ARM: dts: rockchip:add sd card support for firefly reload board

2017-05-26 Thread Robin Murphy
On 26/05/17 08:07, Eddie Cai wrote:
> firefly reload board not support sd card yet. so support it.

I'm confused... According to pictures and the schematic the microSD
socket and vcc_sd supply are on the baseboard, not the core module, and
these nodes already exist in rk3288-firefly-reload.dts :/

Robin.

> Signed-off-by: Eddie Cai 
> Reviewed-by: Shawn Lin 
> ---
>  arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi | 31 
> +++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi 
> b/arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi
> index 413b61f..2f41209 100644
> --- a/arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi
> +++ b/arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi
> @@ -57,6 +57,17 @@
>   clock-output-names = "ext_gmac";
>   };
>  
> + vcc_sd: sdmmc-regulator {
> + compatible = "regulator-fixed";
> + gpio = <&gpio7 RK_PB3 GPIO_ACTIVE_LOW>;
> + pinctrl-names = "default";
> + pinctrl-0 = <&sdmmc_pwr>;
> + regulator-name = "vcc_sd";
> + regulator-min-microvolt = <330>;
> + regulator-max-microvolt = <330>;
> + startup-delay-us = <10>;
> + vin-supply = <&vcc_io>;
> + };
>  
>   vcc_flash: flash-regulator {
>   compatible = "regulator-fixed";
> @@ -281,6 +292,26 @@
>   rockchip,pins = <4 8 RK_FUNC_GPIO &pcfg_output_high>;
>   };
>   };
> +
> + sdmmc {
> + sdmmc_pwr: sdmmc-pwr {
> + rockchip,pins = <7 11 RK_FUNC_GPIO &pcfg_pull_none>;
> + };
> + };
> +};
> +
> +&sdmmc {
> + bus-width = <4>;
> + cap-mmc-highspeed;
> + cap-sd-highspeed;
> + card-detect-delay = <200>;
> + disable-wp;
> + num-slots = <1>;
> + pinctrl-names = "default";
> + pinctrl-0 = <&sdmmc_clk>, <&sdmmc_cmd>, <&sdmmc_cd>, <&sdmmc_bus4>;
> + vmmc-supply = <&vcc_sd>;
> + vqmmc-supply = <&vccio_sd>;
> + status = "okay";
>  };
>  
>  &tsadc {
> 



Re: Device address specific mapping of arm,mmu-500

2017-05-30 Thread Robin Murphy
On 30/05/17 17:49, Ray Jui wrote:
> Hi Will,
> 
> On 5/30/17 8:14 AM, Will Deacon wrote:
>> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote:
>>> I'm writing to check with you to see if the latest arm-smmu.c driver in
>>> v4.12-rc Linux for smmu-500 can support mapping that is only specific to
>>> a particular physical address range while leave the rest still to be
>>> handled by the client device. I believe this can already be supported by
>>> the device tree binding of the generic IOMMU framework; however, it is
>>> not clear to me whether or not the arm-smmu.c driver can support it.
>>>
>>> To give you some background information:
>>>
>>> We have a SoC with a PCIe root complex that has a built-in logic block
>>> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block
>>> has a HW bug that causes the MSI writes not to be parsed properly and can
>>> potentially corrupt data in the internal FIFO. A workaround is to have
>>> the ARM MMU-500 take care of all inbound transactions. I found that is
>>> working after hooking up our PCIe root complex to MMU-500; however, even
>>> with this optimized arm-smmu driver in v4.12, I'm still seeing a
>>> significant Ethernet throughput drop in both the TX and RX directions.
>>> The throughput drop is very significant at around 50% (but is already
>>> much improved compared to other prior kernel versions at 70~90%).
>>
>> Did Robin's experiments help at all with this?
>>
>> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf
>>
> 
> It looks like these are new optimizations that have not yet been merged
> in v4.12? I'm going to give it a try.

Actually, most of the stuff there did land in 4.12 - only the
iommu/pgtable part is experimental stuff which hasn't been on the list
yet (but hopefully should be soon).

>>> One alternative is to only use MMU-500 for MSI writes towards
>>> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific
>>> region of physical address that I want MMU-500 to act on and leave the
>>> rest of inbound transactions to be handled directly by our PCIe
>>> controller, it can potentially work around the HW bug we have and at the
>>> same time achieve optimal throughput.
>>
>> I don't think you can bypass the SMMU for MSIs unless you give them their
>> own StreamIDs, which is likely to break things horribly in the kernel. You
>> could try to create an identity mapping, but you'll still have the
>> translation overhead and you'd probably end up having to supply your own DMA
>> ops to manage the address space. I'm assuming that you need to prevent the
>> physical address of the ITS from being allocated as an IOVA?
> 
> Will, is that a HW limitation that the SMMU cannot be used, only for MSI
> writes, in which case, the physical address range is very specific in
> our ASIC that falls in the device memory region (e.g., below 0x8000)?

Yes, either translation is enabled or it isn't - we don't have
GART-style apertures. To segregate by address the best you can do is set
up the page tables to identity-map all of the "untranslated" address
space. As Will mentioned, if MSI writes could be distinguished from DMA
writes by Stream ID, rather than by address, then there would be more
options, but in the PCI case at least that's not generally possible.

Robin.

> In fact, what I need in this case is a static mapping from IOMMU on the
> physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the
> address that MSI writes go to. This is to bypass the MSI forwarding
> logic in our PCIe controller. At the same time, I can leave the rest of
> inbound transactions to be handled by our PCIe controller without going
> through the MMU.
> 
>>
>>> Any feedback from you is greatly appreciated!
>>
>> Fix the hardware ;)
> 
> Indeed that has to happen with the next revision of the ASIC. But as you
> can see I'm getting quite desperate here trying to find an interim solution.
> 
>>
>> Will
>>
> 
> Thanks for the help!
> 
> Ray
> 



Re: [PATCH v1 0/3] manually add Error Synchronization Barrier at exception handler entry and exit

2017-11-01 Thread Robin Murphy
On 01/11/17 19:14, Dongjiu Geng wrote:
> Some hardware platform can support RAS Extension, but not support IESB,
> such as Huawei's platform, so software need to insert Synchronization Barrier
> operations at exception handler entry.
> 
> This series patches are based on  James's series patches "SError rework +
> RAS&IESB for firmware first support". In Huawei's platform, we do not
> support IESB, so software needs to insert that.
> 
> 
> Dongjiu Geng (3):
>   arm64: add a macro for SError synchronization
>   arm64: add error synchronization barrier in kernel_entry/kernel_exit
>   KVM: arm64: add ESB in exception handler entry and exit.
> 
> James Morse (18):
>   arm64: explicitly mask all exceptions
>   arm64: introduce an order for exceptions
>   arm64: Move the async/fiq helpers to explicitly set process context
> flags
>   arm64: Mask all exceptions during kernel_exit
>   arm64: entry.S: Remove disable_dbg
>   arm64: entry.S: convert el1_sync
>   arm64: entry.S convert el0_sync
>   arm64: entry.S: convert elX_irq
>   KVM: arm/arm64: mask/unmask daif around VHE guests
>   arm64: kernel: Survive corrected RAS errors notified by SError
>   arm64: cpufeature: Enable IESB on exception entry/return for
> firmware-first
>   arm64: kernel: Prepare for a DISR user
>   KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.
>   KVM: arm64: Save/Restore guest DISR_EL1
>   KVM: arm64: Save ESR_EL2 on guest SError
>   KVM: arm64: Handle RAS SErrors from EL1 on guest exit
>   KVM: arm64: Handle RAS SErrors from EL2 on guest exit
>   KVM: arm64: Take any host SError before entering the guest
> 
> Xie XiuQi (2):
>   arm64: entry.S: move SError handling into a C function for future
> expansion
>   arm64: cpufeature: Detect CPU RAS Extentions
> 
>  arch/arm64/Kconfig   | 33 +-
>  arch/arm64/include/asm/assembler.h   | 59 +---
>  arch/arm64/include/asm/barrier.h |  1 +
>  arch/arm64/include/asm/cpucaps.h |  4 +-
>  arch/arm64/include/asm/daifflags.h   | 61 +
>  arch/arm64/include/asm/esr.h | 17 +++
>  arch/arm64/include/asm/exception.h   | 14 ++
>  arch/arm64/include/asm/irqflags.h| 40 ++--
>  arch/arm64/include/asm/kvm_emulate.h | 10 
>  arch/arm64/include/asm/kvm_host.h| 16 +++
>  arch/arm64/include/asm/processor.h   |  2 +
>  arch/arm64/include/asm/sysreg.h  |  6 +++
>  arch/arm64/include/asm/traps.h   | 36 +++
>  arch/arm64/kernel/asm-offsets.c  |  1 +
>  arch/arm64/kernel/cpufeature.c   | 43 ++
>  arch/arm64/kernel/debug-monitors.c   |  5 +-
>  arch/arm64/kernel/entry.S| 88 
> +---
>  arch/arm64/kernel/hibernate.c|  5 +-
>  arch/arm64/kernel/machine_kexec.c|  4 +-
>  arch/arm64/kernel/process.c  |  3 ++
>  arch/arm64/kernel/setup.c|  8 ++--
>  arch/arm64/kernel/signal.c   |  8 +++-
>  arch/arm64/kernel/smp.c  | 12 ++---
>  arch/arm64/kernel/suspend.c  |  7 +--
>  arch/arm64/kernel/traps.c| 64 +-
>  arch/arm64/kvm/handle_exit.c | 19 +++-
>  arch/arm64/kvm/hyp-init.S|  3 ++
>  arch/arm64/kvm/hyp/entry.S   | 15 ++
>  arch/arm64/kvm/hyp/hyp-entry.S   |  1 +
>  arch/arm64/kvm/hyp/switch.c  | 19 ++--
>  arch/arm64/kvm/hyp/sysreg-sr.c   |  6 +++
>  arch/arm64/kvm/inject_fault.c| 13 +-
>  arch/arm64/kvm/sys_regs.c|  1 +
>  arch/arm64/mm/proc.S | 14 --
>  virt/kvm/arm/arm.c   |  4 ++
>  35 files changed, 527 insertions(+), 115 deletions(-)
>  create mode 100644 arch/arm64/include/asm/daifflags.h

If you're sending a patch series, please have the cover letter describe
*those patches*, rather than dozens of other patches that aren't part of
it. This diffstat and summary is very confusing.

Robin.


Re: [PATCH v1 1/3] arm64: add a macro for SError synchronization

2017-11-01 Thread Robin Murphy
On 01/11/17 19:14, Dongjiu Geng wrote:
> ARMv8.2 adds a control bit to each SCTLR_ELx to insert implicit
> Error Synchronization Barrier(IESB) operations at exception handler entry
> and exit. But not all hardware platform which support RAS Extension
> can support IESB. So for this case, software needs to manually insert
> Error Synchronization Barrier(ESB) operations.
> 
> In this macro, if the system supports the RAS Extension instead of IESB,
> it will insert an ESB instruction.
> 
> Signed-off-by: Dongjiu Geng 
> ---
>  arch/arm64/include/asm/assembler.h | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/assembler.h 
> b/arch/arm64/include/asm/assembler.h
> index d4c0adf..e6c79c4 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -517,4 +517,13 @@
>  #endif
>   .endm
>  
> + .macro  error_synchronize
> +alternative_if ARM64_HAS_IESB
> + b   1f
> +alternative_else_nop_endif
> +alternative_if ARM64_HAS_RAS_EXTN
> + esb
> +alternative_else_nop_endif
> +1:
> + .endm

Having a branch in here is pretty horrible, and furthermore using label
number 1 has a pretty high chance of subtly breaking code where this
macro is inserted.

Can we not somehow nest or combine the alternative conditions here?

Robin.

>  #endif   /* __ASM_ASSEMBLER_H */
> 



Re: [PATCH for-next 2/4] RDMA/hns: Add IOMMU enable support in hip08

2017-11-01 Thread Robin Murphy
On 01/11/17 07:46, Wei Hu (Xavier) wrote:
> 
> 
> On 2017/10/12 20:59, Robin Murphy wrote:
>> On 12/10/17 13:31, Wei Hu (Xavier) wrote:
>>>
>>> On 2017/10/1 0:10, Leon Romanovsky wrote:
>>>> On Sat, Sep 30, 2017 at 05:28:59PM +0800, Wei Hu (Xavier) wrote:
>>>>> If the IOMMU is enabled, the length of sg obtained from
>>>>> __iommu_map_sg_attrs is not 4kB. When the IOVA is set with the sg
>>>>> dma address, the IOVA will not be page continuous. and the VA
>>>>> returned from dma_alloc_coherent is a vmalloc address. However,
>>>>> the VA obtained by the page_address is a discontinuous VA. Under
>>>>> these circumstances, the IOVA should be calculated based on the
>>>>> sg length, and record the VA returned from dma_alloc_coherent
>>>>> in the struct of hem.
>>>>>
>>>>> Signed-off-by: Wei Hu (Xavier) 
>>>>> Signed-off-by: Shaobo Xu 
>>>>> Signed-off-by: Lijun Ou 
>>>>> ---
>>>> Doug,
>>>>
>>>> I didn't invest time in reviewing it, but having "is_vmalloc_addr" in
>>>> driver code to deal with dma_alloc_coherent is most probably wrong.
>>>>
>>>> Thanks
>>> Hi,  Leon & Doug
>>> We refered the function named __ttm_dma_alloc_page in the kernel
>>> code as below:
>>> And there are similar methods in bch_bio_map and mem_to_page
>>> functions in current 4.14-rcx.
>>>
>>> static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
>>> {
>>> struct dma_page *d_page;
>>>
>>> d_page = kmalloc(sizeof(struct dma_page), GFP_KERNEL);
>>> if (!d_page)
>>> return NULL;
>>>
>>> d_page->vaddr = dma_alloc_coherent(pool->dev, pool->size,
>>>&d_page->dma,
>>>pool->gfp_flags);
>>> if (d_page->vaddr) {
>>> if (is_vmalloc_addr(d_page->vaddr))
>>> d_page->p = vmalloc_to_page(d_page->vaddr);
>>> else
>>> d_page->p = virt_to_page(d_page->vaddr);
>> There are cases on various architectures where neither of those is
>> right. Whether those actually intersect with TTM or RDMA use-cases is
>> another matter, of course.
>>
>> What definitely is a problem is if you ever take that page and end up
>> accessing it through any virtual address other than the one explicitly
>> returned by dma_alloc_coherent(). That can blow the coherency wide open
>> and invite data loss, right up to killing the whole system with a
>> machine check on certain architectures.
>>
>> Robin.
> Hi, Robin
> Thanks for your comment.
> 
> We have one problem and the related code as below.
> 1. call the dma_alloc_coherent function several times to alloc memory.
> 2. vmap the allocated memory pages.
> 3. software access memory by using the return virt addr of vmap
> and hardware using the dma addr of dma_alloc_coherent.

The simple answer is "don't do that". Seriously. dma_alloc_coherent()
gives you a CPU virtual address and a DMA address with which to access
your buffer, and that is the limit of what you may infer about it. You
have no guarantee that the virtual address is either in the linear map
or vmalloc, and not some other special place. You have no guarantee that
the underlying memory even has an associated struct page at all.

> When IOMMU is disabled in ARM64 architecture, we use virt_to_page()
> before vmap(), it works. And when IOMMU is enabled using
> virt_to_page() will cause calltrace later, we found the return
> addr of dma_alloc_coherent is vmalloc addr, so we add the
> condition judgement statement as below, it works. 
> for (i = 0; i < buf->nbufs; ++i)
> pages[i] =
> is_vmalloc_addr(buf->page_list[i].buf) ?
> vmalloc_to_page(buf->page_list[i].buf) :
> virt_to_page(buf->page_list[i].buf);
> Can you give us suggestion? better method?

Oh my goodness, having now taken a closer look at this driver, I'm lost
for words in disbelief. To pick just one example:

u32 bits_per_long = BITS_PER_LONG;
...
if (bits_per_long == 64) {
/* memory mapping nonsense */
}

WTF does the size of a long have to do with DMA buffer management!?

Of cou

Re: [PATCH v1 1/3] arm64: add a macro for SError synchronization

2017-11-01 Thread Robin Murphy
On 01/11/17 12:54, gengdongjiu wrote:
> Hi Robin,
> 
> On 2017/11/1 19:24, Robin Murphy wrote:
>>> +   esb
>>> +alternative_else_nop_endif
>>> +1:
>>> +   .endm
>> Having a branch in here is pretty horrible, and furthermore using label
>> number 1 has a pretty high chance of subtly breaking code where this
>> macro is inserted.
>>
>> Can we not somehow nest or combine the alternative conditions here?
> 
> I found it will report an error if I combine the alternative conditions here.
> 
> For example:
> 
> + .macro  error_synchronize
> +alternative_if ARM64_HAS_IESB
> +alternative_if ARM64_HAS_RAS_EXTN
> + esb
> +alternative_else_nop_endif
> +alternative_else_nop_endif
> + .endm
> 
> And even using a b.eq/cbz instruction in an alternative block in
> arch/arm64/kernel/entry.S,
> it will report an error.
> 
> For example below
> 
> alternative_if ARM64_HAS_PAN
>   
> b.eqx
> alternative_else_nop_endif
> 
> I have not dug into it deeply; do you know the reason for it, or have a
> good suggestion?
> Thanks a lot in advance.

Actually, on second look ARM64_HAS_RAS_EXTN doesn't even matter - ESB is
a hint, so if the CPU doesn't have RAS it should behave as a NOP anyway.

On which note, since I don't see one here - are any of those other
patches defining an "esb" assembly macro similar to the inline asm case?
If not then this isn't going to build with older toolchains - perhaps we
should just use the raw hint syntax directly.

Robin.
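For reference, the "raw hint syntax" suggested above might look like the
following assembler macro (ESB is allocated as HINT #16 in the ARMv8.2 RAS
extension, so it behaves as a NOP on CPUs without RAS; this is a sketch, not
the eventual kernel macro):

```
	/* esb: Error Synchronization Barrier via its generic hint
	 * encoding, so that pre-ARMv8.2 assemblers still accept it */
	.macro	esb
	hint	#16
	.endm
```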


Re: [PATCH] arm64: write_sysreg asm illegal for aarch32

2017-11-01 Thread Robin Murphy
On 01/11/17 16:58, Mark Salyzyn wrote:
> Cross compiling to aarch32 (for vdso32) using clang correctly
> identifies that (the unused) write_sysreg inline asm directive is
> illegal in that architectural context:
> 
> arch/arm64/include/asm/arch_timer.h: error: invalid input constraint 'rZ' in 
> asm
> write_sysreg(cntkctl, cntkctl_el1);
> ^
> arch/arm64/include/asm/sysreg.h: note: expanded from macro 'write_sysreg'
>  : : "rZ" (__val));
>  ^
> 
> GCC normally checks asm for correctness everywhere, but uniquely for
> unused asm it will optimize the statement out and suppress the error report.

It sounds more like some paths are wrong in the compat vDSO build if
it's pulling in this header in the first place - nothing in this file is
relevant to AArch32.

Robin.

> Signed-off-by: Mark Salyzyn 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Christoffer Dall 
> Cc: Mark Rutland 
> Cc: Marc Zyngier 
> Cc: Suzuki K Poulose 
> Cc: Stefan Traby 
> Cc: Dave Martin 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arm64/include/asm/sysreg.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index f707fed5886f..a7b61c9327db 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -492,11 +492,15 @@ asm(
>   * The "Z" constraint normally means a zero immediate, but when combined with
>   * the "%x0" template means XZR.
>   */
> +#if defined(__aarch64__)
>  #define write_sysreg(v, r) do {  \
>   u64 __val = (u64)(v);   \
>   asm volatile("msr " __stringify(r) ", %x0"  \
>: : "rZ" (__val)); \
>  } while (0)
> +#else
> +#define write_sysreg(v, r) BUG()
> +#endif
>  
>  /*
>   * For registers without architectural names, or simply unsupported by
> 



Re: [PATCH] arm64: mm: Set MAX_PHYSMEM_BITS based on ARM64_VA_BITS

2017-11-13 Thread Robin Murphy

On 13/11/17 10:32, Suzuki K Poulose wrote:

On 12/11/17 17:55, Jerome Glisse wrote:

On Fri, Nov 10, 2017 at 03:11:15PM +, Robin Murphy wrote:

On 09/11/17 22:58, Krishna Reddy wrote:

MAX_PHYSMEM_BITS greater than ARM64_VA_BITS is causing memory
access fault, when HMM_DMIRROR test is enabled.
In the failing case, ARM64_VA_BITS=39 and MAX_PHYSMEM_BITS=48.
HMM_DMIRROR test selects phys memory range from end based on
MAX_PHYSMEM_BITS and gets mapped into VA space linearly.
As VA space is 39-bit and phys space is 48-bit, this has caused
incorrect mapping and leads to memory access fault.

Limiting the MAX_PHYSMEM_BITS to ARM64_VA_BITS fixes the issue and is
the right thing instead of hard coding it as 48-bit always.

[    3.378655] Unable to handle kernel paging request at virtual address 3befd00
[    3.378662] pgd = ff800a04b000
[    3.378900] [3befd00] *pgd=81fa3003, *pud=81fa3003, *pmd=006268200711
[    3.378933] Internal error: Oops: 9644 [#1] PREEMPT SMP
[    3.378938] Modules linked in:
[    3.378948] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.9.52-tegra-g91402fdc013b-dirty #51

[    3.378950] Hardware name: quill (DT)
[    3.378954] task: ffc1ebac task.stack: ffc1eba64000
[    3.378967] PC is at __memset+0x1ac/0x1d0
[    3.378976] LR is at sparse_add_one_section+0xf8/0x174
[    3.378981] pc : [] lr : [] pstate: 404000c5
[    3.378983] sp : ffc1eba67a40
[    3.378993] x29: ffc1eba67a40 x28: 
[    3.378999] x27: 0003 x26: 0040
[    3.379005] x25: 03ff x24: ffc1e9f6cf80
[    3.379010] x23: ff8009ecb2d4 x22: 03befd00
[    3.379015] x21: ffc1e9923ff0 x20: 0003
[    3.379020] x19: ffef x18: 
[    3.379025] x17: 24d7 x16: 
[    3.379030] x15: ff8009cd8690 x14: ffc1e9f6c70c
[    3.379035] x13: ffc1e9f6c70b x12: 0030
[    3.379039] x11: 0040 x10: 0101010101010101
[    3.379044] x9 :  x8 : 03befd00
[    3.379049] x7 :  x6 : 003f
[    3.379053] x5 : 0040 x4 : 
[    3.379058] x3 : 0004 x2 : 00c0
[    3.379063] x1 :  x0 : 03befd00
[    3.379064]
[    3.379069] Process swapper/0 (pid: 1, stack limit = 0xffc1eba64028)
[    3.379071] Call trace:
[    3.379079] [] __memset+0x1ac/0x1d0


What's the deal with this memset? AFAICS we're in __add_pages() from
hmm_devmem_pages_create() calling add_pages() for private memory which it
does not expect to be in the linear map anyway :/


FWIW I did keep looking, and I now see that, thanks to confusing 
inlining, this is probably the clearing of the vmemmap section in 
sparse_add_one_section(), rather than any touching of the new memory 
itself. The fact that the commit message doesn't even try to explain the 
real problem (seemingly that the index has overflowed the vmemmap area 
and wrapped past the top of the address space) only emphasises my 
concern that this is a bit of a hack, though.



There appears to be a more fundamental problem being papered over here.


Following some discussion with Suzuki and Catalin, there does seem to be 
a more general issue of the interaction between vmemmap and memory 
hotplug. Of course, arm64 doesn't support memory hotplug in mainline 
(it's something I've started looking at), nor other HMM dependencies, so 
there's already more going on here than meets the eye.



Yes, I think the dummy driver is being used badly: if you want to test CDM
memory with the dummy driver you need to steal regular memory to act as CDM
memory. You can take a look at the following 2 patches:

https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-cdm-next&id=fcc1e94027dbee9525f75b2a9ad88b2e6279558a 

https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-cdm-next&id=84204c5be742186236b371ea2f7ad39bf1770fe6 



Note that this only applies if your device has its own memory that is not
reported as regular RAM in the kernel resources, and if that memory is
accessible by the CPU in a cache-coherent way.

For the Tegra platform I don't think you have any such memory, so you
do not need to register any memory to use HMM. But we can talk about
your platform in private mail under NDA if that is not the case.

Note that no matter what, I still think it makes sense to properly define
MAX_PHYSMEM_BITS like on x86 or powerpc.


Its a bit tricky on arm64. Essentially the VMEMMAP can cover a region
of (1 << (VA_BITS - 1)), the size of the linear map. However, it does
allow the physical memory to be above VA_BITS. e.g, one can boot a
39bit VA kernel on a system where the RAM is at 40bits. We adjust the
"vmemmap" to make sure that it points to the first "valid" pfn (based
on the start address of the memory).

If we reduce the MAX_PHYSMEM_BITS to VA_BITS, that could 

Re: [PATCH 3/3] arm64: Add software workaround for Falkor erratum 1041

2017-11-03 Thread Robin Murphy
On 03/11/17 03:27, Shanker Donthineni wrote:
> The ARM architecture defines the memory locations that are permitted
> to be accessed as the result of a speculative instruction fetch from
> an exception level for which all stages of translation are disabled.
> Specifically, the core is permitted to speculatively fetch from the
> 4KB region containing the current program counter and next 4KB.
> 
> When translation is changed from enabled to disabled for the running
> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
> Falkor core may errantly speculatively access memory locations outside
> of the 4KB region permitted by the architecture. The errant memory
> access may lead to one of the following unexpected behaviors.
> 
> 1) A System Error Interrupt (SEI) being raised by the Falkor core due
>to the errant memory access attempting to access a region of memory
>that is protected by a slave-side memory protection unit.
> 2) Unpredictable device behavior due to a speculative read from device
>memory. This behavior may only occur if the instruction cache is
>disabled prior to or coincident with translation being changed from
>enabled to disabled.
> 
> To avoid the errant behavior, software must execute an ISB immediately
> prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0.
> 
> Signed-off-by: Shanker Donthineni 
> ---
>  Documentation/arm64/silicon-errata.txt |  1 +
>  arch/arm64/Kconfig | 10 ++
>  arch/arm64/include/asm/assembler.h | 17 +
>  arch/arm64/include/asm/cpucaps.h   |  3 ++-
>  arch/arm64/kernel/cpu_errata.c | 16 
>  arch/arm64/kernel/efi-entry.S  |  4 ++--
>  arch/arm64/kernel/head.S   |  4 ++--
>  7 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt 
> b/Documentation/arm64/silicon-errata.txt
> index 66e8ce1..704770c0 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -74,3 +74,4 @@ stable kernels.
>  | Qualcomm Tech. | Falkor v1   | E1003   | 
> QCOM_FALKOR_ERRATUM_1003|
>  | Qualcomm Tech. | Falkor v1   | E1009   | 
> QCOM_FALKOR_ERRATUM_1009|
>  | Qualcomm Tech. | QDF2400 ITS | E0065   | 
> QCOM_QDF2400_ERRATUM_0065   |
> +| Qualcomm Tech. | Falkor v{1,2}   | E1041   | 
> QCOM_FALKOR_ERRATUM_1041|
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6..7e933fb 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -539,6 +539,16 @@ config QCOM_QDF2400_ERRATUM_0065
>  
> If unsure, say Y.
>  
> +config QCOM_FALKOR_ERRATUM_1041
> + bool "Falkor E1041: Speculative instruction fetches might cause errant 
> memory access"
> + default y
> + help
> +   Falkor CPU may speculatively fetch instructions from an improper
> +   memory location when MMU translation is changed from SCTLR_ELn[M]=1
> +   to SCTLR_ELn[M]=0. Prefix an ISB instruction to fix the problem.
> +
> +   If unsure, say Y.
> +
>  endmenu
>  
>  
> diff --git a/arch/arm64/include/asm/assembler.h 
> b/arch/arm64/include/asm/assembler.h
> index b6dfb4f..4c91efb 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Enable and disable interrupts.
> @@ -514,6 +515,22 @@
>   *   reg: the value to be written.
>   */
>   .macro  write_sctlr, eln, reg
> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041
> + tbnz\reg, #0, 8000f  // enable MMU?

Do we really need the branch here? It's not like enabling the MMU is
something we do on the syscall fastpath, and I can't imagine an extra
ISB hurts much (and is probably comparable to a mispredicted branch
anyway). In fact, is there any noticeable hit on other
microarchitectures if we save the alternative bother and just do it
unconditionally?
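For reference, the unconditional variant being suggested would collapse the macro to something like this (a sketch, not a tested patch):

```asm
	.macro	write_sctlr, eln, reg
#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
	isb				// E1041: serialize before any SCTLR_ELn write
#endif
	msr	sctlr_\eln, \reg
	.endm
```

This drops both the alternative patching and the TBNZ, at the cost of an extra ISB on every SCTLR write even when the MMU is being enabled.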

Robin.

> + isb
> +8000:
> +alternative_else_nop_endif
> +#endif
> + msr sctlr_\eln, \reg
> + .endm
> +
> + .macro  early_write_sctlr, eln, reg
> +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041
> + tbnz\reg, #0, 8000f  // enable MMU?
> + isb
> +8000:
> +#endif
>   msr sctlr_\eln, \reg
>   .endm
>  
> diff --git a/arch/arm64/include/asm/cpucaps.h 
> b/arch/arm64/include/asm/cpucaps.h
> index 8da6216..7f7a59d 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -40,7 +40,8 @@
>  #define ARM64_WORKAROUND_858921  19
>  #define ARM64_WORKAROUND_CAVIUM_3011520
>  #define ARM64_HAS_DCPOP  21
> +#define ARM64_WORKAROUND_QCOM_FALKOR_E1041   22
>  
> -#define ARM64_NCAPS  22
> +#define ARM64_NCAPS  23
>  
>  #endif /* __A

Re: [PATCH v2] iommu/iova: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->fq

2017-11-06 Thread Robin Murphy
On 06/11/17 15:43, Alex Williamson wrote:
> [cc +robin]
> 
> On Thu, 2 Nov 2017 18:33:50 +0100
> Sebastian Andrzej Siewior  wrote:
> 
>> On 2017-09-21 17:21:40 [+0200], Sebastian Andrzej Siewior wrote:
>>> get_cpu_ptr() disabled preemption and returns the ->fq object of the
>>> current CPU. raw_cpu_ptr() does the same except that it not disable
>>> preemption which means the scheduler can move it to another CPU after it
>>> obtained the per-CPU object.
>>> In this case this is not bad because the data structure itself is
>>> protected with a spin_lock. This change shouldn't matter however on RT
>>> it does because the sleeping lock can't be accessed with disabled
>>> preemption.  
>>
>> Did this make to your tree Jörg?
> 
> Hi Sebastian,
> 
> Joerg is out on paternity leave through the end of the year, I'm
> filling in in the interim.  I hadn't looked for patches this far back,
> so thanks for pointing it out.  Robin, any comments?  Thanks,

The reasoning seems sound - assuming we can't get preempted once the
lock is actually taken, the worst side-effect of getting moved to
another CPU in that slim window looks to be a little bit of lock
contention. Operating on "someone else's" queue should have no
correctness impact, and the cmpxchg after the lock is released isn't
even working on percpu data so doesn't really care anyway.

AFAICS it's more or less the direct equivalent of aaffaa8a3b59
("iommu/iova: Don't disable preempt around this_cpu_ptr()"), which seems
to have been working out OK.

Robin.

> Alex
> 
>>> Cc: Joerg Roedel 
>>> Cc: io...@lists.linux-foundation.org
>>> Reported-by: vina...@gmail.com
>>> Signed-off-by: Sebastian Andrzej Siewior 
>>> ---
>>> On 2017-09-19 11:41:19 [+0200], Joerg Roedel wrote:  
 Hi Sebastian,  
>>> Hi Jörg,
>>>   
 I moved the flushing to driver/iommu/iova.c to share it with the Intel
 IOMMU and possibly other drivers too, so this patch does no longer apply
 to v4.14-rc1. Can you update the patch to these changes?  
>>>
>>> Sure.
>>>
>>> v1…v2: move the change from amd_iommu.c to iova.c
>>>
>>>  drivers/iommu/iova.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 33edfa794ae9..b30900025c62 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -570,7 +570,7 @@ void queue_iova(struct iova_domain *iovad,
>>> unsigned long pfn, unsigned long pages,
>>> unsigned long data)
>>>  {
>>> -   struct iova_fq *fq = get_cpu_ptr(iovad->fq);
>>> +   struct iova_fq *fq = raw_cpu_ptr(iovad->fq);
>>> unsigned long flags;
>>> unsigned idx;
>>>  
>>> @@ -600,8 +600,6 @@ void queue_iova(struct iova_domain *iovad,
>>> if (atomic_cmpxchg(&iovad->fq_timer_on, 0, 1) == 0)
>>> mod_timer(&iovad->fq_timer,
>>>   jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT));
>>> -
>>> -   put_cpu_ptr(iovad->fq);
>>>  }
>>>  EXPORT_SYMBOL_GPL(queue_iova);
>>>  
>>> -- 
>>> 2.14.1
>>>   
>>
>> Sebastian
>> ___
>> iommu mailing list
>> io...@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu



Re: [PATCH 2/3] dmaengine: qcom_hidma: add support for the new revision

2017-11-06 Thread Robin Murphy
On 06/11/17 17:26, Sinan Kaya wrote:
> Add support for probing the newer HW.
> 
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/dma/qcom/hidma.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c
> index e366985..29d6aaa 100644
> --- a/drivers/dma/qcom/hidma.c
> +++ b/drivers/dma/qcom/hidma.c
> @@ -749,9 +749,13 @@ static bool hidma_msi_capable(struct device *dev)
>   return false;
>  
>   ret = strcmp(of_compat, "qcom,hidma-1.1");
> + if (ret)
> + ret = strcmp(of_compat, "qcom,hidma-1.2");
>   } else {
>  #ifdef CONFIG_ACPI
>   ret = strcmp(acpi_device_hid(adev), "QCOM8062");
> + if (ret)
> + ret = strcmp(acpi_device_hid(adev), "QCOM8063");

This string-juggling looks to have already hit the point at which it
doesn't scale well - it would be a lot nicer to make use of
of_device_get_match_data() and the ACPI equivalent to abstract the
version-specific data appropriately.

Robin.

>  #endif
>   }
>   return ret == 0;
> @@ -954,6 +958,7 @@ static int hidma_remove(struct platform_device *pdev)
>  static const struct acpi_device_id hidma_acpi_ids[] = {
>   {"QCOM8061"},
>   {"QCOM8062"},
> + {"QCOM8063"},
>   {},
>  };
>  MODULE_DEVICE_TABLE(acpi, hidma_acpi_ids);
> @@ -962,6 +967,7 @@ static int hidma_remove(struct platform_device *pdev)
>  static const struct of_device_id hidma_match[] = {
>   {.compatible = "qcom,hidma-1.0",},
>   {.compatible = "qcom,hidma-1.1",},
> + {.compatible = "qcom,hidma-1.2",},
>   {},
>  };
>  MODULE_DEVICE_TABLE(of, hidma_match);
> 



Re: [PATCH 3/4] iommu/arm-smmu-v3: Use NUMA memory allocations for stream tables and command queues

2017-10-18 Thread Robin Murphy
On 04/10/17 14:53, Ganapatrao Kulkarni wrote:
> Hi Robin,
> 
> 
> On Thu, Sep 21, 2017 at 5:28 PM, Robin Murphy  wrote:
>> [+Christoph and Marek]
>>
>> On 21/09/17 09:59, Ganapatrao Kulkarni wrote:
>>> Introduce smmu_alloc_coherent and smmu_free_coherent functions to
>>> allocate/free dma coherent memory from NUMA node associated with SMMU.
>>> Replace all calls of dmam_alloc_coherent with smmu_alloc_coherent
>>> for SMMU stream tables and command queues.
>>
>> This doesn't work - not only do you lose the 'managed' aspect and risk
>> leaking various tables on probe failure or device removal, but more
>> importantly, unless you add DMA syncs around all the CPU accesses to the
>> tables, you lose the critical 'coherent' aspect, and that's a horribly
>> invasive change that I really don't want to make.
> 
> this implementation is similar to function used to allocate memory for
> translation tables.

The concept is similar, yes, and would work if implemented *correctly*
with the aforementioned comprehensive and hugely invasive changes. The
implementation as presented in this patch, however, is incomplete and
badly broken.

By way of comparison, the io-pgtable implementations contain all the
necessary dma_sync_* calls, never relied on devres, and only have one
DMA direction to worry about (hint: the queues don't all work
identically). There are also a couple of practical reasons for using
streaming mappings with the DMA == phys restriction there - tracking
both the CPU and DMA addresses for each table would significantly
increase the memory overhead, and using the cacheable linear map address
in all cases sidesteps any potential problems with the atomic PTE
updates. Neither of those concerns apply to the SMMUv3 data structures,
which are textbook coherent DMA allocations (being tied to the lifetime
of the device, rather than transient).

> why do you see it affects to stream tables and not to page tables.
> at runtime, both tables are accessed by SMMU only.
> 
> As said in cover letter, having stream table from respective NUMA node
> is yielding
> around 30% performance!
> please suggest, if there is any better way to address this issue?

I fully agree that NUMA-aware allocations are a worthwhile thing that we
want. I just don't like the idea of going around individual drivers
replacing coherent API usage with bodged-up streaming mappings - I
really think it's worth making the effort to tackle it once, in the
proper place, in a way that benefits all users together.

Robin.

>>
>> Christoph, Marek; how reasonable do you think it is to expect
>> dma_alloc_coherent() to be inherently NUMA-aware on NUMA-capable
>> systems? SWIOTLB looks fairly straightforward to fix up (for the simple
>> allocation case; I'm not sure it's even worth it for bounce-buffering),
>> but the likes of CMA might be a little trickier...
>>
>> Robin.
>>
>>> Signed-off-by: Ganapatrao Kulkarni 
>>> ---
>>>  drivers/iommu/arm-smmu-v3.c | 57 
>>> -
>>>  1 file changed, 51 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>> index e67ba6c..bc4ba1f 100644
>>> --- a/drivers/iommu/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>> @@ -1158,6 +1158,50 @@ static void arm_smmu_init_bypass_stes(u64 *strtab, 
>>> unsigned int nent)
>>>   }
>>>  }
>>>
>>> +static void *smmu_alloc_coherent(struct arm_smmu_device *smmu, size_t size,
>>> + dma_addr_t *dma_handle, gfp_t gfp)
>>> +{
>>> + struct device *dev = smmu->dev;
>>> + void *pages;
>>> + dma_addr_t dma;
>>> + int numa_node = dev_to_node(dev);
>>> +
>>> + pages = alloc_pages_exact_nid(numa_node, size, gfp | __GFP_ZERO);
>>> + if (!pages)
>>> + return NULL;
>>> +
>>> + if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY)) {
>>> + dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
>>> + if (dma_mapping_error(dev, dma))
>>> + goto out_free;
>>> + /*
>>> +  * We depend on the SMMU being able to work with any physical
>>> +  * address directly, so if the DMA layer suggests otherwise by
>>> +  * translating or truncating them, that bodes very badly...
>>> +  */
>>> + if (dma != virt_to_phys(pages))
>>> + goto out_unmap;
>

Re: [PATCH 4/4] dma-debug: Allow poisoning nonzero allocations

2015-10-07 Thread Robin Murphy

On 29/09/15 22:27, Andrew Morton wrote:
[...]

If I'm understanding things correctly, some allocators zero the memory
by default and others do not.  And we have an unknown number of drivers
which are assuming that the memory is zeroed.

Correct?


That's precisely the motivation here, yes.


If so, our options are

a) audit all callers, find the ones which expect zeroed memory but
aren't passing __GFP_ZERO and fix them.

b) convert all allocators to zero the memory by default.

Obviously, a) is better.  How big a job is it?


This I'm not so sure of, hence the very tentative first step. For a very 
crude guess at an upper bound:


$ git grep -E '(dma|pci)_alloc_co(her|nsist)ent' drivers/ | wc -l
1148

vs.

$ git grep -E '(dma|pci)_zalloc_co(her|nsist)ent' drivers/ | wc -l
234

noting that the vast majority of the former are still probably benign, 
but picking out those which aren't from the code alone without knowledge 
of and/or access to the hardware might be non-trivial.



This patch will help the process, if people use it.


+   if (IS_ENABLED(CONFIG_DMA_API_DEBUG_POISON) && !(flags & __GFP_ZERO))
+   memset(virt, DMA_ALLOC_POISON, size);
+


This is likely to be slow in the case of non-cached memory and large
allocations.  The config option should come with a warning.


It depends on DMA_API_DEBUG, which already has a stern performance
warning, is additionally hidden behind EXPERT, and carries a slightly
flippant yet largely truthful warning that actually using it could break
pretty much every driver in your system; is that not enough?


It might be helpful to provide a runtime knob as well - having to
rebuild&reinstall just to enable/disable this feature is a bit painful.


Good point - there's always the global DMA debug disable knob, but this 
particular feature probably does warrant finer-grained control to be 
really practical. Having thought about it some more, it's also probably 
wrong that this doesn't respect the dma_debug_driver filter, given that 
it is actually invasive; in fixing that, how about if it also *only* 
applied when a specific driver is filtered? Then there would be no 
problematic "break anything and everything" mode, and the existing 
debugfs controls should suffice.


Robin.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 2/3] Add iommu driver for hi6220 SoC platform

2015-10-20 Thread Robin Murphy

On 20/10/15 09:45, Chen Feng wrote:

iommu/hisilicon: Add hi6220-SoC smmu driver


A brief description of the smmu and what capabilities it provides 
wouldn't go amiss here.



Signed-off-by: Chen Feng 
Signed-off-by: Yu Dongbin 
---
  drivers/iommu/Kconfig|   8 +
  drivers/iommu/Makefile   |   1 +
  drivers/iommu/hi6220_iommu.c | 482 +++
  3 files changed, 491 insertions(+)
  create mode 100644 drivers/iommu/hi6220_iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 4664c2a..9b41708 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -343,6 +343,14 @@ config SPAPR_TCE_IOMMU
  Enables bits of IOMMU API required by VFIO. The iommu_ops
  is not implemented as it is not necessary for VFIO.

+config HI6220_IOMMU
+   bool "Hi6220 IOMMU Support"
+   depends on ARM64
+   select IOMMU_API
+   select IOMMU_IOVA
+   help
+ Enable IOMMU Driver for hi6220 SoC.
+
  # ARM IOMMU support
  config ARM_SMMU
bool "ARM Ltd. System MMU (SMMU) Support"
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index c6dcc51..ee4dfef 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
  obj-$(CONFIG_SHMOBILE_IOMMU) += shmobile-iommu.o
  obj-$(CONFIG_SHMOBILE_IPMMU) += shmobile-ipmmu.o
  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
+obj-$(CONFIG_HI6220_IOMMU) += hi6220_iommu.o


It would be nice to insert this before CONFIG_INTEL_IOMMU to try to 
maintain some semblance of order.



diff --git a/drivers/iommu/hi6220_iommu.c b/drivers/iommu/hi6220_iommu.c
new file mode 100644
index 000..6c5198d
--- /dev/null
+++ b/drivers/iommu/hi6220_iommu.c
@@ -0,0 +1,482 @@
+/*
+ * Hisilicon Hi6220 IOMMU driver
+ *
+ * Copyright (c) 2015 Hisilicon Limited.
+ *
+ * Author: Chen Feng 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#define pr_fmt(fmt) "IOMMU: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 


Those last 3 spoil the ordering here. Plus, do you really need 
linux/interrupt.h twice?



+#define SMMU_CTRL_OFFSET (0x0000)
+#define SMMU_ENABLE_OFFSET   (0x0004)
+#define SMMU_PTBR_OFFSET (0x0008)
+#define SMMU_START_OFFSET(0x000C)
+#define SMMU_END_OFFSET  (0x0010)
+#define SMMU_INTMASK_OFFSET  (0x0014)
+#define SMMU_RINTSTS_OFFSET  (0x0018)
+#define SMMU_MINTSTS_OFFSET  (0x001C)
+#define SMMU_INTCLR_OFFSET   (0x0020)
+#define SMMU_STATUS_OFFSET   (0x0024)
+#define SMMU_AXIID_OFFSET(0x0028)
+#define SMMU_CNTCTRL_OFFSET  (0x002C)
+#define SMMU_TRANSCNT_OFFSET (0x0030)
+#define SMMU_L0TLBHITCNT_OFFSET  (0x0034)
+#define SMMU_L1TLBHITCNT_OFFSET  (0x0038)
+#define SMMU_WRAPCNT_OFFSET  (0x003C)
+#define SMMU_SEC_START_OFFSET(0x0040)
+#define SMMU_SEC_END_OFFSET  (0x0044)
+#define SMMU_VERSION_OFFSET  (0x0048)
+#define SMMU_IPTSRC_OFFSET   (0x004C)
+#define SMMU_IPTPA_OFFSET(0x0050)
+#define SMMU_TRBA_OFFSET (0x0054)
+#define SMMU_BYS_START_OFFSET(0x0058)
+#define SMMU_BYS_END_OFFSET  (0x005C)
+#define SMMU_RAM_OFFSET  (0x1000)
+#define SMMU_REGS_MAX(15)


This is confusing; I count 24 registers listed above, what is 15 the 
maximum of?



+#define SMMU_REGS_SGMT_END   (0x60)
+#define SMMU_CHIP_ID_V100(1)
+#define SMMU_CHIP_ID_V200(2)
+
+#define SMMU_REGS_OPS_SEGMT_START(0xf00)
+#define SMMU_REGS_OPS_SEGMT_NUMB (8)
+#define SMMU_REGS_AXI_SEGMT_START(0xf80)
+#define SMMU_REGS_AXI_SEGMT_NUMB (8)
+
+#define SMMU_INIT(0x1)
+#define SMMU_RUNNING (0x2)
+#define SMMU_SUSPEND (0x3)
+#define SMMU_STOP(0x4)
+#define SMMU_CTRL_INVALID(BIT(10))
+#define PAGE_ENTRY_VALID (0x1)
+
+#define IOVA_START_PFN   (1)
+#define IOPAGE_SHIFT (12)
+#define IOVA_PFN(addr)   ((addr) >> IOPAGE_SHIFT)
+#define IOVA_PAGE_SZ (SZ_4K)
+#define IOVA_START   (0x2000)


This doesn't match up - surely either IOVA_START_PFN should be 2, or 
IOVA_START should be 0x1000.



+#define IOVA_END (0x8000)
+
+struct hi6220_smmu {
+   unsigned int irq;
+   irq_handler_t smmu_isr;
+   void __iomem *reg_base;
+   struct clk *smmu_peri_clk;
+   struct clk *smmu_clk;
+   struct clk *media_sc_clk;
+   size_t page_size;
+   spinlock_t spinlock; /*spinlock for tlb inv

Re: framebuffer corruption due to overlapping stp instructions on arm64

2018-08-06 Thread Robin Murphy

On 06/08/18 11:25, Mikulas Patocka wrote:
[...]

None of this explains why some transactions fail to make it across
entirely. The overlapping writes in question write the same data to
the memory locations that are covered by both, and so the ordering in
which the transactions are received should not affect the outcome.


You're right that the corruption couldn't be explained just by reordering
writes. My hypothesis is that the PCIe controller tries to disambiguate
the overlapping writes, but the disambiguation logic was not tested and it
is buggy. If there's a barrier between the overlapping writes, the PCIe
controller won't see any overlapping writes, so it won't trigger the
faulty disambiguation logic and it works.

Could the ARM engineers look if there's some chicken bit in Cortex-A72
that could insert barriers between non-cached writes automatically?


I don't think there is, and even if there was I imagine it would have a 
pretty hideous effect on non-coherent DMA buffers and the various other 
places in which we have Normal-NC mappings of actual system RAM.



I observe these kinds of corruptions:
- failing to write a few bytes


That could potentially be explained by the reordering/atomicity issues 
Matt mentioned, i.e. the load is observing part of the store, before the 
store has fully completed.



- writing a few bytes that were written 16 bytes before
- writing a few bytes that were written 16 bytes after


Those sound more like the interconnect or root complex ignoring the byte 
strobes on an unaligned burst, of which I think the simplistic view 
would be "it's broken".


FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x 
Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and 
it's still happily flickering pixels in the corner of the console after 
nearly an hour (in parallel with some iperf3 just to ensure plenty of 
PCIe traffic). I would strongly suspect this issue is particular to 
Armada 8k, so it's probably one for the Marvell folks to take a closer 
look at - I believe some previous interconnect issues on those SoCs were 
actually fixable in firmware.


Robin.


Re: [PATCH] arm64: Trap WFI executed in userspace

2018-08-07 Thread Robin Murphy

On 07/08/18 11:30, Dave Martin wrote:

On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote:

On 07/08/18 11:05, Dave Martin wrote:

On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote:

It recently came to light that userspace can execute WFI, and that
the arm64 kernel doesn't trap this event. This sounds rather benign,
but the kernel should decide when it wants to wait for an interrupt,
and not userspace.

Let's trap WFI and treat it as a way to yield the CPU to another
process.


This doesn't amount to a justification.

If the power controller is unexpectedly left in a bad state so that
WFI will do something nasty to a cpu that may enter userspace, then we
probably have bigger problems.

So, maybe it really is pretty harmless to let userspace execute this.


Or not. It is also a very good way for userspace to find out when an
interrupt gets delivered and start doing all kind of probing on the
kernel. The less userspace knows about that, the better I feel.


Possibly.  I suspect there are other ways to guess pretty accurately
when an interrupt occurs, but WFI allows greater precision.


...unless you're running in a VM and it traps to KVM anyway ;)


I can't think of a legitimate reason for userspace to execute WFI
however.  Userspace doesn't have interrupts under Linux, so it makes
no sense to wait for one.

Have we seen anybody using WFI in userspace?  It may be cleaner to
map this to SIGILL rather than be permissive and regret it later.


I couldn't find any user, and I'm happy to just send userspace to hell
in that case. But it could also be said that since it was never
prevented, it is a de-facto ABI.


Agreed.  I wonder whether it's sufficient to have this mapping to SIGILL
in -next for a while and see whether anybody complains.


I think we'd have to avoid that for compat, though, since v7 code would 
have the expectation that WFI can't be trapped by the kernel at all. 
Personally I'm in favour of this patch as-is, since the architectural 
intent of the instruction is essentially "I've got nothing better to do 
right now than wait for something to happen", so treating it as a poor 
man's sched_yield() stays close to that while mitigating the potential 
nefarious and/or minor DoS implications of letting it architecturally 
execute.


Robin.


Re: [PATCH v14 4/4] iommu/arm-smmu: Add support for qcom,smmu-v2 variant

2018-08-22 Thread Robin Murphy

On 27/07/18 08:02, Vivek Gautam wrote:

qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
clock and power requirements. This smmu core is used with
multiple masters on msm8996, viz. mdss, video, etc.
Add bindings for the same.

Signed-off-by: Vivek Gautam 
Reviewed-by: Rob Herring 
Reviewed-by: Tomasz Figa 
---

Change since v13:
  - No change.

  .../devicetree/bindings/iommu/arm,smmu.txt | 42 ++
  drivers/iommu/arm-smmu.c   | 13 +++
  2 files changed, 55 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 8a6ffce12af5..7c71a6ed465a 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,10 +17,19 @@ conditions.
  "arm,mmu-401"
  "arm,mmu-500"
  "cavium,smmu-v2"
> +"qcom,<soc>-smmu-v2", "qcom,smmu-v2"
  
depending on the particular implementation and/or the

version of the architecture implemented.
  
+  A number of Qcom SoCs use qcom,smmu-v2 version of the IP.

> +  "qcom,<soc>-smmu-v2" represents a soc specific compatible
+  string that should be present along with the "qcom,smmu-v2"
+  to facilitate SoC specific clocks/power connections and to
+  address specific bug fixes.


As demonstrated in the GPU thread, this proves a bit too vague for a 
useful binding. Provided Qcom folks can reach a consensus on what a 
given SoC is actually called, I'd rather just unambiguously list 
whatever sets of fully-defined strings we need.


Robin.


+  An example string would be -
+  "qcom,msm8996-smmu-v2", "qcom,smmu-v2".
+
  - reg   : Base address and size of the SMMU.
  
  - #global-interrupts : The number of global interrupts exposed by the

@@ -71,6 +80,22 @@ conditions.
or using stream matching with #iommu-cells = <2>, and
may be ignored if present in such cases.
  
+- clock-names:List of the names of clocks input to the device. The

+  required list depends on particular implementation and
+  is as follows:
+  - for "qcom,smmu-v2":
+- "bus": clock required for downstream bus access and
+ for the smmu ptw,
+- "iface": clock required to access smmu's registers
+   through the TCU's programming interface.
+  - unspecified for other implementations.
+
+- clocks: Specifiers for all clocks listed in the clock-names property,
+  as per generic clock bindings.
+
+- power-domains:  Specifiers for power domains required to be powered on for
+  the SMMU to operate, as per generic power domain bindings.
+
  ** Deprecated properties:
  
  - mmu-masters (deprecated in favour of the generic "iommus" binding) :

@@ -137,3 +162,20 @@ conditions.
  iommu-map = <0 &smmu3 0 0x400>;
  ...
  };
+
+   /* Qcom's arm,smmu-v2 implementation */
+   smmu4: iommu {
+   compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
+   reg = <0xd0 0x1>;
+
+   #global-interrupts = <1>;
+   interrupts = ,
+,
+;
+   #iommu-cells = <1>;
+   power-domains = <&mmcc MDSS_GDSC>;
+
+   clocks = <&mmcc SMMU_MDP_AXI_CLK>,
+<&mmcc SMMU_MDP_AHB_CLK>;
+   clock-names = "bus", "iface";
+   };
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e558abf1ecfc..2b4edba188a5 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -119,6 +119,7 @@ enum arm_smmu_implementation {
GENERIC_SMMU,
ARM_MMU500,
CAVIUM_SMMUV2,
+   QCOM_SMMUV2,
  };
  
  struct arm_smmu_s2cr {

@@ -1971,6 +1972,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, 
GENERIC_SMMU);
  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
  
+static const char * const qcom_smmuv2_clks[] = {

+   "bus", "iface",
+};
+
+static const struct arm_smmu_match_data qcom_smmuv2 = {
+   .version = ARM_SMMU_V2,
+   .model = QCOM_SMMUV2,
+   .clks = qcom_smmuv2_clks,
+   .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
+};
+
  static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
@@ -1978,6 +1990,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,mmu-401", 

Re: [PATCH mmc-next v2 3/3] mmc: sdhci-of-dwcmshc: solve 128MB DMA boundary limitation

2018-07-30 Thread Robin Murphy

Hi Jisheng,

On 26/07/18 08:14, Jisheng Zhang wrote:

When using DMA, if the DMA addr spans 128MB boundary, we have to split
the DMA transfer into two so that each one doesn't exceed the boundary.


Out of interest, is the driver already setting its segment boundary mask 
appropriately? This sounds like the exact kind of hardware restriction 
that dma_parms is intended to describe, which scatterlist-generating 
code is *supposed* to already respect.


Robin.


Signed-off-by: Jisheng Zhang 
---
  drivers/mmc/host/sdhci-of-dwcmshc.c | 42 +
  1 file changed, 42 insertions(+)

diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c 
b/drivers/mmc/host/sdhci-of-dwcmshc.c
index 1b7cd144fb01..7e189514bc83 100644
--- a/drivers/mmc/host/sdhci-of-dwcmshc.c
+++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
@@ -8,21 +8,51 @@
   */
  
  #include 

+#include 
  #include 
  #include 
  
  #include "sdhci-pltfm.h"
  
+#define BOUNDARY_OK(addr, len) \

+   ((addr | (SZ_128M - 1)) == ((addr + len - 1) | (SZ_128M - 1)))
+
  struct dwcmshc_priv {
struct clk  *bus_clk;
  };
  
+/*

+ * if DMA addr spans 128MB boundary, we split the DMA transfer into two
+ * so that the DMA transfer doesn't exceed the boundary.
+ */
+static unsigned int dwcmshc_adma_write_desc(struct sdhci_host *host,
+   void *desc, dma_addr_t addr,
+   int len, unsigned int cmd)
+{
+   int tmplen, offset;
+
+   if (BOUNDARY_OK(addr, len) || !len)
+   return _sdhci_adma_write_desc(host, desc, addr, len, cmd);
+
+   offset = addr & (SZ_128M - 1);
+   tmplen = SZ_128M - offset;
+   _sdhci_adma_write_desc(host, desc, addr, tmplen, cmd);
+
+   addr += tmplen;
+   len -= tmplen;
+   desc += host->desc_sz;
+   _sdhci_adma_write_desc(host, desc, addr, len, cmd);
+
+   return host->desc_sz * 2;
+}
+
  static const struct sdhci_ops sdhci_dwcmshc_ops = {
.set_clock  = sdhci_set_clock,
.set_bus_width  = sdhci_set_bus_width,
.set_uhs_signaling  = sdhci_set_uhs_signaling,
.get_max_clock  = sdhci_pltfm_clk_get_max_clock,
.reset  = sdhci_reset,
+   .adma_write_desc= dwcmshc_adma_write_desc,
  };
  
  static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {

@@ -36,12 +66,24 @@ static int dwcmshc_probe(struct platform_device *pdev)
struct sdhci_host *host;
struct dwcmshc_priv *priv;
int err;
+   u32 extra;
  
  	host = sdhci_pltfm_init(pdev, &sdhci_dwcmshc_pdata,

sizeof(struct dwcmshc_priv));
if (IS_ERR(host))
return PTR_ERR(host);
  
+	/*

+* The DMA descriptor table number is calculated as the maximum
+* number of segments times 2, to allow for an alignment
+* descriptor for each segment, plus 1 for a nop end descriptor,
+* plus extra number for cross 128M boundary handling.
+*/
+   extra = DIV_ROUND_UP(totalram_pages, SZ_128M / PAGE_SIZE);
+   if (extra > SDHCI_MAX_SEGS)
+   extra = SDHCI_MAX_SEGS;
+   host->adma_table_num = SDHCI_MAX_SEGS * 2 + 1 + extra;
+
pltfm_host = sdhci_priv(host);
priv = sdhci_pltfm_priv(pltfm_host);
  



Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

2018-07-30 Thread Robin Murphy

On 28/07/18 17:58, Guenter Roeck wrote:

On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:

On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:

On 27 July 2018 at 15:11, Krzysztof Kozlowski  wrote:

Hi,

On today's next, the bisect pointed commit
ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).

Author: Robin Murphy 
Date:   Mon Jul 23 23:16:12 2018 +0100
 OF: Don't set default coherent DMA mask

Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
with DMA) on Iris Carrier.

It looks like problem with Freescale Ethernet driver:
[   15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
[   15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
[   15.472086] Root-NFS: no NFS server address
[   15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
[   15.484228] VFS: Cannot open root device "nfs" or
unknown-block(2,0): error -6
[   15.491664] Please append a correct "root=" boot option; here are
the available partitions:
[   15.500188] 0100   16384 ram0
[   15.500200]  (driver?)
[   15.506406] Kernel panic - not syncing: VFS: Unable to mount root
fs on unknown-block(2,0)
[   15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
mount root fs on unknown-block(2,0) ]---

Attached - defconfig and full boot log.

Any hints?
Let me know if you need any more information.


My Exynos boards also fail to boot on missing network:
https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0

As expected there are plenty of "DMA mask not set" warnings... and
later dwc3 driver fails with:
 dwc3: probe of 1240.dwc3 failed with error -12
which is probably the answer why LAN attached to USB is not present.


Looks like all the drivers failed to set a dma mask and were lucky.


I would call it a serious regression. Also, no longer setting a default
coherent DMA mask is a quite substantial behavioral change, especially
if and since the code worked just fine up to now.


To reiterate, that particular side-effect was an unintentional 
oversight, and I was simply (un)lucky enough that none of the drivers I 
did test depended on that default mask. Sorry for the blip; please check 
whether it's now fixed in next-20180730 as it should be.



Crash when booting sam460ex attached below, as is a bisect log.


Nevertheless, like most of the others that came out of the woodwork, 
that appears to be a crash due to a broken cleanup path down the line 
from dma_alloc_coherent() returning NULL - that warrants fixing (or just 
removing) in its own right, because cleanup code which has never been 
tested and doesn't actually work is little more than a pointless waste 
of space.


Robin.



Guenter

---
irq: type mismatch, failed to map hwirq-0 for interrupt-controller3!
WARNING: CPU: 0 PID: 1 at ppc4xx_msi_probe+0x2dc/0x3b8
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.18.0-rc6-00010-gff33d1030a6c #1
NIP:  c001c460 LR: c001c29c CTR: 
REGS: cf82db60 TRAP: 0700   Not tainted  (4.18.0-rc6-00010-gff33d1030a6c)
MSR:  00029000   CR: 24002028  XER: 

GPR00: c001c29c cf82dc10 cf828000 d1021000 d1021000 cf882108 cf82db78 
GPR08:  c0377ae4  151b 24002028  c00025e8 
GPR16:       c0492380 004a
GPR24: 00029000 000c 1000 cf8de410 c0494d60 00029000 cf8bebc0 cf8de400
NIP [c001c460] ppc4xx_msi_probe+0x2dc/0x3b8
LR [c001c29c] ppc4xx_msi_probe+0x118/0x3b8
Call Trace:
[cf82dc10] [c001c29c] ppc4xx_msi_probe+0x118/0x3b8 (unreliable)
[cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c
[cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350
[cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac
[cf82dcf0] [c0207e88] __device_attach+0xe8/0x160
[cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc
[cf82dd40] [c02050c8] device_add+0x404/0x5c4
[cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8
[cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220
[cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220
[cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec
[cf82de90] [c0449650] 
__machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
[cf82dea0] [c0002404] do_one_initcall+0x40/0x188
[cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0
[cf82df30] [c0002600] kernel_init+0x18/0x104
[cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c
Instruction dump:
386e 4bffa2a5 386f 7f44d378 4bffa299 4bfffe30 386e 4bffa28d
386f 7f24cb78 4bffa281 4bfffde4 <0fe0> 8129 2f89 409efe6c
---[ end trace 8cf551077ecfc429 ]---
ppc4xx-msi c1000.ppc4xx-msi: coherent DMA mask is unset
Unable to handle kernel paging request for data at address 0x
Faulting instruction address: 0xc001bff0
Oops: Kernel access of bad area, sig: 11 [#1]
BE Canyonlands
Modules li

Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

2018-07-31 Thread Robin Murphy

On 31/07/18 09:19, Stefan Agner wrote:

On 30.07.2018 16:38, Robin Murphy wrote:

On 28/07/18 17:58, Guenter Roeck wrote:

On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:

On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:

On 27 July 2018 at 15:11, Krzysztof Kozlowski  wrote:

Hi,

On today's next, the bisect pointed commit
ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).

Author: Robin Murphy 
Date:   Mon Jul 23 23:16:12 2018 +0100
  OF: Don't set default coherent DMA mask

Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
with DMA) on Iris Carrier.

It looks like problem with Freescale Ethernet driver:
[   15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
[   15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
[   15.472086] Root-NFS: no NFS server address
[   15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
[   15.484228] VFS: Cannot open root device "nfs" or
unknown-block(2,0): error -6
[   15.491664] Please append a correct "root=" boot option; here are
the available partitions:
[   15.500188] 0100   16384 ram0
[   15.500200]  (driver?)
[   15.506406] Kernel panic - not syncing: VFS: Unable to mount root
fs on unknown-block(2,0)
[   15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
mount root fs on unknown-block(2,0) ]---

Attached - defconfig and full boot log.

Any hints?
Let me know if you need any more information.


My Exynos boards also fail to boot on missing network:
https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0

As expected there are plenty of "DMA mask not set" warnings... and
later dwc3 driver fails with:
  dwc3: probe of 1240.dwc3 failed with error -12
which is probably the answer why LAN attached to USB is not present.


Looks like all the drivers failed to set a dma mask and were lucky.


I would call it a serious regression. Also, no longer setting a default
coherent DMA mask is a quite substantial behavioral change, especially
if and since the code worked just fine up to now.


To reiterate, that particular side-effect was an unintentional
oversight, and I was simply (un)lucky enough that none of the drivers
I did test depended on that default mask. Sorry for the blip; please
check whether it's now fixed in next-20180730 as it should be.



Just for my understanding:

Your first patch ("OF: Don't set default coherent DMA mask") sounded
like *not* setting the default coherent DMA mask was intentional.
Since the commit message reads "...the bus code has not initialised any
default value", it assumed that all bus code sets a default DMA
mask, which wasn't the case for "simple-bus".


Yes, reading the patches in the order they were written is perhaps a 
little unclear, but hopefully the order in which they are now applied 
makes more sense.



So I guess that is what ("of/platform: Initialise default DMA masks")
makes up for in the typical device tree case ("simple-bus")?


Indeed, I'd missed the fact that the now-out-of-place-looking 
initialisation in of_dma_configure() still actually belonged to 
of_platform_device_create_pdata() - that patch should make the 
assumptions of "OF: Don't set default coherent DMA mask" true again, 
even for OF-platform devices.



Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
is set again, can/should we rely on the coherent DMA mask set?

Or is the expectation still that this is set on driver level too?


Ideally, we'd like all drivers to explicitly request their masks as the 
documentation in DMA-API-HOWTO.txt recommends, if only to ensure DMA is 
actually possible - there can be systems where even the default 32-bit 
mask is no good - but clearly we're a little way off trying to enforce 
that just yet.


Robin.


Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

2018-07-31 Thread Robin Murphy

On 31/07/18 14:26, Guenter Roeck wrote:

On 07/31/2018 05:32 AM, Robin Murphy wrote:

On 31/07/18 09:19, Stefan Agner wrote:

On 30.07.2018 16:38, Robin Murphy wrote:

On 28/07/18 17:58, Guenter Roeck wrote:

On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:

On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
On 27 July 2018 at 15:11, Krzysztof Kozlowski  
wrote:

Hi,

On today's next, the bisect pointed commit
ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot 
failures

with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).

Author: Robin Murphy 
Date:   Mon Jul 23 23:16:12 2018 +0100
  OF: Don't set default coherent DMA mask

Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial 
configured

with DMA) on Iris Carrier.

It looks like problem with Freescale Ethernet driver:
[   15.458477] fsl-edma 40018000.dma-controller: coherent DMA 
mask is unset
[   15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic 
DMA

[   15.472086] Root-NFS: no NFS server address
[   15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
[   15.484228] VFS: Cannot open root device "nfs" or
unknown-block(2,0): error -6
[   15.491664] Please append a correct "root=" boot option; here 
are

the available partitions:
[   15.500188] 0100   16384 ram0
[   15.500200]  (driver?)
[   15.506406] Kernel panic - not syncing: VFS: Unable to mount 
root

fs on unknown-block(2,0)
[   15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
mount root fs on unknown-block(2,0) ]---

Attached - defconfig and full boot log.

Any hints?
Let me know if you need any more information.


My Exynos boards also fail to boot on missing network:
https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0

As expected there are plenty of "DMA mask not set" warnings... and
later dwc3 driver fails with:
  dwc3: probe of 1240.dwc3 failed with error -12
which is probably the answer why LAN attached to USB is not present.


Looks like all the drivers failed to set a dma mask and were lucky.


I would call it a serious regression. Also, no longer setting a 
default

coherent DMA mask is a quite substantial behavioral change, especially
if and since the code worked just fine up to now.


To reiterate, that particular side-effect was an unintentional
oversight, and I was simply (un)lucky enough that none of the drivers
I did test depended on that default mask. Sorry for the blip; please
check whether it's now fixed in next-20180730 as it should be.



Just for my understanding:

Your first patch ("OF: Don't set default coherent DMA mask") sounded
like *not* setting the default coherent DMA mask was intentional.
Since the commit message reads "...the bus code has not initialised any
default value", it assumed that all bus code sets a default DMA
mask, which wasn't the case for "simple-bus".


Yes, reading the patches in the order they were written is perhaps a 
little unclear, but hopefully the order in which they are now applied 
makes more sense.



So I guess that is what ("of/platform: Initialise default DMA masks")
makes up for in the typical device tree case ("simple-bus")?


Indeed, I'd missed the fact that the now-out-of-place-looking 
initialisation in of_dma_configure() still actually belonged to 
of_platform_device_create_pdata() - that patch should make the 
assumptions of "OF: Don't set default coherent DMA mask" true again, 
even for OF-platform devices.



Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
is set again, can/should we rely on the coherent DMA mask set?

Or is the expectation still that this is set on driver level too?


Ideally, we'd like all drivers to explicitly request their masks as 
the documentation in DMA-API-HOWTO.txt recommends, if only to ensure 
DMA is actually possible - there can be systems where even the default 
32-bit mask is no good - but clearly we're a little way off trying to 
enforce that just yet.


Robin.



Please note that sparc images still generate the warning (next-20180731).


Ugh, OK, any ideas what sparc does to create these platform devices that 
isn't of_platform_device_create_pdata() and has somehow grown an 
implicit dependency on of_dma_configure() since 4.12? I'm looking, but 
nothing jumps out...


Robin.


sunlance ffd35110: DMA mask not set
sunlance.c:v2.02 8/24/03 Miguel de Icaza (mig...@nuclecu.unam.mx)
ioremap: done with statics, switching to malloc
[ cut here ]
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 
sparc_lance_probe_one+0x428/0x4f4


esp ffd38e90: DMA mask not set
[ cut here ]
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 
esp_sbus_probe+0x408/0x6e8


Guenter



Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

2018-07-31 Thread Robin Murphy

On 31/07/18 16:43, Guenter Roeck wrote:

On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote:

Please note that sparc images still generate the warning (next-20180731).


Ugh, OK, any ideas what sparc does to create these platform devices that
isn't of_platform_device_create_pdata() and has somehow grown an implicit
dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps
out...



I suspect it might be of_device_register(), called from
arch/sparc/kernel/of_device_64.c:scan_one_device()
arch/sparc/kernel/of_device_32.c:scan_one_device()


Right, that's as far as I got as well, so I'm struggling to see how 
these things ever got DMA masks set before the of_dma_configure() call 
moved out of of_platform_device_create_pdata(), or why it wasn't a 
problem prior to the generic dma_ops rework if they didn't :/


Robin.


Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

2018-07-31 Thread Robin Murphy

On 31/07/18 16:53, Stefan Agner wrote:

On 31.07.2018 14:32, Robin Murphy wrote:

On 31/07/18 09:19, Stefan Agner wrote:

On 30.07.2018 16:38, Robin Murphy wrote:

On 28/07/18 17:58, Guenter Roeck wrote:

On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:

On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:

On 27 July 2018 at 15:11, Krzysztof Kozlowski  wrote:

Hi,

On today's next, the bisect pointed commit
ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).

Author: Robin Murphy 
Date:   Mon Jul 23 23:16:12 2018 +0100
   OF: Don't set default coherent DMA mask

Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
with DMA) on Iris Carrier.

It looks like problem with Freescale Ethernet driver:
[   15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
[   15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
[   15.472086] Root-NFS: no NFS server address
[   15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
[   15.484228] VFS: Cannot open root device "nfs" or
unknown-block(2,0): error -6
[   15.491664] Please append a correct "root=" boot option; here are
the available partitions:
[   15.500188] 0100   16384 ram0
[   15.500200]  (driver?)
[   15.506406] Kernel panic - not syncing: VFS: Unable to mount root
fs on unknown-block(2,0)
[   15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
mount root fs on unknown-block(2,0) ]---

Attached - defconfig and full boot log.

Any hints?
Let me know if you need any more information.


My Exynos boards also fail to boot on missing network:
https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0

As expected there are plenty of "DMA mask not set" warnings... and
later dwc3 driver fails with:
   dwc3: probe of 1240.dwc3 failed with error -12
which is probably the answer why LAN attached to USB is not present.


Looks like all the drivers failed to set a dma mask and were lucky.


I would call it a serious regression. Also, no longer setting a default
coherent DMA mask is a quite substantial behavioral change, especially
if and since the code worked just fine up to now.


To reiterate, that particular side-effect was an unintentional
oversight, and I was simply (un)lucky enough that none of the drivers
I did test depended on that default mask. Sorry for the blip; please
check whether it's now fixed in next-20180730 as it should be.



Just for my understanding:

Your first patch ("OF: Don't set default coherent DMA mask") sounded
like *not* setting the default coherent DMA mask was intentional.
Since the commit message reads "...the bus code has not initialised any
default value", it assumed that all bus code sets a default DMA
mask, which wasn't the case for "simple-bus".


Yes, reading the patches in the order they were written is perhaps a
little unclear, but hopefully the order in which they are now applied
makes more sense.


So I guess that is what ("of/platform: Initialise default DMA masks")
makes up for in the typical device tree case ("simple-bus")?


Indeed, I'd missed the fact that the now-out-of-place-looking
initialisation in of_dma_configure() still actually belonged to
of_platform_device_create_pdata() - that patch should make the
assumptions of "OF: Don't set default coherent DMA mask" true again,
even for OF-platform devices.


Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
is set again, can/should we rely on the coherent DMA mask set?

Or is the expectation still that this is set on driver level too?


Ideally, we'd like all drivers to explicitly request their masks as
the documentation in DMA-API-HOWTO.txt recommends, if only to ensure
DMA is actually possible - there can be systems where even the default
32-bit mask is no good - but clearly we're a little way off trying to
enforce that just yet.


In the FEC driver case, there is an integrated DMA (uDMA). It has
alignment restrictions, but can otherwise address the full 32-bit range.

So something like this should do it right?

if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
dev_warn(dev, "No suitable DMA available\n");
return -ENODEV;
}



Yup, precisely.


However, that, as far as I understand, still requires that the bus set
up dma_mask properly.

Should I be using dma_coerce_mask_and_coherent?


AFAICS for FEC, the ColdFire instances have statically-set masks, the 
i.MX boardfiles get them set via platform+device_register_full(), and 
now that the bug-which-never-should-have-been is fixed the DT-based 
instances should be fine too, so you should be good to go. In general 
I'd say that the dma_coerce_mask*() routines are only really fo

Re: [PATCH v2 0/4] arm64 SMMUv3 PMU driver with IORT support

2018-08-01 Thread Robin Murphy

Hi Shameer,

On 01/08/18 09:52, Shameerali Kolothum Thodi wrote:

Hi Lorenzo/Robin,

Just a  gentle ping on this series. This is a v2 for smmu pmcg support
based on Neil Leeder's v1[1].


Thanks for picking this up - it's not gone unnoticed ;)

It'll take me a while to page all this stuff back in, so given where we 
are in the cycle I was planning to review it once rc1 is out, hope 
that's OK.


Cheers,
Robin.


Main changes include:
- a helper function in IORT to retrieve the associated SMMU info.
- MSI support in the PMU driver.

Please take a look and let me know your thoughts.

Thanks,
Shameer

[1]https://www.spinics.net/lists/arm-kernel/msg598591.html


-Original Message-
From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of
Shameer Kolothum
Sent: 24 July 2018 12:45
To: lorenzo.pieral...@arm.com; robin.mur...@arm.com
Cc: mark.rutl...@arm.com; vkil...@codeaurora.org;
neil.m.lee...@gmail.com; pa...@codeaurora.org; will.dea...@arm.com;
rruig...@codeaurora.org; Linuxarm ; linux-
ker...@vger.kernel.org; linux-a...@vger.kernel.org; linux-arm-
ker...@lists.infradead.org
Subject: [PATCH v2 0/4] arm64 SMMUv3 PMU driver with IORT support

This adds a driver for the SMMUv3 PMU into the perf framework.
It includes an IORT update to support PM Counter Groups.

This is based on the initial work done by Neil Leeder[1]

SMMUv3 PMCG devices are named as arm_smmu_v3_x_pmcg_y where x
denotes the associated smmuv3 dev id(if any) and y denotes the
pmu dev id.

Usage example:
For common arch supported events:
perf stat -e arm_smmu_v3_0_pmcg_6/transaction,filter_enable=1,
  filter_span=1,filter_stream_id=0x42/ -a pwd

For IMP DEF events:
perf stat -e arm_smmu_v3.0_pmcg.6/event=id/ -a pwd

Sanity tested on HiSilicon platform. Further testing on supported
platforms are very much welcome.

v1 --> v2

- Addressed comments from Robin.
- Added a helper to retrieve the associated smmu dev and named PMUs
   to make the association visible to the user.
- Added MSI support  for overflow irq

[1]https://www.spinics.net/lists/arm-kernel/msg598591.html

Neil Leeder (2):
   acpi: arm64: add iort support for PMCG
   perf: add arm64 smmuv3 pmu driver

Shameer Kolothum (2):
   acpi: arm64: iort helper to find the associated smmu of pmcg node
   perf/smmuv3: Add MSI irq support

  drivers/acpi/arm64/iort.c | 179 +++--
  drivers/perf/Kconfig  |   9 +
  drivers/perf/Makefile |   1 +
  drivers/perf/arm_smmuv3_pmu.c | 901
++
  include/linux/acpi_iort.h |   4 +
  5 files changed, 1063 insertions(+), 31 deletions(-)
  create mode 100644 drivers/perf/arm_smmuv3_pmu.c

--
2.7.4


___
Linuxarm mailing list
linux...@huawei.com
http://hulk.huawei.com/mailman/listinfo/linuxarm


Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)

2018-08-01 Thread Robin Murphy

On 31/07/18 18:38, Guenter Roeck wrote:

On Tue, Jul 31, 2018 at 04:58:41PM +0100, Robin Murphy wrote:

On 31/07/18 16:43, Guenter Roeck wrote:

On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote:

Please note that sparc images still generate the warning (next-20180731).


Ugh, OK, any ideas what sparc does to create these platform devices that
isn't of_platform_device_create_pdata() and has somehow grown an implicit
dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps
out...



I suspect it might be of_device_register(), called from
arch/sparc/kernel/of_device_64.c:scan_one_device()
arch/sparc/kernel/of_device_32.c:scan_one_device()


Right, that's as far as I got as well, so I'm struggling to see how these
things ever got DMA masks set before the of_dma_configure() call moved out
of of_platform_device_create_pdata(), or why it wasn't a problem prior to
the generic dma_ops rework if they didn't :/


Ah, ok. No idea, sorry. All I know is that the messages were first seen
with next-20180727.


OK, I spent this afternoon wrangling toolchains and QEMU to boot an 
instrumented 4.11 kernel, and the answer is that the warnings are 
arguably correct. These masks have indeed never been set where they 
should have been, but then the sbus_dma_ops don't reference them anyway.


The coherent mask WARN_ON *should* have started appearing in 4.16 with 
205e1b7f51e4("dma-mapping: warn when there is no coherent_dma_mask"), 
but happened to be hidden by the inadvertent side-effect of the prior 
dma_configure() change. Since there's seemingly no actual regression of 
functionality, I'm inclined to leave this in the hands of whoever cares 
about sparc32.


Robin.


Re: [PATCH] arm_pmu: fix compiler warning in arm_pmu_device_probe

2018-08-02 Thread Robin Murphy

Hi Chris,

On 01/08/18 22:45, Chris Packham wrote:

GCC warns

   arm_pmu_platform.c:234:5: error: 'err' may be used uninitialized in this 
function [-Werror=maybe-uninitialized]

This is because we rely on the for_each_cpu loop in armpmu_request_irqs
to initialise err. The warning is a little bogus because we know if
there were 0 CPUs this code would not be running.

Initialise err to 0 to avoid the warning.


Maybe initialising to something like -EINVAL would be more appropriate, 
just in case we did ever manage to get here with armpmu->supported_cpus 
unset?


Robin.


Signed-off-by: Chris Packham 
---
This has been reported before in https://lkml.org/lkml/2018/3/5/508 I'm not
sure if it was dismissed as "meh, gcc is wrong" or if it was just waiting for
someone with some round tuits.


  drivers/perf/arm_pmu_platform.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
index 971ff336494a..96075cecb0ae 100644
--- a/drivers/perf/arm_pmu_platform.c
+++ b/drivers/perf/arm_pmu_platform.c
@@ -160,7 +160,7 @@ static int pmu_parse_irqs(struct arm_pmu *pmu)
  static int armpmu_request_irqs(struct arm_pmu *armpmu)
  {
struct pmu_hw_events __percpu *hw_events = armpmu->hw_events;
-   int cpu, err;
+   int cpu, err = 0;
  
  	for_each_cpu(cpu, &armpmu->supported_cpus) {

int irq = per_cpu(hw_events->irq, cpu);



Re: [PATCH 4/7 v6] iommu/arm-smmu: Add support for the fsl-mc bus

2018-07-25 Thread Robin Murphy

On 09/07/18 12:18, Nipun Gupta wrote:

Implement bus specific support for the fsl-mc bus including
registering arm_smmu_ops and bus specific device add operations.


I guess this is about as neat as it can get;

Reviewed-by: Robin Murphy 


Signed-off-by: Nipun Gupta 
---
  drivers/iommu/arm-smmu.c |  7 +++
  drivers/iommu/iommu.c| 13 +
  include/linux/fsl/mc.h   |  8 
  include/linux/iommu.h|  2 ++
  4 files changed, 30 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f7a96bc..a011bb6 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,6 +52,7 @@
  #include 
  
  #include 

+#include 
  
  #include "io-pgtable.h"

  #include "arm-smmu-regs.h"
@@ -1459,6 +1460,8 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
  
  	if (dev_is_pci(dev))

group = pci_device_group(dev);
+   else if (dev_is_fsl_mc(dev))
+   group = fsl_mc_device_group(dev);
else
group = generic_device_group(dev);
  
@@ -2037,6 +2040,10 @@ static void arm_smmu_bus_init(void)

bus_set_iommu(&pci_bus_type, &arm_smmu_ops);
}
  #endif
+#ifdef CONFIG_FSL_MC_BUS
+   if (!iommu_present(&fsl_mc_bus_type))
+   bus_set_iommu(&fsl_mc_bus_type, &arm_smmu_ops);
+#endif
  }
  
  static int arm_smmu_device_probe(struct platform_device *pdev)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d227b86..df2f49e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -32,6 +32,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  static struct kset *iommu_group_kset;

@@ -988,6 +989,18 @@ struct iommu_group *pci_device_group(struct device *dev)
return iommu_group_alloc();
  }
  
+/* Get the IOMMU group for device on fsl-mc bus */

+struct iommu_group *fsl_mc_device_group(struct device *dev)
+{
+   struct device *cont_dev = fsl_mc_cont_dev(dev);
+   struct iommu_group *group;
+
+   group = iommu_group_get(cont_dev);
+   if (!group)
+   group = iommu_group_alloc();
+   return group;
+}
+
  /**
   * iommu_group_get_for_dev - Find or create the IOMMU group for a device
   * @dev: target device
diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h
index f27cb14..dddaca1 100644
--- a/include/linux/fsl/mc.h
+++ b/include/linux/fsl/mc.h
@@ -351,6 +351,14 @@ struct fsl_mc_io {
  #define dev_is_fsl_mc(_dev) (0)
  #endif
  
+/* Macro to check if a device is a container device */

+#define fsl_mc_is_cont_dev(_dev) (to_fsl_mc_device(_dev)->flags & \
+   FSL_MC_IS_DPRC)
+
+/* Macro to get the container device of a MC device */
+#define fsl_mc_cont_dev(_dev) (fsl_mc_is_cont_dev(_dev) ? \
+   (_dev) : (_dev)->parent)
+
  /*
   * module_fsl_mc_driver() - Helper macro for drivers that don't do
   * anything special in module init/exit.  This eliminates a lot of
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7447b0b..209891d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -389,6 +389,8 @@ static inline size_t iommu_map_sg(struct iommu_domain 
*domain,
  extern struct iommu_group *pci_device_group(struct device *dev);
  /* Generic device grouping function */
  extern struct iommu_group *generic_device_group(struct device *dev);
+/* FSL-MC device grouping function */
+struct iommu_group *fsl_mc_device_group(struct device *dev);
  
  /**

   * struct iommu_fwspec - per-device IOMMU instance data



Re: [PATCH 5/7 v6] bus/fsl-mc: support dma configure for devices on fsl-mc bus

2018-07-25 Thread Robin Murphy

On 09/07/18 12:18, Nipun Gupta wrote:

This patch adds support of dma configuration for devices on fsl-mc
bus using 'dma_configure' callback for busses. Also, directly calling
arch_setup_dma_ops is removed from the fsl-mc bus.


Reviewed-by: Robin Murphy 


Signed-off-by: Nipun Gupta 
Reviewed-by: Laurentiu Tudor 
---
  drivers/bus/fsl-mc/fsl-mc-bus.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index 5d8266c..fa43c7d 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -127,6 +127,16 @@ static int fsl_mc_bus_uevent(struct device *dev, struct 
kobj_uevent_env *env)
return 0;
  }
  
+static int fsl_mc_dma_configure(struct device *dev)

+{
+   struct device *dma_dev = dev;
+
+   while (dev_is_fsl_mc(dma_dev))
+   dma_dev = dma_dev->parent;
+
+   return of_dma_configure(dev, dma_dev->of_node, 0);
+}
+
  static ssize_t modalias_show(struct device *dev, struct device_attribute 
*attr,
 char *buf)
  {
@@ -148,6 +158,7 @@ struct bus_type fsl_mc_bus_type = {
.name = "fsl-mc",
.match = fsl_mc_bus_match,
.uevent = fsl_mc_bus_uevent,
+   .dma_configure  = fsl_mc_dma_configure,
.dev_groups = fsl_mc_dev_groups,
  };
  EXPORT_SYMBOL_GPL(fsl_mc_bus_type);
@@ -633,10 +644,6 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc,
goto error_cleanup_dev;
}
  
-	/* Objects are coherent, unless 'no shareability' flag set. */

-   if (!(obj_desc->flags & FSL_MC_OBJ_FLAG_NO_MEM_SHAREABILITY))
-   arch_setup_dma_ops(&mc_dev->dev, 0, 0, NULL, true);
-
/*
 * The device-specific probe callback will get invoked by device_add()
 */



Re: [PATCH 1/7 v6] Documentation: fsl-mc: add iommu-map device-tree binding for fsl-mc bus

2018-07-25 Thread Robin Murphy

On 09/07/18 12:18, Nipun Gupta wrote:

The existing IOMMU bindings cannot be used to specify the relationship
between fsl-mc devices and IOMMUs. This patch adds a generic binding for
mapping fsl-mc devices to IOMMUs, using iommu-map property.


No more nits from me :)

Acked-by: Robin Murphy 


Signed-off-by: Nipun Gupta 
Reviewed-by: Rob Herring 
---
  .../devicetree/bindings/misc/fsl,qoriq-mc.txt  | 39 ++
  1 file changed, 39 insertions(+)

diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt 
b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
index 6611a7c..01fdc33 100644
--- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
+++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
@@ -9,6 +9,25 @@ blocks that can be used to create functional hardware 
objects/devices
  such as network interfaces, crypto accelerator instances, L2 switches,
  etc.
  
+For an overview of the DPAA2 architecture and fsl-mc bus see:

+Documentation/networking/dpaa2/overview.rst
+
+As described in the above overview, all DPAA2 objects in a DPRC share the
+same hardware "isolation context" and a 10-bit value called an ICID
+(isolation context id) is expressed by the hardware to identify
+the requester.
+
+The generic 'iommus' property is insufficient to describe the relationship
+between ICIDs and IOMMUs, so an iommu-map property is used to define
+the set of possible ICIDs under a root DPRC and how they map to
+an IOMMU.
+
+For generic IOMMU bindings, see
+Documentation/devicetree/bindings/iommu/iommu.txt.
+
+For arm-smmu binding, see:
+Documentation/devicetree/bindings/iommu/arm,smmu.txt.
+
  Required properties:
  
  - compatible

@@ -88,14 +107,34 @@ Sub-nodes:
Value type: 
Definition: Specifies the phandle to the PHY device node
associated with this dpmac.
+Optional properties:
+
+- iommu-map: Maps an ICID to an IOMMU and associated iommu-specifier
+  data.
+
+  The property is an arbitrary number of tuples of
+  (icid-base,iommu,iommu-base,length).
+
+  Any ICID i in the interval [icid-base, icid-base + length) is
+  associated with the listed IOMMU, with the iommu-specifier
+  (i - icid-base + iommu-base).
  
  Example:
  
+smmu: iommu@500 {
+   compatible = "arm,mmu-500";
+   #iommu-cells = <1>;
+   stream-match-mask = <0x7C00>;
+   ...
+};
+
  fsl_mc: fsl-mc@80c00 {
  compatible = "fsl,qoriq-mc";
  reg = <0x0008 0x0c00 0 0x40>,/* MC portal base */
<0x 0x0834 0 0x4>; /* MC control reg */
  msi-parent = <&its>;
+/* define map for ICIDs 23-64 */
+iommu-map = <23 &smmu 23 41>;
  #address-cells = <3>;
  #size-cells = <1>;
  



Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA

2018-07-25 Thread Robin Murphy

On 12/07/18 08:45, Ganapatrao Kulkarni wrote:

Hi Robin,


On Mon, Jun 4, 2018 at 9:36 AM, Ganapatrao Kulkarni  wrote:

ping??

On Mon, May 21, 2018 at 6:45 AM, Ganapatrao Kulkarni  wrote:

On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni  wrote:

Hi Robin,

On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni
 wrote:

On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy  wrote:

On 19/04/18 18:12, Ganapatrao Kulkarni wrote:


The performance drop is observed with long hours of iperf testing using 40G
cards. This is mainly due to long iterations in finding a free iova
range in the 32-bit address space.

In the current implementation for 64-bit PCI devices, there is always a first
attempt to allocate an iova from the 32-bit (SAC preferred over DAC) address
range. Once we run out of the 32-bit range, allocation comes from the higher
range; however, due to the cached32_node optimization this is not supposed to
be painful. cached32_node always points to the most recently allocated 32-bit
node. When the address range is full, it will be pointing to the last allocated
node (leaf node), so walking the rbtree to find an available range is not an
expensive affair. However, this optimization does not behave well when
one of the middle nodes is freed. In that case cached32_node is updated
to point to the next iova range. The next iova allocation will consume that
free range and again update cached32_node to itself. From then on, walking
over the 32-bit range is more expensive.

This patch adds a fix to update the cached node to the leaf node when there is
no iova free range left, which avoids unnecessarily long iterations.



The only trouble with this is that "allocation failed" doesn't uniquely mean
"space full". Say that after some time the 32-bit space ends up empty except
for one page at 0x1000 and one at 0x8000, then somebody tries to
allocate 2GB. If we move the cached node down to the leftmost entry when
that fails, all subsequent allocation attempts are now going to fail despite
the space being 99.% free!

I can see a couple of ways to solve that general problem of free space above
the cached node getting lost, but neither of them helps with the case where
there is genuinely insufficient space (and if anything would make it even
slower). In terms of the optimisation you want here, i.e. fail fast when an
allocation cannot possibly succeed, the only reliable idea which comes to
mind is free-PFN accounting. I might give that a go myself to see how ugly
it looks.


did you get any chance to look in to this issue?
i am waiting for your suggestion/patch for this issue!


I got as far as [1], but I wasn't sure how much I liked it, since it 
still seems a little invasive for such a specific case (plus I can't 
remember if it's actually been debugged or not). I think in the end I 
started wondering whether it's even worth bothering with the 32-bit 
optimisation for PCIe devices - 4 extra bytes worth of TLP is surely a 
lot less significant than every transaction taking up to 50% more bus 
cycles was for legacy PCI.


Robin.

[1] 
http://www.linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a8e0e4af10ebebb3669750e05bf0028e5bd6afe8


Re: [RFC PATCH 0/4] Stop losing firmware-set DMA masks

2018-07-11 Thread Robin Murphy

On 11/07/18 15:40, Rob Herring wrote:

On Tue, Jul 10, 2018 at 12:43 PM Robin Murphy  wrote:


Whilst the common firmware code invoked by dma_configure() initialises
devices' DMA masks according to limitations described by the respective
properties ("dma-ranges" for OF and _DMA/IORT for ACPI), the nature of
the dma_set_mask() API leads to that information getting lost when
well-behaved drivers probe and set a 64-bit mask, since in general
there's no way to tell the difference between a firmware-described mask
(which should be respected) and whatever default may have come from the
bus code (which should be replaced outright). This can break DMA on
systems with certain IOMMU topologies (e.g. [1]) where the IOMMU driver
only knows its maximum supported address size, not how many of those
address bits might actually be wired up between any of its input
interfaces and the associated DMA master devices. Similarly, some PCIe
root complexes only have a 32-bit native interface on their host bridge,
which leads to the same DMA-address-truncation problem in systems with a
larger physical memory map and RAM above 4GB (e.g. [2]).

These patches attempt to deal with this in the simplest way possible by
generalising the specific quirk for 32-bit bridges into an arbitrary
mask which can then also be plumbed into the firmware code. In the
interest of being minimally invasive, I've only included a point fix
for the IOMMU issue as seen on arm64 - there may be further tweaks
needed in DMA ops to catch all possible incarnations of this problem,
but this initial RFC is mostly about the impact beyond the dma-mapping
subsystem itself.


Couldn't you set and use the device's parent's dma_mask instead. At
least for DT, we should always have a parent device representing the
bus. That would avoid further bloating of struct device.


But then if the parent device did have a non-trivial driver which calls 
dma_set_mask(), we'd be back at square 1 :/


More realistically, I don't think that's viable for ACPI, at least with 
IORT, since the memory address size limit belongs to the endpoint 
itself, thus two devices with the same nominal parent in the Linux 
device model could still have different limits (where in DT you'd have 
to have to insert intermediate simple-bus nodes to model the same 
topology with dma-ranges). Plus either way it seems somewhat fragile for 
PCI where the host bridge may be some distance up the hierarchy.


Robin.


Re: [PATCH v5 08/13] iommu/rockchip: Control clocks needed to access the IOMMU

2018-01-24 Thread Robin Murphy

On 24/01/18 10:35, Jeffy Chen wrote:

From: Tomasz Figa 

Current code relies on master driver enabling necessary clocks before
IOMMU is accessed, however there are cases when the IOMMU should be
accessed while the master is not running yet, for example allocating
V4L2 videobuf2 buffers, which is done by the VB2 framework using DMA
mapping API and doesn't engage the master driver at all.

This patch fixes the problem by letting clocks needed for IOMMU
operation to be listed in Device Tree and making the driver enable them
for the time of accessing the hardware.

Signed-off-by: Jeffy Chen 
Signed-off-by: Tomasz Figa 
---

Changes in v5:
Use clk_bulk APIs.

Changes in v4: None
Changes in v3: None
Changes in v2: None

  .../devicetree/bindings/iommu/rockchip,iommu.txt   |  8 +++
  drivers/iommu/rockchip-iommu.c | 74 --
  2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt 
b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
index 2098f7732264..33dd853359fa 100644
--- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt
@@ -14,6 +14,13 @@ Required properties:
  "single-master" device, and needs no additional 
information
  to associate with its master device.  See:
  Documentation/devicetree/bindings/iommu/iommu.txt
+Optional properties:
+- clocks : A list of master clocks requires for the IOMMU to be accessible


s/requires/required/


+   by the host CPU. The number of clocks depends on the master
+   block and might as well be zero. See [1] for generic clock


Oops, some subtleties of English here :)

To say "the number of clocks ... might as well be zero" effectively 
implies "there's no point ever specifying any clocks". I guess what you 
really mean here is "...might well be...", i.e. it is both valid and 
reasonably likely to require zero clocks.



+   bindings description.
+
+[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
  
  Optional properties:

  - rockchip,disable-mmu-reset : Don't use the mmu reset operation.
@@ -27,5 +34,6 @@ Example:
reg = <0xff940300 0x100>;
interrupts = ;
interrupt-names = "vopl_mmu";
+   clocks = <&cru ACLK_VOP1>, <&cru DCLK_VOP1>, <&cru HCLK_VOP1>;
#iommu-cells = <0>;
};
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index c4131ca792e0..8a5e2a659b67 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -4,6 +4,7 @@
   * published by the Free Software Foundation.
   */
  
+#include 

  #include 
  #include 
  #include 
@@ -91,6 +92,8 @@ struct rk_iommu {
struct device *dev;
void __iomem **bases;
int num_mmu;
+   struct clk_bulk_data *clocks;
+   int num_clocks;
bool reset_disabled;
struct iommu_device iommu;
struct list_head node; /* entry in rk_iommu_domain.iommus */
@@ -450,6 +453,38 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu)
return 0;
  }
  
+static int rk_iommu_of_get_clocks(struct rk_iommu *iommu)

+{
+   struct device_node *np = iommu->dev->of_node;
+   int ret;
+   int i;
+
+   ret = of_count_phandle_with_args(np, "clocks", "#clock-cells");
+   if (ret == -ENOENT)
+   return 0;
+   else if (ret < 0)
+   return ret;
+
+   iommu->num_clocks = ret;
+   iommu->clocks = devm_kcalloc(iommu->dev, iommu->num_clocks,
+sizeof(*iommu->clocks), GFP_KERNEL);
+   if (!iommu->clocks)
+   return -ENOMEM;
+
+   for (i = 0; i < iommu->num_clocks; ++i) {
+   iommu->clocks[i].clk = of_clk_get(np, i);
+   if (IS_ERR(iommu->clocks[i].clk)) {
+   ret = PTR_ERR(iommu->clocks[i].clk);
+   goto err_clk_put;
+   }
+   }


Just to confirm my understanding from a quick scan through the code, the 
reason we can't use clk_bulk_get() here is that currently, clocks[i].id 
being NULL means we'd end up just getting the first clock multiple 
times, right?


I guess there could be other users who also want "just get whatever 
clocks I have" functionality, so it might be worth proposing that for 
the core API as a separate/follow-up patch, but it definitely doesn't 
need to be part of this series.


I really don't know enough about correct clk API usage, but modulo the 
binding comments it certainly looks nice and tidy now;


Acked-by: Robin Murphy 

Thanks,
Robin.


+
+   return 0;
+err_clk_put:
+   clk

Re: [PATCH v2] iommu/mediatek: Move attach_device after iommu-group is ready for M4Uv1

2018-01-24 Thread Robin Murphy

On 23/01/18 08:39, Yong Wu wrote:

In the commit 05f80300dc8b ("iommu: Finish making iommu_group support
mandatory"), the iommu framework assumed that all the iommu drivers have
their own iommu-group, and got rid of the FIXME workarounds for the case
where the group is NULL. But the flow of Mediatek M4U gen1 is a bit tricky
in that it will hang in this case:

==
Unable to handle kernel NULL pointer dereference at virtual address 0030
PC is at mutex_lock+0x28/0x54
LR is at iommu_attach_device+0xa4/0xd4
pc : []lr : []psr: 6013
sp : df0edbb8  ip : df0edbc8  fp : df0edbc4
r10: c114da14  r9 : df2a3e40  r8 : 0003
r7 : df27a210  r6 : df2a90c4  r5 : 0030  r4 : 
r3 : df0f8000  r2 : f000  r1 : df29c610  r0 : 0030
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
xxx
(mutex_lock) from [] (iommu_attach_device+0xa4/0xd4)
(iommu_attach_device) from [] (__arm_iommu_attach_device+0x28/0x90)
(__arm_iommu_attach_device) from [] 
(arm_iommu_attach_device+0x1c/0x30)
(arm_iommu_attach_device) from [] (mtk_iommu_add_device+0xfc/0x214)
(mtk_iommu_add_device) from [] (add_iommu_group+0x3c/0x68)
(add_iommu_group) from [] (bus_for_each_dev+0x78/0xac)
(bus_for_each_dev) from [] (bus_set_iommu+0xb0/0xec)
(bus_set_iommu) from [] (mtk_iommu_probe+0x328/0x368)
(mtk_iommu_probe) from [] (platform_drv_probe+0x5c/0xc0)
(platform_drv_probe) from [] (driver_probe_device+0x2f4/0x4d8)
(driver_probe_device) from [] (__driver_attach+0x10c/0x128)
(__driver_attach) from [] (bus_for_each_dev+0x78/0xac)
(bus_for_each_dev) from [] (driver_attach+0x2c/0x30)
(driver_attach) from [] (bus_add_driver+0x1e0/0x278)
(bus_add_driver) from [] (driver_register+0x88/0x108)
(driver_register) from [] (__platform_driver_register+0x50/0x58)
(__platform_driver_register) from [] (m4u_init+0x24/0x28)
(m4u_init) from [] (do_one_initcall+0xf0/0x17c)
=

The root cause is that "arm_iommu_attach_device" is called before
"iommu_group_get_for_dev" in the interface "mtk_iommu_add_device". Thus,
we adjust the sequence of these two functions.

Unfortunately, there is another issue after the solution above. In the
function "iommu_attach_device", only one device in each iommu group is
allowed. In the Mediatek case, there is only one m4u group and all the
devices are in that group, thus it fails at this step.

In order to satisfy this requirement, a new iommu group is allocated for
each iommu consumer device. But meanwhile, we still have to use the
same domain for all the iommu groups. Use a global variable "mtk_domain_v1"
to save the global domain.


Argh, sorry for breaking it! Seems I managed to forget just how horrible 
and fiddly all the arm_iommu_* stuff is :(



CC: Robin Murphy 
CC: Honghui Zhang 
Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
Reported-by: Ryder Lee 
Tested-by: Bibby Hsieh 
Signed-off-by: Yong Wu 
---
changes since v1:
Add mtk_domain_v1=NULL in domain_free for symmetry.

v1: https://patchwork.kernel.org/patch/10176255/
---
  drivers/iommu/mtk_iommu_v1.c | 42 +++---
  1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 542930c..86106bf 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -103,6 +103,9 @@ struct mtk_iommu_domain {
struct mtk_iommu_data   *data;
  };
  
+/* There is only a iommu domain in M4U gen1. */

+static struct mtk_iommu_domain *mtk_domain_v1;
+
  static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
  {
return container_of(dom, struct mtk_iommu_domain, domain);
@@ -251,10 +254,15 @@ static struct iommu_domain 
*mtk_iommu_domain_alloc(unsigned type)
if (type != IOMMU_DOMAIN_UNMANAGED)
return NULL;
  
+	/* Always return the same domain. */

+   if (mtk_domain_v1)
+   return &mtk_domain_v1->domain;


This seems a bit too fragile (and I vaguely recall we may have discussed 
and rejected this approach for the original driver), since any code doing:


unused = iommu_domain_alloc(bus);
iommu_domain_free(unused);

will pull the rug out from under everyone's feet in a very nasty and 
unexpected manner. Given that mtk_iommu_create_mapping() is already a 
giant workaround for the ARM DMA code not understanding groups and 
default domains, I'd prefer not to have to regress "correct" driver 
behaviour for the sake of that; how about something like the below diff, 
is that enough to make things work?


Robin.

->8-
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 542930cd183d..8b90b7a72238 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -376,6 +376,7 @@ static int mtk_iommu_create_mapping(struct device *dev,
  

Re: [PATCH v3] iommu/mediatek: Move attach_device after iommu-group is ready for M4Uv1

2018-01-25 Thread Robin Murphy

On 25/01/18 11:14, Yong Wu wrote:

In the commit 05f80300dc8b, the iommu framework assumed that all the
iommu drivers have their own iommu-group, and got rid of the FIXME
workarounds for the case where the group is NULL. But the flow of
Mediatek M4U gen1 is a bit tricky in that it will hang in this case:

==
Unable to handle kernel NULL pointer dereference at virtual address 0030
pgd = c0004000
[0030] *pgd=
PC is at mutex_lock+0x28/0x54
LR is at iommu_attach_device+0xa4/0xd4
pc : []lr : []psr: 6013
sp : df0edbb8  ip : df0edbc8  fp : df0edbc4
r10: c114da14  r9 : df2a3e40  r8 : 0003
r7 : df27a210  r6 : df2a90c4  r5 : 0030  r4 : 
r3 : df0f8000  r2 : f000  r1 : df29c610  r0 : 0030
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
xxx
(mutex_lock) from [] (iommu_attach_device+0xa4/0xd4)
(iommu_attach_device) from [] (__arm_iommu_attach_device+0x28/0x90)
(__arm_iommu_attach_device) from [] 
(arm_iommu_attach_device+0x1c/0x30)
(arm_iommu_attach_device) from [] (mtk_iommu_add_device+0xfc/0x214)
(mtk_iommu_add_device) from [] (add_iommu_group+0x3c/0x68)
(add_iommu_group) from [] (bus_for_each_dev+0x78/0xac)
(bus_for_each_dev) from [] (bus_set_iommu+0xb0/0xec)
(bus_set_iommu) from [] (mtk_iommu_probe+0x328/0x368)
(mtk_iommu_probe) from [] (platform_drv_probe+0x5c/0xc0)
(platform_drv_probe) from [] (driver_probe_device+0x2f4/0x4d8)
(driver_probe_device) from [] (__driver_attach+0x10c/0x128)
(__driver_attach) from [] (bus_for_each_dev+0x78/0xac)
(bus_for_each_dev) from [] (driver_attach+0x2c/0x30)
(driver_attach) from [] (bus_add_driver+0x1e0/0x278)
(bus_add_driver) from [] (driver_register+0x88/0x108)
(driver_register) from [] (__platform_driver_register+0x50/0x58)
(__platform_driver_register) from [] (m4u_init+0x24/0x28)
(m4u_init) from [] (do_one_initcall+0xf0/0x17c)
=

The root cause is that the device's iommu-group is NULL while
arm_iommu_attach_device is called. This patch prepares a new iommu-group
for the iommu consumer devices to fix this issue.

CC: Robin Murphy 
CC: Honghui Zhang 
Fixes: 05f80300dc8b ('iommu: Finish making iommu_group support mandatory')
Reported-by: Ryder Lee 
Signed-off-by: Yong Wu 
---
changes notes:
v3: don't use the global variable and allocate a new iommu group before
 arm_iommu_attach_device following Robin's suggestion.

v2: http://lists.infradead.org/pipermail/linux-mediatek/2018-January/011810.html
Add mtk_domain_v1=NULL in domain_free for symmetry.

v1: https://patchwork.kernel.org/patch/10176255/
---
  drivers/iommu/mtk_iommu_v1.c | 49 ++--
  1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 542930c..aca76d2 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -418,20 +418,12 @@ static int mtk_iommu_create_mapping(struct device *dev,
m4udev->archdata.iommu = mtk_mapping;
}
  
-	ret = arm_iommu_attach_device(dev, mtk_mapping);

-   if (ret)
-   goto err_release_mapping;
-
return 0;
-
-err_release_mapping:
-   arm_iommu_release_mapping(mtk_mapping);
-   m4udev->archdata.iommu = NULL;
-   return ret;
  }
  
  static int mtk_iommu_add_device(struct device *dev)

  {
+   struct dma_iommu_mapping *mtk_mapping;
struct of_phandle_args iommu_spec;
struct of_phandle_iterator it;
struct mtk_iommu_data *data;
@@ -452,9 +444,30 @@ static int mtk_iommu_add_device(struct device *dev)
if (!dev->iommu_fwspec || dev->iommu_fwspec->ops != &mtk_iommu_ops)
return -ENODEV; /* Not a iommu client device */
  
+	/*

+* This is a short-term bodge because the ARM DMA code doesn't
+* understand multi-device groups, but we have to call into it
+* successfully (and not just rely on a normal IOMMU API attach
+* here) in order to set the correct DMA API ops on @dev.
+*/
+   group = iommu_group_alloc();
+   if (IS_ERR(group))
+   return PTR_ERR(group);
+
+   err = iommu_group_add_device(group, dev);
+   iommu_group_put(group);
+   if (err)
+   return err;
+
data = dev->iommu_fwspec->iommu_priv;
-   iommu_device_link(&data->iommu, dev);
+   mtk_mapping = data->dev->archdata.iommu;
+   err = arm_iommu_attach_device(dev, mtk_mapping);
+   if (err) {
+   iommu_group_remove_device(dev);
+   return err;
+   }
  
+	iommu_device_link(&data->iommu, dev);

group = iommu_group_get_for_dev(dev);


This call now does nothing, so you may as well remove it (and the 
subsequent iommu_group_put)...



if (IS_ERR(group))
return PTR_ERR(group);
@@ -479,20 +492,8 @@ static void mtk_iommu_remove_device(struct

Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-25 Thread Robin Murphy

On 25/01/18 11:38, Yury Norov wrote:

On Wed, Jan 24, 2018 at 11:28:55AM +0100, Arnd Bergmann wrote:

On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov  wrote:

This series adds API for 128-bit memory IO access and enables it for ARM64.
The original motivation for 128-bit API came from new Cavium network device
driver. The hardware requires 128-bit access to make things work. See
description in patch 3 for details.


We might also want to do something similar to the
include/linux/io-64-nonatomic-lo-hi.h
and hi-lo.h files, to simulate 128-bit access using pairs of 64-bit access on
other targets. It's apparently driver specific which half you need to do first
to make it work, so we need both.


OK, will do.
  

Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
API for 128-bit access would be helpful in core arm64 code.

This series is RFC. I'd like to collect opinions on idea and implementation
details.
* I didn't implement all 128-bit operations existing for 64-bit variables
and other types (__swab128p etc). Do we need them all right now, or we
can add them when actually needed?


I think in this case it's better to do them all at once.


Ack.


* u128 name is already used in crypto code. So here I use __uint128_t that
comes from GCC for 128-bit types. Should I rename existing type in crypto
and make core code for 128-bit variables consistent with u64, u32 etc? (I
think yes, but would like to ask crypto people for it.)


Hmm, that file probably predates the __uint128_t support. My guess would
be that the crypto code using it can actually benefit from the new types as
well, so maybe move the existing file to include/linux/int128.h and add
an #if/#else logic to it so we use 'typedef __uint128_t __u128' if that
is available.


It sounds reasonable. But I worry about testing those changes. Hopefully
the crypto community will help with it.


* Some compilers don't support __uint128_t, so I protected all generic code
with config option HAVE_128BIT_ACCESS. I think it's OK, but...


That would be nicely solved by using the #if/#else definition above.


* For 128-bit read/write functions I take suffix 'o', which means read/write
the octet of bytes. Is this name OK?


Can't think of anything better. It's not an octet though, but 16 bytes
('q' is for quadword, meaning four 16-bit words in Intel terminology).


Ah, sure. Octet of words. Will change it.


* my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I
don't have other BE setup on hand, so BE case is formally not tested.
BE code for arm64 is looking well though.


I've run it through my collection of compilers, it seems that most but not
all 64-bit targets support it (exceptions appear to be older versions of
gcc for s390x and parisc), and none of the 32-bit targets do:


Thanks for doing this test. Looking at this I realize that this is
not an architecture feature but a compiler feature. So if we add a
128-bit interface, it would be reasonable to add it for all targets
that are compiled with a toolchain supporting 128-bit accesses.

There's already the option ARCH_SUPPORTS_INT128 that is enabled for
x86_64 in arch/x86/Kconfig and conditionally enabled for arm64 in
arch/arm64/Makefile:
KBUILD_CFLAGS+= $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)

It is used in include/linux/math64.h and in lib/ubsan.c, not so wide.

So I find things a little messed up. Crypto code ignores the compiler's
ability to operate with 128-bit numbers. Ubsan and math64 rely on the
compiler version (at least for arm64, and I doubt it would work correctly with clang).
And now I introduce HAVE_128BIT_ACCESS with the same meaning for memory
access.

I think it's time to unify 128-bit code:
  - enable CONFIG_ARCH_SUPPORTS_INT128 if compiler supports it, ie check
it like you do below;
  - declare u128 as structure or typedef depending on ARCH_SUPPORTS_INT128
in generic include/linux/int128.h, as you suggest here;
  - switch this series to ARCH_SUPPORTS_INT128.

Does it sound reasonable?


It probably is about time to formalise the current scattered fragments 
of __uint128_t support, but to reiterate Will's comment it is almost 
certainly not worth the effort to implement 'generic' I/O accessors 
which only work under implementation-defined and undiscoverable hardware 
conditions, and will be unusable on the overwhelming majority of 
systems. Just open-code LDP/STP accessors in the one driver which needs 
them and (by definition) only runs on SoCs where they *are* known to 
work correctly.


Robin.



Yury


$ for i in /home/arnd/cross-gcc/bin/*gcc-[3-8]* ; do echo -n $i" " ;
echo '__uint128_t v;' | $i -xc -S - -o /dev/null && echo ok ; done
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.8.5 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.3 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.4 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.2.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.4.1 ok
/home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.5.0 ok
/home/arnd/cross-

[PATCH 1/2] acpica: iort: Update SMMU models for IORT rev. C

2017-05-12 Thread Robin Murphy
IORT revision C has been published with a number of new SMMU
implementation identifiers; define them.

CC: Rafael J. Wysocki 
CC: Robert Moore 
CC: Lv Zheng 
Signed-off-by: Robin Murphy 
---
 include/acpi/actbl2.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index 7aee9fb3bd1f..0242be07f292 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -777,6 +777,8 @@ struct acpi_iort_smmu {
 #define ACPI_IORT_SMMU_V2   0x0001 /* Generic SMMUv2 */
 #define ACPI_IORT_SMMU_CORELINK_MMU400  0x0002 /* ARM Corelink MMU-400 
*/
 #define ACPI_IORT_SMMU_CORELINK_MMU500  0x0003 /* ARM Corelink MMU-500 
*/
+#define ACPI_IORT_SMMU_CORELINK_MMU401  0x0004 /* ARM Corelink MMU-401 
*/
+#define ACPI_IORT_SMMU_CAVIUM_SMMUV20x0005 /* Cavium ThunderX 
SMMUv2 */
 
 /* Masks for Flags field above */
 
@@ -795,6 +797,12 @@ struct acpi_iort_smmu_v3 {
u32 sync_gsiv;
 };
 
+/* Values for Model field above */
+
+#define ACPI_IORT_SMMU_V3   0x /* Generic SMMUv3 */
+#define ACPI_IORT_SMMU_HISILICON_HI161X 0x0001 /* HiSilicon Hi161x 
SMMUv3 */
+#define ACPI_IORT_SMMU_CAVIUM_CN99XX0x0002 /* Cavium CN99xx SMMUv3 
*/
+
 /* Masks for Flags field above */
 
 #define ACPI_IORT_SMMU_V3_COHACC_OVERRIDE   (1)
-- 
2.12.2.dirty



[PATCH 2/2] iommu/arm-smmu: Plumb in new ACPI identifiers

2017-05-12 Thread Robin Murphy
Revision C of IORT now allows us to identify ARM MMU-401 and the Cavium
ThunderX implementation; wire them up so that the appropriate quirks get
enabled when booting with ACPI.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 6dadd51d486c..d9ec840defc9 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2018,6 +2018,10 @@ static int acpi_smmu_get_data(u32 model, struct 
arm_smmu_device *smmu)
smmu->version = ARM_SMMU_V1;
smmu->model = GENERIC_SMMU;
break;
+   case ACPI_IORT_SMMU_CORELINK_MMU401:
+   smmu->version = ARM_SMMU_V1_64K;
+   smmu->model = GENERIC_SMMU;
+   break;
case ACPI_IORT_SMMU_V2:
smmu->version = ARM_SMMU_V2;
smmu->model = GENERIC_SMMU;
@@ -2026,6 +2030,10 @@ static int acpi_smmu_get_data(u32 model, struct 
arm_smmu_device *smmu)
smmu->version = ARM_SMMU_V2;
smmu->model = ARM_MMU500;
break;
+   case ACPI_IORT_SMMU_CAVIUM_SMMUV2:
+   smmu->version = ARM_SMMU_V2;
+   smmu->model = CAVIUM_SMMUV2;
+   break;
default:
ret = -ENODEV;
}
-- 
2.12.2.dirty



Re: [patch 11/18] iommu/of: Adjust system_state check

2017-05-15 Thread Robin Murphy
On 14/05/17 19:27, Thomas Gleixner wrote:
> To enable smp_processor_id() and might_sleep() debug checks earlier, it's
> required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING.
> 
> Adjust the system_state check in of_iommu_driver_present() to handle the
> extra states.

FWIW,

Acked-by: Robin Murphy 

> Signed-off-by: Thomas Gleixner 
> Cc: Joerg Roedel 
> Cc: io...@lists.linux-foundation.org
> ---
>  drivers/iommu/of_iommu.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -103,7 +103,7 @@ static bool of_iommu_driver_present(stru
>* it never will be. We don't want to defer indefinitely, nor attempt
>* to dereference __iommu_of_table after it's been freed.
>*/
> - if (system_state > SYSTEM_BOOTING)
> + if (system_state >= SYSTEM_RUNNING)
>   return false;
>  
>   return of_match_node(&__iommu_of_table, np);
> 



Re: [PATCH 27/33] dma-direct: use node local allocations for coherent memory

2018-01-10 Thread Robin Murphy

On 10/01/18 15:30, Christoph Hellwig wrote:

On Wed, Jan 10, 2018 at 12:06:22PM +, Robin Murphy wrote:

On 10/01/18 08:00, Christoph Hellwig wrote:

To preserve the x86 behavior.


And combined with patch 10/22 of the SWIOTLB refactoring, this means
SWIOTLB allocations will also end up NUMA-aware, right? Great, that's what
we want on arm64 too :)


Well, only for swiotlb allocations that can be satisfied by
dma_direct_alloc.  If we actually have to fall back to the swiotlb
buffers there is not node affinity yet.


Yeah, when I looked into it I reached the conclusion that per-node 
bounce buffers probably weren't worth it - if you have to bounce you've 
already pretty much lost the performance game, and if the CPU doing the 
bouncing happens to be on a different node from the device you've 
certainly lost either way. Per-node CMA zones we definitely *would* 
like, but that's a future problem (it looks technically feasible without 
huge infrastructure changes, but fiddly).


Robin.


Re: [PATCH 31/33] dma-direct: reject too small dma masks

2018-01-10 Thread Robin Murphy



On 10/01/18 15:32, Christoph Hellwig wrote:

On Wed, Jan 10, 2018 at 11:49:34AM +, Robin Murphy wrote:

+#ifdef CONFIG_ZONE_DMA
+   if (mask < DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
+   return 0;
+#else
+   /*
+* Because 32-bit DMA masks are so common we expect every architecture
+* to be able to satisfy them - either by not supporting more physical
+* memory, or by providing a ZONE_DMA32.  If neither is the case, the
+* architecture needs to use an IOMMU instead of the direct mapping.
+*/
+   if (mask < DMA_BIT_MASK(32))
+   return 0;


Do you think it's worth the effort to be a little more accommodating here?
i.e.:

return dma_max_pfn(dev) >= max_pfn;

We seem to have a fair few 28-31 bit masks for older hardware which
probably associates with host systems packing equivalently small amounts of
RAM.


And those devices don't have a ZONE_DMA?  I think we could do something
like that, but I'd rather have it as a separate commit with a good
explanation.  Maybe you can just send on on top of the series?


Good point - other than the IXP4xx platform and possibly the Broadcom 
network drivers, it's probably only x86-relevant stuff where the concern 
is moot. Let's just keep the simple assumption then, until actually 
proven otherwise.


Robin.
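
For reference, the "more accommodating" check floated above can be sketched in isolation. This is a hypothetical standalone model, not kernel code: `dma_mask_covers_ram()`, the `page_shift` parameter and the plain `max_pfn` value all stand in for the real `dma_max_pfn(dev)`/`max_pfn` helpers, which take a struct device and global state.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch: rather than rejecting every mask below 32 bits outright,
 * accept a small mask when the device can still address every page
 * the system actually has populated.
 */
static int dma_mask_covers_ram(uint64_t mask, unsigned page_shift,
			       uint64_t max_pfn)
{
	uint64_t dma_max_pfn = mask >> page_shift;

	/* Usable iff the device can reach the highest populated PFN. */
	return dma_max_pfn >= max_pfn;
}
```

With 4K pages, a 28-bit mask passes on a machine whose RAM tops out at 64MB but fails on one with 4GB, which is exactly the "older hardware in equivalently small hosts" case described above.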


Re: [PATCH 10/22] swiotlb: refactor coherent buffer allocation

2018-01-10 Thread Robin Murphy

On 10/01/18 15:46, Christoph Hellwig wrote:

On Wed, Jan 10, 2018 at 12:22:18PM +, Robin Murphy wrote:

+   if (phys_addr == SWIOTLB_MAP_ERROR)
+   goto out_warn;
 
-   /* Confirm address can be DMA'd by device */
-   if (dev_addr + size - 1 > dma_mask) {
-   printk("hwdev DMA mask = 0x%016Lx, dev_addr = 0x%016Lx\n",
-  (unsigned long long)dma_mask,
-  (unsigned long long)dev_addr);
+   *dma_handle = swiotlb_phys_to_dma(dev, phys_addr);


nit: this should probably go after the dma_coherent_ok() check (as with the
original logic).


But the original logic also needs the dma_addr_t for the
dma_coherent_ok check:

dev_addr = swiotlb_phys_to_dma(hwdev, paddr);
/* Confirm address can be DMA'd by device */
if (dev_addr + size - 1 > dma_mask) {
...
goto err_warn;
}

or do you mean assigning to *dma_handle?  The dma_handle is not
valid for a failure return, so I don't think this should matter.


Yeah, only the assignment - as I said, it's just a stylistic nit; no big 
deal either way.



+   if (ret) {
+   *dma_handle = swiotlb_virt_to_bus(hwdev, ret);
+   if (dma_coherent_ok(hwdev, *dma_handle, size)) {
+   memset(ret, 0, size);
+   return ret;
+   }


Aren't we leaking the pages here?


Yes, that free_pages got lost somewhere in the rebases, I've added
it back.


Cool.

Robin.


Re: [PATCH 21/22] arm64: replace ZONE_DMA with ZONE_DMA32

2018-01-10 Thread Robin Murphy

On 10/01/18 15:55, Christoph Hellwig wrote:

On Wed, Jan 10, 2018 at 04:55:17PM +0100, Christoph Hellwig wrote:

On Wed, Jan 10, 2018 at 12:58:14PM +, Robin Murphy wrote:

On 10/01/18 08:09, Christoph Hellwig wrote:

arm64 uses ZONE_DMA for allocations below 32-bits.  These days we
name the zone for that ZONE_DMA32, which will allow to use the
dma-direct and generic swiotlb code as-is, so rename it.


I do wonder if we could also "upgrade" GFP_DMA to GFP_DMA32 somehow when
!ZONE_DMA - there are almost certainly arm64 drivers out there using a
combination of GFP_DMA and streaming mappings which will no longer get the
guaranteed 32-bit addresses they expect after this. I'm not sure quite how
feasible that is, though :/


I can't find anything obvious in the tree. The alternative would be
to keep ZONE_DMA and set ARCH_ZONE_DMA_BITS.


That said, I do agree that this is an appropriate change (the legacy of
GFP_DMA is obviously horrible), so, provided we get plenty of time to find
and fix the fallout when it lands:

Reviewed-by: Robin Murphy 


I was hoping to get this into 4.15.  What would be proper time to
fix the fallout?


Err, 4.16 of course.


Hee hee - cramming it into 4.15 is exactly what I wouldn't want to do, 
even if Linus would accept it :)


Landing it this merge window for 4.16-rc1 sounds good if we can manage that.

Robin.


Re: [PATCH 08/22] swiotlb: wire up ->dma_supported in swiotlb_dma_ops

2018-01-10 Thread Robin Murphy

On 10/01/18 15:35, Christoph Hellwig wrote:

On Wed, Jan 10, 2018 at 12:16:15PM +, Robin Murphy wrote:

On 10/01/18 08:09, Christoph Hellwig wrote:

To properly reject too small DMA masks based on the addressability of the
bounce buffer.


I reckon this is self-evident enough that it should simply be squashed into
the previous patch.


x86 didn't wire it up before, so I want a clear blame point for this
change instead of mixing it up.


That almost makes sense, if x86 were using this generic swiotlb_dma_ops 
already. AFAICS it's only ia64, unicore and tile who end up using it, 
and they all had swiotlb_dma_supported hooked up to begin with. Am I 
missing something?


If regressions are going to happen, they'll surely point at whichever 
commit pulls the ops into the relevant arch code - there doesn't seem to 
be a great deal of value in having a piecemeal history of said ops 
*before* that point.


Robin.


Re: [PATCH 1/9] iommu/of: Drop early initialisation hooks

2018-01-11 Thread Robin Murphy

Hi Jeffy,

On 11/01/18 11:14, JeffyChen wrote:

Hi Marek,

Thanks for your reply.

On 01/11/2018 05:40 PM, Marek Szyprowski wrote:

Hi Jeffy,

On 2018-01-11 09:22, Jeffy Chen wrote:

With the probe-deferral mechanism, early initialisation hooks are no
longer needed.

Suggested-by: Robin Murphy 


In fact, shortly after I said that I had a "how hard can it be?" moment 
and took a crack at it myself - sorry, I should probably have cc'd you 
on that series[1].



Signed-off-by: Jeffy Chen 
---

  drivers/iommu/arm-smmu-v3.c  |  2 +-
  drivers/iommu/arm-smmu.c | 12 ++--
  drivers/iommu/exynos-iommu.c |  2 +-


For Exynos IOMMU:
Acked-by: Marek Szyprowski 

IPMMU and MSM IOMMU are no longer multi-platform safe after this patch.
It breaks them in the same way as my commit 928055a01b3f ("iommu/exynos:
Remove custom platform device registration code") broke Exynos IOMMU.

You need a similar fix for them:
https://www.spinics.net/lists/arm-kernel/msg627648.html


Hmmm, right, I did see this fix in the rockchip iommu driver too.

and there are also some other iommu drivers that put bus_set_iommu in 
their probe() to avoid that.


maybe we can do it in the iommu framework?

for example:
1/ add a bus type member to struct iommu_device
2/ and a iommu_device_set_bus()
3/ do the bus_set_iommu stuff in iommu_device_register()
4/ undo bus_set_iommu in iommu_device_unregister()


Ultimately we'd like to get rid of the bus relationship altogether, so I 
don't think it's really worth adding more infrastructure around it. 
Having of-iommu-based drivers set bus ops at probe time, and others 
conditionally from an initcall, is pretty clean and simple, so I'd 
rather stick with that approach for now.


Robin.

[1] 
https://lists.linuxfoundation.org/pipermail/iommu/2018-January/025395.html


Re: [PATCH] IIO: ADC: stm32-dfsdm: avoid unused-variable warning

2018-01-11 Thread Robin Murphy

On 11/01/18 10:34, Arnd Bergmann wrote:

Building with CONFIG_OF disabled produces a compiler warning:

drivers/iio/adc/stm32-dfsdm-core.c: In function 'stm32_dfsdm_probe':
drivers/iio/adc/stm32-dfsdm-core.c:245:22: error: unused variable 'pnode' 
[-Werror=unused-variable]

This removes the variable and open-codes it in the only place
it gets used to avoid that warning.

Fixes: bed73904e76f ("IIO: ADC: add stm32 DFSDM core support")
Signed-off-by: Arnd Bergmann 
---
  drivers/iio/adc/stm32-dfsdm-core.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iio/adc/stm32-dfsdm-core.c 
b/drivers/iio/adc/stm32-dfsdm-core.c
index 72427414db7f..6cd655f8239b 100644
--- a/drivers/iio/adc/stm32-dfsdm-core.c
+++ b/drivers/iio/adc/stm32-dfsdm-core.c
@@ -242,7 +242,6 @@ MODULE_DEVICE_TABLE(of, stm32_dfsdm_of_match);
  static int stm32_dfsdm_probe(struct platform_device *pdev)
  {
struct dfsdm_priv *priv;
-   struct device_node *pnode = pdev->dev.of_node;
const struct of_device_id *of_id;
const struct stm32_dfsdm_dev_data *dev_data;
struct stm32_dfsdm *dfsdm;
@@ -254,7 +253,7 @@ static int stm32_dfsdm_probe(struct platform_device *pdev)
  
  	priv->pdev = pdev;
  
-	of_id = of_match_node(stm32_dfsdm_of_match, pnode);

+   of_id = of_match_node(stm32_dfsdm_of_match, pdev->dev.of_node);
if (!of_id->data) {
dev_err(&pdev->dev, "Data associated to device is missing\n");
return -EINVAL;


FWIW, it looks like this whole lot could be cleaned up by using 
of_device_get_match_data().


Robin.


Re: [PATCH 2/9] iommu/rockchip: Fix error handling in attach

2018-01-11 Thread Robin Murphy

On 11/01/18 08:22, Jeffy Chen wrote:

From: Tomasz Figa 

Currently if the driver encounters an error while attaching device, it
will leave the IOMMU in an inconsistent state. Even though it shouldn't
really happen in reality, let's just add proper error path to keep
things consistent.

Signed-off-by: Tomasz Figa 
Signed-off-by: Jeffy Chen 
---

  drivers/iommu/rockchip-iommu.c | 26 +-
  1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 9d991c2d8767..ee805e1dfba7 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -826,17 +826,10 @@ static int rk_iommu_attach_device(struct iommu_domain 
*domain,
  
  	ret = rk_iommu_force_reset(iommu);

if (ret)
-   return ret;
+   goto err_disable_stall;
  
  	iommu->domain = domain;
  
-	for (i = 0; i < iommu->num_irq; i++) {

-   ret = devm_request_irq(iommu->dev, iommu->irq[i], rk_iommu_irq,
-  IRQF_SHARED, dev_name(dev), iommu);
-   if (ret)
-   return ret;
-   }
-
for (i = 0; i < iommu->num_mmu; i++) {
rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR,
   rk_domain->dt_dma);
@@ -844,9 +837,16 @@ static int rk_iommu_attach_device(struct iommu_domain 
*domain,
rk_iommu_write(iommu->bases[i], RK_MMU_INT_MASK, 
RK_MMU_IRQ_MASK);
}
  
+	for (i = 0; i < iommu->num_irq; i++) {

+   ret = devm_request_irq(iommu->dev, iommu->irq[i], rk_iommu_irq,
+  IRQF_SHARED, dev_name(dev), iommu);


Why aren't we simply requesting the IRQ once in rk_iommu_probe()? Given 
that the hardware doesn't handle multiple translation contexts, there 
doesn't seem to be much point in being this dynamic about it.


Robin.


+   if (ret)
+   goto err_free_irq;
+   }
+
ret = rk_iommu_enable_paging(iommu);
if (ret)
-   return ret;
+   goto err_free_irq;
  
  	spin_lock_irqsave(&rk_domain->iommus_lock, flags);

list_add_tail(&iommu->node, &rk_domain->iommus);
@@ -857,6 +857,14 @@ static int rk_iommu_attach_device(struct iommu_domain 
*domain,
rk_iommu_disable_stall(iommu);
  
  	return 0;

+
+err_free_irq:
+   while (i--)
+   devm_free_irq(iommu->dev, iommu->irq[i], iommu);
+err_disable_stall:
+   rk_iommu_disable_stall(iommu);
+
+   return ret;
  }
  
  static void rk_iommu_detach_device(struct iommu_domain *domain,




Re: [PATCH v2 10/16] arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support

2018-01-30 Thread Robin Murphy

On 29/01/18 17:45, Marc Zyngier wrote:

A new feature of SMCCC 1.1 is that it offers firmware-based CPU
workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides
BP hardening for CVE-2017-5715.

If the host has some mitigation for this issue, report that
we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the
host workaround on every guest exit.

Signed-off-by: Marc Zyngier 
---
  include/linux/arm-smccc.h |  5 +
  virt/kvm/arm/psci.c   | 17 +++--
  2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index dc68aa5a7261..e1ef944ef1da 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -73,6 +73,11 @@
   ARM_SMCCC_SMC_32,\
   0, 1)
  
+#define ARM_SMCCC_ARCH_WORKAROUND_1	\

+   ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+  ARM_SMCCC_SMC_32,\
+  0, 0x8000)
+
  #ifndef __ASSEMBLY__
  
  #include 

diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index a021b62ed762..5677d16abc71 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -407,14 +407,27 @@ static int kvm_psci_call(struct kvm_vcpu *vcpu)
  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
  {
u32 func_id = smccc_get_function(vcpu);
-   u32 val;
+   u32 val, feature;
  
  	switch (func_id) {

case ARM_SMCCC_VERSION_FUNC_ID:
val = ARM_SMCCC_VERSION_1_1;
break;
case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
-   val = -1;   /* Nothing supported yet */


Conceptually, might it still make sense to initialise val to 
NOT_SUPPORTED here, then overwrite it if and when a feature actually is 
present? It would in this case save a few lines as well, but I know 
multiple assignment can be one of those religious issues, so I'm not too 
fussed either way.


Robin.


+   feature = smccc_get_arg1(vcpu);
+   switch(feature) {
+#ifdef CONFIG_ARM64
+   case ARM_SMCCC_ARCH_WORKAROUND_1:
+   if (cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR))
+   val = 0;
+   else
+   val = -1;
+   break;
+#endif
+   default:
+   val = -1;
+   break;
+   }
break;
default:
return kvm_psci_call(vcpu);
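
The default-NOT_SUPPORTED pattern suggested above is easy to model standalone. This is an illustrative sketch, not the KVM code: `arch_features()` and the `have_bp_hardening` flag stand in for `kvm_hvc_call_handler()` and `cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR)`; the function ID is what the quoted `ARM_SMCCC_CALL_VAL` encoding should evaluate to.

```c
#include <assert.h>
#include <stdint.h>

#define SMCCC_RET_NOT_SUPPORTED		(-1)
#define ARM_SMCCC_ARCH_WORKAROUND_1	0x80008000U

/*
 * Initialise val to NOT_SUPPORTED once and only overwrite it when a
 * feature really is present, instead of assigning -1 on every
 * unsupported path - the default: case disappears entirely.
 */
static int32_t arch_features(uint32_t feature, int have_bp_hardening)
{
	int32_t val = SMCCC_RET_NOT_SUPPORTED;

	switch (feature) {
	case ARM_SMCCC_ARCH_WORKAROUND_1:
		if (have_bp_hardening)
			val = 0;
		break;
	}
	return val;
}
```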



Re: [PATCH v2 13/16] firmware/psci: Expose SMCCC version through psci_ops

2018-01-30 Thread Robin Murphy

On 29/01/18 17:45, Marc Zyngier wrote:

Since PSCI 1.0 allows the SMCCC version to be (indirectly) probed,
let's do that at boot time, and expose the version of the calling
convention as part of the psci_ops structure.

Signed-off-by: Marc Zyngier 
---
  drivers/firmware/psci.c | 21 +
  include/linux/psci.h|  6 ++
  2 files changed, 27 insertions(+)

diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
index e9493da2b111..dd035aaa1c33 100644
--- a/drivers/firmware/psci.c
+++ b/drivers/firmware/psci.c
@@ -511,6 +511,26 @@ static void __init psci_init_migrate(void)
pr_info("Trusted OS resident on physical CPU 0x%lx\n", cpuid);
  }
  
+static void __init psci_init_smccc(u32 ver)

+{
+   int feature = PSCI_RET_NOT_SUPPORTED;
+
+   if (PSCI_VERSION_MAJOR(ver) >= 1)
+   feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID);
+
+   if (feature == PSCI_RET_NOT_SUPPORTED) {
+   psci_ops.variant = SMCCC_VARIANT_1_0;


Presumably at some point in the future we might want to update PSCI 
itself to use 'fast' calls if available, at which point this 
initialisation is a bit backwards given that you've already made a call 
using *some* convention. Even I think relying on the enum value working 
out OK is a bit subtle, so perhaps it's best to default-initialise 
psci_ops.variant to 1.0 (either statically, or dynamically before the 
first call), then only update it to 1.1 if and when we discover that.


Robin.


+   } else {
+   ver = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0);
+   if (ver != ARM_SMCCC_VERSION_1_1)
+   psci_ops.variant = SMCCC_VARIANT_1_0;
+   else
+   psci_ops.variant = SMCCC_VARIANT_1_1;
+   }
+
+   pr_info("SMC Calling Convention v1.%d\n", psci_ops.variant);
+}
+
  static void __init psci_0_2_set_functions(void)
  {
pr_info("Using standard PSCI v0.2 function IDs\n");
@@ -557,6 +577,7 @@ static int __init psci_probe(void)
psci_0_2_set_functions();
  
  	psci_init_migrate();

+   psci_init_smccc(ver);
  
  	if (PSCI_VERSION_MAJOR(ver) >= 1) {

psci_init_cpu_suspend();
diff --git a/include/linux/psci.h b/include/linux/psci.h
index f2679e5faa4f..83fd16a37be3 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -31,6 +31,11 @@ enum psci_conduit {
PSCI_CONDUIT_HVC,
  };
  
+enum smccc_variant {

+   SMCCC_VARIANT_1_0,
+   SMCCC_VARIANT_1_1,
+};
+
  struct psci_operations {
u32 (*get_version)(void);
int (*cpu_suspend)(u32 state, unsigned long entry_point);
@@ -41,6 +46,7 @@ struct psci_operations {
unsigned long lowest_affinity_level);
int (*migrate_info_type)(void);
enum psci_conduit conduit;
+   enum smccc_variant variant;
  };
  
  extern struct psci_operations psci_ops;
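
The default-initialisation alternative suggested above can be sketched outside the kernel. Hypothetical names throughout: `smccc_variant` models the `psci_ops.variant` field, and `probed_v1_1` stands in for the whole PSCI_FEATURES + SMCCC_VERSION probing dance.

```c
#include <assert.h>

enum smccc_variant {
	SMCCC_VARIANT_1_0,
	SMCCC_VARIANT_1_1,
};

/*
 * Start from the conservative 1.0 default so the variant is valid
 * even before (or during) the probing call, and only upgrade once
 * v1.1 has been positively detected.
 */
static enum smccc_variant smccc_variant = SMCCC_VARIANT_1_0;

static void init_smccc(int probed_v1_1)
{
	if (probed_v1_1)
		smccc_variant = SMCCC_VARIANT_1_1;
	/* otherwise keep the 1.0 default untouched */
}
```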




Re: [PATCH v2 04/16] arm/arm64: KVM: Add PSCI_VERSION helper

2018-01-30 Thread Robin Murphy

On 29/01/18 17:45, Marc Zyngier wrote:

As we're about to trigger a PSCI version explosion, it doesn't
hurt to introduce a PSCI_VERSION helper that is going to be
used everywhere.

Signed-off-by: Marc Zyngier 
---
  include/kvm/arm_psci.h | 5 +++--
  virt/kvm/arm/psci.c| 2 +-
  2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 2042bb909474..3a408c846c09 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -18,8 +18,9 @@
  #ifndef __KVM_ARM_PSCI_H__
  #define __KVM_ARM_PSCI_H__
  
-#define KVM_ARM_PSCI_0_1	1

-#define KVM_ARM_PSCI_0_2   2
+#define PSCI_VERSION(x,y)  ((((x) & 0x7fff) << 16) | ((y) & 0xffff))


I see virt/kvm/arm/psci.c already pulls in uapi/linux/psci.h, so maybe 
this guy could go in there alongside the other PSCI_VERSION_* gubbins?


Robin.


+#define KVM_ARM_PSCI_0_1   PSCI_VERSION(0, 1)
+#define KVM_ARM_PSCI_0_2   PSCI_VERSION(0, 2)
  
  int kvm_psci_version(struct kvm_vcpu *vcpu);

  int kvm_psci_call(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index b322e46fd142..c00bb324e14e 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -222,7 +222,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
 * Bits[31:16] = Major Version = 0
 * Bits[15:0] = Minor Version = 2
 */
-   val = 2;
+   val = KVM_ARM_PSCI_0_2;
break;
case PSCI_0_2_FN_CPU_SUSPEND:
case PSCI_0_2_FN64_CPU_SUSPEND:
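
For what it's worth, the version-packing scheme is easy to sanity-check in isolation. This is a standalone reconstruction of the macro from the patch (major version in bits [31:16] with the top bit zero, minor in bits [15:0]), not the uapi header itself:

```c
#include <assert.h>

/* Major in bits [31:16] (top bit reserved as zero), minor in [15:0]. */
#define PSCI_VERSION(x, y) \
	((((x) & 0x7fff) << 16) | ((y) & 0xffff))

#define KVM_ARM_PSCI_0_1	PSCI_VERSION(0, 1)
#define KVM_ARM_PSCI_0_2	PSCI_VERSION(0, 2)
```

Note that `KVM_ARM_PSCI_0_2` still evaluates to 2, which is why replacing the bare `val = 2;` in the quoted hunk is behaviour-preserving.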



Re: [PATCH v6 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops

2018-01-31 Thread Robin Murphy

On 19/01/18 11:43, Vivek Gautam wrote:

From: Sricharan R 

The smmu needs to be functional only when the respective
masters using it are active. The device_link feature
helps to track such functional dependencies, so that the
iommu gets powered when the master device enables itself
using pm_runtime. So by adapting the smmu driver for
runtime pm, the above-said dependency can be addressed.

This patch adds the pm runtime/sleep callbacks to the
driver and also the functions to parse the smmu clocks
from DT and enable them in resume/suspend.

Signed-off-by: Sricharan R 
Signed-off-by: Archit Taneja 
[vivek: Clock rework to request bulk of clocks]
Signed-off-by: Vivek Gautam 
---
  drivers/iommu/arm-smmu.c | 55 ++--
  1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 78d4c6b8f1ba..21acffe91a1c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -48,6 +48,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  
@@ -205,6 +206,9 @@ struct arm_smmu_device {

u32 num_global_irqs;
u32 num_context_irqs;
unsigned int*irqs;
+   struct clk_bulk_data*clocks;
+   int num_clks;
+   const char * const  *clk_names;


This seems unnecessary, as we use it a grand total of once, during 
initialisation when we have the source data directly to hand. Just pass 
data->clks into arm_smmu_init_clks() as an additional argument.


Otherwise, I think this looks reasonable; it's about as unobtrusive as 
it's going to get.


Robin.


u32 cavium_id_base; /* Specific to Cavium */
  
@@ -1685,6 +1689,25 @@ static int arm_smmu_id_size_to_bits(int size)

}
  }
  
+static int arm_smmu_init_clocks(struct arm_smmu_device *smmu)

+{
+   int i;
+   int num = smmu->num_clks;
+
+   if (num < 1)
+   return 0;
+
+   smmu->clocks = devm_kcalloc(smmu->dev, num,
+   sizeof(*smmu->clocks), GFP_KERNEL);
+   if (!smmu->clocks)
+   return -ENOMEM;
+
+   for (i = 0; i < num; i++)
+   smmu->clocks[i].id = smmu->clk_names[i];
+
+   return devm_clk_bulk_get(smmu->dev, num, smmu->clocks);
+}
+
  static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
  {
unsigned long size;
@@ -1897,10 +1920,12 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
  struct arm_smmu_match_data {
enum arm_smmu_arch_version version;
enum arm_smmu_implementation model;
+   const char * const *clks;
+   int num_clks;
  };
  
  #define ARM_SMMU_MATCH_DATA(name, ver, imp)	\

-static struct arm_smmu_match_data name = { .version = ver, .model = imp }
+static const struct arm_smmu_match_data name = { .version = ver, .model = imp }
  
  ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);

  ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
@@ -2001,6 +2026,8 @@ static int arm_smmu_device_dt_probe(struct 
platform_device *pdev,
data = of_device_get_match_data(dev);
smmu->version = data->version;
smmu->model = data->model;
+   smmu->clk_names = data->clks;
+   smmu->num_clks = data->num_clks;
  
  	parse_driver_options(smmu);
  
@@ -2099,6 +2126,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)

smmu->irqs[i] = irq;
}
  
+	err = arm_smmu_init_clocks(smmu);

+   if (err)
+   return err;
+
err = arm_smmu_device_cfg_probe(smmu);
if (err)
return err;
@@ -2197,7 +2228,27 @@ static int __maybe_unused arm_smmu_pm_resume(struct 
device *dev)
return 0;
  }
  
-static SIMPLE_DEV_PM_OPS(arm_smmu_pm_ops, NULL, arm_smmu_pm_resume);

+static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
+{
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+   return clk_bulk_prepare_enable(smmu->num_clks, smmu->clocks);
+}
+
+static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
+{
+   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+   clk_bulk_disable_unprepare(smmu->num_clks, smmu->clocks);
+
+   return 0;
+}
+
+static const struct dev_pm_ops arm_smmu_pm_ops = {
+   SET_SYSTEM_SLEEP_PM_OPS(NULL, arm_smmu_pm_resume)
+   SET_RUNTIME_PM_OPS(arm_smmu_runtime_suspend,
+  arm_smmu_runtime_resume, NULL)
+};
  
  static struct platform_driver arm_smmu_driver = {

.driver = {



Re: [PATCH v6 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-01-31 Thread Robin Murphy

On 19/01/18 11:43, Vivek Gautam wrote:

From: Sricharan R 

The smmu device probe/remove and add/remove master device callbacks
get called when the smmu is not linked to its master, that is, without
the context of the master device. So the runtime PM APIs are called
separately in those places.

Signed-off-by: Sricharan R 
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam 
---
  drivers/iommu/arm-smmu.c | 45 +
  1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 21acffe91a1c..95478bfb182c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -914,11 +914,15 @@ static void arm_smmu_destroy_domain_context(struct 
iommu_domain *domain)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-   int irq;
+   int ret, irq;
  
  	if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)

return;
  
+	ret = pm_runtime_get_sync(smmu->dev);

+   if (ret)
+   return;
+
/*
 * Disable the context bank and free the page tables before freeing
 * it.
@@ -933,6 +937,8 @@ static void arm_smmu_destroy_domain_context(struct 
iommu_domain *domain)
  
  	free_io_pgtable_ops(smmu_domain->pgtbl_ops);

__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+
+   pm_runtime_put_sync(smmu->dev);
  }
  
  static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)

@@ -1408,12 +1414,20 @@ static int arm_smmu_add_device(struct device *dev)
while (i--)
cfg->smendx[i] = INVALID_SMENDX;
  
-	ret = arm_smmu_master_alloc_smes(dev);

+   ret = pm_runtime_get_sync(smmu->dev);
if (ret)
goto out_cfg_free;
  
+	ret = arm_smmu_master_alloc_smes(dev);

+   if (ret) {
+   pm_runtime_put_sync(smmu->dev);
+   goto out_cfg_free;


Please keep to the existing pattern and put this on the cleanup path 
with a new label, rather than inline.



+   }
+
iommu_device_link(&smmu->iommu, dev);
  
+	pm_runtime_put_sync(smmu->dev);

+
return 0;
  
  out_cfg_free:

@@ -1428,7 +1442,7 @@ static void arm_smmu_remove_device(struct device *dev)
struct iommu_fwspec *fwspec = dev->iommu_fwspec;
struct arm_smmu_master_cfg *cfg;
struct arm_smmu_device *smmu;
-
+   int ret;
  
  	if (!fwspec || fwspec->ops != &arm_smmu_ops)

return;
@@ -1436,8 +1450,21 @@ static void arm_smmu_remove_device(struct device *dev)
cfg  = fwspec->iommu_priv;
smmu = cfg->smmu;
  
+	/*

+* The device link between the master device and
+* smmu is already purged at this point.
+* So enable the power to smmu explicitly.
+*/


I don't understand this comment, especially since we don't even 
introduce device links until the following patch... :/



+
+   ret = pm_runtime_get_sync(smmu->dev);
+   if (ret)
+   return;
+
iommu_device_unlink(&smmu->iommu, dev);
arm_smmu_master_free_smes(fwspec);
+
+   pm_runtime_put_sync(smmu->dev);
+
iommu_group_remove_device(dev);
kfree(fwspec->iommu_priv);
iommu_fwspec_free(dev);
@@ -2130,6 +2157,14 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
if (err)
return err;
  
+	platform_set_drvdata(pdev, smmu);

+
+   pm_runtime_enable(dev);
+
+   err = pm_runtime_get_sync(dev);
+   if (err)
+   return err;
+
err = arm_smmu_device_cfg_probe(smmu);
if (err)
return err;
@@ -2171,9 +2206,9 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
return err;
}
  
-	platform_set_drvdata(pdev, smmu);

arm_smmu_device_reset(smmu);
arm_smmu_test_smr_masks(smmu);
+   pm_runtime_put_sync(dev);
  
  	/*

 * For ACPI and generic DT bindings, an SMMU will be probed before
@@ -2212,6 +2247,8 @@ static int arm_smmu_device_remove(struct platform_device 
*pdev)
  
  	/* Turn the thing off */

writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
+   pm_runtime_force_suspend(smmu->dev);


Why do we need this? I guess it might be a Qualcomm-ism as I don't see 
anyone else calling it from .remove other than a couple of other qcom_* 
drivers. Given that we only get here during system shutdown (or the root 
user intentionally pissing about with driver unbinding), it doesn't seem 
like a point where power saving really matters all that much.


I'd also naively expect that anything this device was the last consumer 
of would get turned off by core code anyway once it's removed, but 
maybe things aren't that slick; I dunno :/


Robin.


+
return 0;
  }
  



Re: [PATCH v6 4/6] iommu/arm-smmu: Add the device_link between masters and smmu

2018-01-31 Thread Robin Murphy

On 19/01/18 11:43, Vivek Gautam wrote:

From: Sricharan R 

Finally add the device link between the master device and
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from add_device callback which gets
called once when the master is added to the smmu.


Don't we need to balance this with a device_link_del() in .remove_device 
(like exynos-iommu does)?


Robin.


Signed-off-by: Sricharan R 
---
  drivers/iommu/arm-smmu.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 95478bfb182c..33bbcfedb896 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1367,6 +1367,7 @@ static int arm_smmu_add_device(struct device *dev)
struct arm_smmu_device *smmu;
struct arm_smmu_master_cfg *cfg;
struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct device_link *link;
int i, ret;
  
  	if (using_legacy_binding) {

@@ -1428,6 +1429,16 @@ static int arm_smmu_add_device(struct device *dev)
  
  	pm_runtime_put_sync(smmu->dev);
  
+	/*

+* Establish the link between smmu and master, so that the
+* smmu gets runtime enabled/disabled as per the master's
+* needs.
+*/
+   link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
+   if (!link)
+   dev_warn(smmu->dev, "Unable to create device link between %s and 
%s\n",
+dev_name(smmu->dev), dev_name(dev));
+
return 0;
  
  out_cfg_free:




Re: [PATCH 1/2] iommu: Fix iommu_unmap and iommu_unmap_fast return type

2018-01-31 Thread Robin Murphy

Hi Suravee,

On 31/01/18 01:48, Suravee Suthikulpanit wrote:

Currently, iommu_unmap and iommu_unmap_fast return unmapped
pages with size_t.  However, the actual value returned could
be error codes (< 0), which can be misinterpreted as large
number of unmapped pages. Therefore, change the return type to ssize_t.

Cc: Joerg Roedel 
Cc: Alex Williamson 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd_iommu.c   |  6 +++---
  drivers/iommu/intel-iommu.c |  4 ++--


Er, there are a few more drivers than that implementing iommu_ops ;)

It seems like it might be more sensible to fix the single instance of a 
driver returning -EINVAL (which appears to be a "should never happen if 
used correctly" kinda thing anyway) and leave the API-internal callback 
prototype as-is. I do agree the inconsistency of iommu_unmap() itself 
wants sorting, though (particularly the !IOMMU_API stubs which are wrong 
either way).


Robin.


  drivers/iommu/iommu.c   | 16 
  include/linux/iommu.h   | 20 ++--
  4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 7d5eb00..3609f51 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3030,11 +3030,11 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
return ret;
  }
  
-static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,

-  size_t page_size)
+static ssize_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
+  size_t page_size)
  {
struct protection_domain *domain = to_pdomain(dom);
-   size_t unmap_size;
+   ssize_t unmap_size;
  
  	if (domain->mode == PAGE_MODE_NONE)

return -EINVAL;
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4a2de34..15ba866 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5068,8 +5068,8 @@ static int intel_iommu_map(struct iommu_domain *domain,
return ret;
  }
  
-static size_t intel_iommu_unmap(struct iommu_domain *domain,

-   unsigned long iova, size_t size)
+static ssize_t intel_iommu_unmap(struct iommu_domain *domain,
+unsigned long iova, size_t size)
  {
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct page *freelist = NULL;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0b..8f7da8a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1557,12 +1557,12 @@ int iommu_map(struct iommu_domain *domain, unsigned 
long iova,
  }
  EXPORT_SYMBOL_GPL(iommu_map);
  
-static size_t __iommu_unmap(struct iommu_domain *domain,

-   unsigned long iova, size_t size,
-   bool sync)
+static ssize_t __iommu_unmap(struct iommu_domain *domain,
+unsigned long iova, size_t size,
+bool sync)
  {
const struct iommu_ops *ops = domain->ops;
-   size_t unmapped_page, unmapped = 0;
+   ssize_t unmapped_page, unmapped = 0;
unsigned long orig_iova = iova;
unsigned int min_pagesz;
  
@@ -1617,15 +1617,15 @@ static size_t __iommu_unmap(struct iommu_domain *domain,

return unmapped;
  }
  
-size_t iommu_unmap(struct iommu_domain *domain,

-  unsigned long iova, size_t size)
+ssize_t iommu_unmap(struct iommu_domain *domain,
+   unsigned long iova, size_t size)
  {
return __iommu_unmap(domain, iova, size, true);
  }
  EXPORT_SYMBOL_GPL(iommu_unmap);
  
-size_t iommu_unmap_fast(struct iommu_domain *domain,

-   unsigned long iova, size_t size)
+ssize_t iommu_unmap_fast(struct iommu_domain *domain,
+unsigned long iova, size_t size)
  {
return __iommu_unmap(domain, iova, size, false);
  }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 41b8c57..78df048 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -199,8 +199,8 @@ struct iommu_ops {
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
int (*map)(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot);
-   size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
-size_t size);
+   ssize_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
+size_t size);
size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
 struct scatterlist *sg, unsigned int nents, int prot);
void (*flush_iotlb_all)(struct iommu_domain *domain);
@@ -299,10 +299,10 @@ extern void iommu_detach_device(struct iommu_domain 
*domain,
  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
  extern int iommu_ma

Re: [PATCH 1/2] iommu: Fix iommu_unmap and iommu_unmap_fast return type

2018-02-01 Thread Robin Murphy

On 01/02/18 05:03, Suravee Suthikulpanit wrote:

Hi Robin,

On 2/1/18 1:02 AM, Robin Murphy wrote:

Hi Suravee,

On 31/01/18 01:48, Suravee Suthikulpanit wrote:

Currently, iommu_unmap and iommu_unmap_fast return unmapped
pages with size_t.  However, the actual value returned could
be error codes (< 0), which can be misinterpreted as large
number of unmapped pages. Therefore, change the return type to ssize_t.

Cc: Joerg Roedel 
Cc: Alex Williamson 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd_iommu.c   |  6 +++---
  drivers/iommu/intel-iommu.c |  4 ++--


Er, there are a few more drivers than that implementing iommu_ops ;)


Ahh right.


It seems like it might be more sensible to fix the single instance of 
a driver returning -EINVAL (which appears to be a "should never happen 
if used correctly" kinda thing anyway) and leave the API-internal 
callback prototype as-is. I do agree the inconsistency of 
iommu_unmap() itself wants sorting, though (particularly the 
!IOMMU_API stubs which are wrong either way).


Robin.


Makes sense. I'll leave the API alone and change the code to not 
return an error then.


Actually, on a second look I think that check in amd_iommu is 99% 
redundant anyway, since PAGE_MODE_NONE is only normally set for 
IOMMU_DOMAIN_IDENTITY domains, thus iommu_unmap() would have bailed out 
from the __IOMMU_DOMAIN_PAGING check before ops->unmap could be called. 
AFAICS the only way to hit it at all is if a caller somehow managed to 
get hold of the dev_state->domain set up in amd_iommu_init_device(), 
then tried to unmap something from that, which seems such a very wrong 
thing to do that it should probably just crash and burn with extreme 
prejudice anyway.
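For reference, the alternative Robin suggests (fixing the one driver rather than the API) might look something like the sketch below against amd_iommu_unmap() — the quoted context lines are approximate, and whether returning 0 is the right semantic for an unmappable domain is exactly the judgement call being discussed:

```diff
@@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
 	struct protection_domain *domain = to_pdomain(dom);
 	size_t unmap_size;
 
 	if (domain->mode == PAGE_MODE_NONE)
-		return -EINVAL;
+		return 0;	/* nothing is mapped, so nothing to unmap */
```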


Cheers,
Robin.


Re: [PATCH v3 13/18] firmware/psci: Expose PSCI conduit

2018-02-01 Thread Robin Murphy

On 01/02/18 11:46, Marc Zyngier wrote:

In order to call into the firmware to apply workarounds, it is
useful to find out whether we're using HVC or SMC. Let's expose
this through the psci_ops.


Reviewed-by: Robin Murphy 


Acked-by: Lorenzo Pieralisi 
Signed-off-by: Marc Zyngier 
---
  drivers/firmware/psci.c | 28 +++-
  include/linux/psci.h|  7 +++
  2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
index 8b25d31e8401..e9493da2b111 100644
--- a/drivers/firmware/psci.c
+++ b/drivers/firmware/psci.c
@@ -59,7 +59,9 @@ bool psci_tos_resident_on(int cpu)
return cpu == resident_cpu;
  }
  
-struct psci_operations psci_ops;

+struct psci_operations psci_ops = {
+   .conduit = PSCI_CONDUIT_NONE,
+};
  
  typedef unsigned long (psci_fn)(unsigned long, unsigned long,

unsigned long, unsigned long);
@@ -210,6 +212,22 @@ static unsigned long psci_migrate_info_up_cpu(void)
  0, 0, 0);
  }
  
+static void set_conduit(enum psci_conduit conduit)

+{
+   switch (conduit) {
+   case PSCI_CONDUIT_HVC:
+   invoke_psci_fn = __invoke_psci_fn_hvc;
+   break;
+   case PSCI_CONDUIT_SMC:
+   invoke_psci_fn = __invoke_psci_fn_smc;
+   break;
+   default:
+   WARN(1, "Unexpected PSCI conduit %d\n", conduit);
+   }
+
+   psci_ops.conduit = conduit;
+}
+
  static int get_set_conduit_method(struct device_node *np)
  {
const char *method;
@@ -222,9 +240,9 @@ static int get_set_conduit_method(struct device_node *np)
}
  
  	if (!strcmp("hvc", method)) {

-   invoke_psci_fn = __invoke_psci_fn_hvc;
+   set_conduit(PSCI_CONDUIT_HVC);
} else if (!strcmp("smc", method)) {
-   invoke_psci_fn = __invoke_psci_fn_smc;
+   set_conduit(PSCI_CONDUIT_SMC);
} else {
pr_warn("invalid \"method\" property: %s\n", method);
return -EINVAL;
@@ -654,9 +672,9 @@ int __init psci_acpi_init(void)
pr_info("probing for conduit method from ACPI.\n");
  
  	if (acpi_psci_use_hvc())

-   invoke_psci_fn = __invoke_psci_fn_hvc;
+   set_conduit(PSCI_CONDUIT_HVC);
else
-   invoke_psci_fn = __invoke_psci_fn_smc;
+   set_conduit(PSCI_CONDUIT_SMC);
  
  	return psci_probe();

  }
diff --git a/include/linux/psci.h b/include/linux/psci.h
index f724fd8c78e8..f2679e5faa4f 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -25,6 +25,12 @@ bool psci_tos_resident_on(int cpu);
  int psci_cpu_init_idle(unsigned int cpu);
  int psci_cpu_suspend_enter(unsigned long index);
  
+enum psci_conduit {

+   PSCI_CONDUIT_NONE,
+   PSCI_CONDUIT_SMC,
+   PSCI_CONDUIT_HVC,
+};
+
  struct psci_operations {
u32 (*get_version)(void);
int (*cpu_suspend)(u32 state, unsigned long entry_point);
@@ -34,6 +40,7 @@ struct psci_operations {
int (*affinity_info)(unsigned long target_affinity,
unsigned long lowest_affinity_level);
int (*migrate_info_type)(void);
+   enum psci_conduit conduit;
  };
  
  extern struct psci_operations psci_ops;




Re: [PATCH v3 14/18] firmware/psci: Expose SMCCC version through psci_ops

2018-02-01 Thread Robin Murphy

On 01/02/18 11:46, Marc Zyngier wrote:

Since PSCI 1.0 allows the SMCCC version to be (indirectly) probed,
let's do that at boot time, and expose the version of the calling
convention as part of the psci_ops structure.

Acked-by: Lorenzo Pieralisi 
Signed-off-by: Marc Zyngier 
---
  drivers/firmware/psci.c | 19 +++
  include/linux/psci.h|  6 ++
  2 files changed, 25 insertions(+)

diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
index e9493da2b111..8631906c414c 100644
--- a/drivers/firmware/psci.c
+++ b/drivers/firmware/psci.c
@@ -61,6 +61,7 @@ bool psci_tos_resident_on(int cpu)
  
  struct psci_operations psci_ops = {

.conduit = PSCI_CONDUIT_NONE,
+   .smccc_version = SMCCC_VERSION_1_0,
  };
  
  typedef unsigned long (psci_fn)(unsigned long, unsigned long,

@@ -511,6 +512,23 @@ static void __init psci_init_migrate(void)
pr_info("Trusted OS resident on physical CPU 0x%lx\n", cpuid);
  }
  
+static void __init psci_init_smccc(u32 ver)

+{
+   int feature;
+
+   feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID);
+
+   if (feature != PSCI_RET_NOT_SUPPORTED) {
+   ver = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0);
+   if (ver != ARM_SMCCC_VERSION_1_1)
+   psci_ops.smccc_version = SMCCC_VERSION_1_0;


AFAICS, unless you somehow run psci_probe() twice *and* have 
schizophrenic firmware, this assignment now does precisely nothing.


With the condition flipped and the redundant else case removed (or an 
explanation of why I'm wrong...)


Reviewed-by: Robin Murphy 


+   else
+   psci_ops.smccc_version = SMCCC_VERSION_1_1;
+   }
+
+   pr_info("SMC Calling Convention v1.%d\n", psci_ops.smccc_version);
+}
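With Robin's condition flipped and the redundant 1.0 assignment dropped (SMCCC_VERSION_1_0 is already set in the psci_ops initialiser earlier in the patch), the hunk would presumably become something like:

```diff
 	feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID);
 
 	if (feature != PSCI_RET_NOT_SUPPORTED) {
 		ver = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0);
-		if (ver != ARM_SMCCC_VERSION_1_1)
-			psci_ops.smccc_version = SMCCC_VERSION_1_0;
-		else
+		if (ver == ARM_SMCCC_VERSION_1_1)
 			psci_ops.smccc_version = SMCCC_VERSION_1_1;
 	}
```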
+
  static void __init psci_0_2_set_functions(void)
  {
pr_info("Using standard PSCI v0.2 function IDs\n");
@@ -559,6 +577,7 @@ static int __init psci_probe(void)
psci_init_migrate();
  
  	if (PSCI_VERSION_MAJOR(ver) >= 1) {

+   psci_init_smccc(ver);
psci_init_cpu_suspend();
psci_init_system_suspend();
}
diff --git a/include/linux/psci.h b/include/linux/psci.h
index f2679e5faa4f..8b1b3b5935ab 100644
--- a/include/linux/psci.h
+++ b/include/linux/psci.h
@@ -31,6 +31,11 @@ enum psci_conduit {
PSCI_CONDUIT_HVC,
  };
  
+enum smccc_version {

+   SMCCC_VERSION_1_0,
+   SMCCC_VERSION_1_1,
+};
+
  struct psci_operations {
u32 (*get_version)(void);
int (*cpu_suspend)(u32 state, unsigned long entry_point);
@@ -41,6 +46,7 @@ struct psci_operations {
unsigned long lowest_affinity_level);
int (*migrate_info_type)(void);
enum psci_conduit conduit;
+   enum smccc_version smccc_version;
  };
  
  extern struct psci_operations psci_ops;




Re: [PATCH v3 15/18] arm/arm64: smccc: Make function identifiers an unsigned quantity

2018-02-01 Thread Robin Murphy

On 01/02/18 11:46, Marc Zyngier wrote:

Function identifiers are a 32-bit, unsigned quantity. But we never
tell the compiler so, resulting in the following:

  4ac:   b26187e0        mov     x0, #0xffffffff80000001

We thus rely on the firmware narrowing it for us, which is not
always a reasonable expectation.


I think technically it might be OK, since SMCCC states "A Function 
Identifier is passed in register W0.", which implies that a conforming 
implementation should also read w0, not x0, but it's certainly far 
easier to be completely right than to justify being possibly wrong.


Reviewed-by: Robin Murphy 


Cc: sta...@vger.kernel.org
Reported-by: Ard Biesheuvel 
Acked-by: Ard Biesheuvel 
Tested-by: Ard Biesheuvel 
Signed-off-by: Marc Zyngier 
---
  include/linux/arm-smccc.h | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index e1ef944ef1da..dd44d8458c04 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -14,14 +14,16 @@
  #ifndef __LINUX_ARM_SMCCC_H
  #define __LINUX_ARM_SMCCC_H
  
+#include 

+
  /*
   * This file provides common defines for ARM SMC Calling Convention as
   * specified in
   * http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html
   */
  
-#define ARM_SMCCC_STD_CALL		0
-#define ARM_SMCCC_FAST_CALL		1
+#define ARM_SMCCC_STD_CALL	_AC(0,U)
+#define ARM_SMCCC_FAST_CALL	_AC(1,U)
  #define ARM_SMCCC_TYPE_SHIFT  31
  
  #define ARM_SMCCC_SMC_32		0




Re: [PATCH v3 16/18] arm/arm64: smccc: Implement SMCCC v1.1 inline primitive

2018-02-01 Thread Robin Murphy
)r1 asm("r1") = a1; \
+   register typeof(a2)r2 asm("r2") = a2; \
+   register typeof(a3)r3 asm("r3") = a3
+
+#define __declare_arg_4(a0, a1, a2, a3, a4, res)   \
+   __declare_arg_3(a0, a1, a2, a3, res);   \
+   register typeof(a4) r4 asm("r4") = a4
+
+#define __declare_arg_5(a0, a1, a2, a3, a4, a5, res)   \
+   __declare_arg_4(a0, a1, a2, a3, a4, res);   \
+   register typeof(a5) r5 asm("r5") = a5
+
+#define __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res)   \
+   __declare_arg_5(a0, a1, a2, a3, a4, a5, res);   \
+   register typeof(a6) r6 asm("r6") = a6
+
+#define __declare_arg_7(a0, a1, a2, a3, a4, a5, a6, a7, res)   \
+   __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res);   \
+   register typeof(a7) r7 asm("r7") = a7
+
+#define ___declare_args(count, ...) __declare_arg_ ## count(__VA_ARGS__)
+#define __declare_args(count, ...)  ___declare_args(count, __VA_ARGS__)
+
+#define ___constraints(count)  \
+   : __constraint_write_ ## count  \
+   : __constraint_read_ ## count   \
+   : "memory"
+#define __constraints(count)   ___constraints(count)
+
+/*
+ * We have an output list that is not necessarily used, and GCC feels
+ * entitled to optimise the whole sequence away. "volatile" is what
+ * makes it stick.
+ */
+#define __arm_smccc_1_1(inst, ...) \
+   do {\
+   __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \
+       asm volatile(inst "\n"\
+__constraints(__count_args(__VA_ARGS__))); \
+   if (___res) \
+   *___res = (typeof(*___res)){r0, r1, r2, r3};\


...especially since there's no obvious indication of where it comes from 
when you're looking here.


Otherwise, though, this has already turned out pretty sleek;

Reviewed-by: Robin Murphy 


+   } while (0)
+
+/*
+ * arm_smccc_1_1_smc() - make an SMCCC v1.1 compliant SMC call
+ *
+ * This is a variadic macro taking one to eight source arguments, and
+ * an optional return structure.
+ *
+ * @a0-a7: arguments passed in registers 0 to 7
+ * @res: result values from registers 0 to 3
+ *
+ * This macro is used to make SMC calls following SMC Calling Convention v1.1.
+ * The contents of the supplied params are copied to registers 0 to 7 prior
+ * to the SMC instruction. The return values are updated with the contents
+ * of registers 0 to 3 on return from the SMC instruction if res is not NULL.
+ */
+#define arm_smccc_1_1_smc(...) __arm_smccc_1_1(SMCCC_SMC_INST, __VA_ARGS__)
+
+/*
+ * arm_smccc_1_1_hvc() - make an SMCCC v1.1 compliant HVC call
+ *
+ * This is a variadic macro taking one to eight source arguments, and
+ * an optional return structure.
+ *
+ * @a0-a7: arguments passed in registers 0 to 7
+ * @res: result values from registers 0 to 3
+ *
+ * This macro is used to make HVC calls following SMC Calling Convention v1.1.
+ * The contents of the supplied params are copied to registers 0 to 7 prior
+ * to the HVC instruction. The return values are updated with the contents
+ * of registers 0 to 3 on return from the HVC instruction if res is not NULL.
+ */
+#define arm_smccc_1_1_hvc(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__)
+
  #endif /*__ASSEMBLY__*/
  #endif /*__LINUX_ARM_SMCCC_H*/



Re: [PATCH v3 16/18] arm/arm64: smccc: Implement SMCCC v1.1 inline primitive

2018-02-01 Thread Robin Murphy

On 01/02/18 13:54, Marc Zyngier wrote:

On 01/02/18 13:34, Robin Murphy wrote:

On 01/02/18 11:46, Marc Zyngier wrote:

One of the major improvements of SMCCC v1.1 is that it only clobbers
the first 4 registers, both on 32 and 64bit. This means that it
becomes very easy to provide an inline version of the SMC call
primitive, and avoid performing a function call to stash the
registers that would otherwise be clobbered by SMCCC v1.0.

Signed-off-by: Marc Zyngier 
---
   include/linux/arm-smccc.h | 143 
++
   1 file changed, 143 insertions(+)

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index dd44d8458c04..575aabe85905 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -150,5 +150,148 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, 
unsigned long a1,
   
   #define arm_smccc_hvc_quirk(...) __arm_smccc_hvc(__VA_ARGS__)
   
+/* SMCCC v1.1 implementation madness follows */

+#ifdef CONFIG_ARM64
+
+#define SMCCC_SMC_INST "smc   #0"
+#define SMCCC_HVC_INST "hvc   #0"


Nit: Maybe the argument can go in the template and we just define the
instruction mnemonics here?


+
+#endif
+
+#ifdef CONFIG_ARM


#elif ?


Sure, why not.




+#include 
+#include 
+
+#define SMCCC_SMC_INST __SMC(0)
+#define SMCCC_HVC_INST __HVC(0)


Oh, I see, it was to line up with this :(

I do wonder if we could just embed an asm(".arch armv7-a+virt\n") (if
even necessary) for ARM, then take advantage of the common mnemonics for
all 3 instruction sets instead of needing manual encoding tricks? I
don't think we should ever be pulling this file in for non-v7 builds.

I suppose that strictly that appears to need binutils 2.21 rather than
the official supported minimum of 2.20, but are people going to be
throwing SMCCC configs at antique toolchains in practice?


It has been an issue in the past, back when we merged KVM. We settled on
a hybrid solution where code outside of KVM would not rely on a newer
toolchain, hence the macros that Dave introduced. Maybe we've moved on
and we can take that bold step?


Either way I think we can happily throw that on the "future cleanup" 
pile right now as it's not directly relevant to the purpose of the 
patch; I'm sure we don't want to make potential backporting even more 
difficult.





+
+#endif
+
+#define ___count_args(_0, _1, _2, _3, _4, _5, _6, _7, _8, x, ...) x
+
+#define __count_args(...)  \
+   ___count_args(__VA_ARGS__, 7, 6, 5, 4, 3, 2, 1, 0)
+
+#define __constraint_write_0   \
+   "+r" (r0), "=&r" (r1), "=&r" (r2), "=&r" (r3)
+#define __constraint_write_1   \
+   "+r" (r0), "+r" (r1), "=&r" (r2), "=&r" (r3)
+#define __constraint_write_2   \
+   "+r" (r0), "+r" (r1), "+r" (r2), "=&r" (r3)
+#define __constraint_write_3   \
+   "+r" (r0), "+r" (r1), "+r" (r2), "+r" (r3)
+#define __constraint_write_4   __constraint_write_3
+#define __constraint_write_5   __constraint_write_4
+#define __constraint_write_6   __constraint_write_5
+#define __constraint_write_7   __constraint_write_6
+
+#define __constraint_read_0
+#define __constraint_read_1
+#define __constraint_read_2
+#define __constraint_read_3
+#define __constraint_read_4"r" (r4)
+#define __constraint_read_5__constraint_read_4, "r" (r5)
+#define __constraint_read_6__constraint_read_5, "r" (r6)
+#define __constraint_read_7__constraint_read_6, "r" (r7)
+
+#define __declare_arg_0(a0, res)   \
+   struct arm_smccc_res   *___res = res;   \


Looks like the declaration of ___res could simply be factored out to the
template...


Tried that. But...




+   register u32   r0 asm("r0") = a0; \
+   register unsigned long r1 asm("r1");  \
+   register unsigned long r2 asm("r2");  \
+   register unsigned long r3 asm("r3")
+
+#define __declare_arg_1(a0, a1, res)   \
+   struct arm_smccc_res   *___res = res;   \
+   register u32   r0 asm("r0") = a0; \
+   register typeof(a1)r1 asm("r1") = a1; \
+   register unsigned long r2 asm("r2");  \
+   register unsigned long r3 asm("r3")
+
+#define __declare_arg_2(a0, a1, a2, res)   \
+   str

Re: [PATCH] soc: mediatek: Handle return of of_match_device function

2018-02-01 Thread Robin Murphy

On 01/02/18 11:00, Himanshu Jha wrote:

In the scpsys_probe function, the return value of of_match_device, which
may be NULL, is dereferenced without checking. Therefore, add a check to
avoid a potential NULL dereference.

Detected by CoverityScan, CID#1424087 "Dereference null return value"

Fixes: commit 53fddb1a66dd ("soc: mediatek: reduce code duplication of scpsys_probe 
across all SoCs")
Signed-off-by: Himanshu Jha 
---
  drivers/soc/mediatek/mtk-scpsys.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
b/drivers/soc/mediatek/mtk-scpsys.c
index 435ce5e..6e7f196 100644
--- a/drivers/soc/mediatek/mtk-scpsys.c
+++ b/drivers/soc/mediatek/mtk-scpsys.c
@@ -981,6 +981,9 @@ static int scpsys_probe(struct platform_device *pdev)
int i, ret;
  
  	match = of_match_device(of_scpsys_match_tbl, &pdev->dev);

+   if (!match)
+   return -EINVAL;
+
soc = (const struct scp_soc_data *)match->data;


You could of course replace the whole sequence with an 
of_device_get_match_data() call, which happens to be inherently safe 
against the no-match case even when that *is* impossible by design.


Robin.

  
  	scp = init_scp(pdev, soc->domains, soc->num_domains, &soc->regs,
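Concretely, the of_device_get_match_data() variant Robin suggests might look something like the following sketch against the quoted code (not a tested patch):

```diff
@@ static int scpsys_probe(struct platform_device *pdev)
-	match = of_match_device(of_scpsys_match_tbl, &pdev->dev);
-	if (!match)
-		return -EINVAL;
-
-	soc = (const struct scp_soc_data *)match->data;
+	soc = of_device_get_match_data(&pdev->dev);
+	if (!soc)
+		return -EINVAL;
```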




Re: [PATCH] soc: mediatek: Handle return of of_match_device function

2018-02-01 Thread Robin Murphy

On 01/02/18 15:09, Geert Uytterhoeven wrote:

On Thu, Feb 1, 2018 at 4:02 PM, Robin Murphy  wrote:

On 01/02/18 11:00, Himanshu Jha wrote:

In the scpsys_probe function, the return value of of_match_device, which
may be NULL, is dereferenced without checking. Therefore, add a check to
avoid a potential NULL dereference.

Detected by CoverityScan, CID#1424087 "Dereference null return value"

Fixes: commit 53fddb1a66dd ("soc: mediatek: reduce code duplication of
scpsys_probe across all SoCs")
Signed-off-by: Himanshu Jha 
---
   drivers/soc/mediatek/mtk-scpsys.c | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/drivers/soc/mediatek/mtk-scpsys.c
b/drivers/soc/mediatek/mtk-scpsys.c
index 435ce5e..6e7f196 100644
--- a/drivers/soc/mediatek/mtk-scpsys.c
+++ b/drivers/soc/mediatek/mtk-scpsys.c
@@ -981,6 +981,9 @@ static int scpsys_probe(struct platform_device *pdev)
 int i, ret;
 match = of_match_device(of_scpsys_match_tbl, &pdev->dev);
+   if (!match)
+   return -EINVAL;
+
 soc = (const struct scp_soc_data *)match->data;


You could of course replace the whole sequence with an
of_device_get_match_data() call, which happens to be inherently safe against
the no-match case even when that *is* impossible by design.


+1


 scp = init_scp(pdev, soc->domains, soc->num_domains, &soc->regs,


... followed by the static analyser gang complaining we may dereference
NULL pointer soc...


Well, if the static analysers can't track the provenance of 
dev->driver->of_match_table, let's keep ignoring them until they get 
cleverer :P


Robin.



Gr{oetje,eeting}s,

 Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 -- Linus Torvalds



Re: [PATCH v6 2/2] media: V3s: Add support for Allwinner CSI.

2018-02-01 Thread Robin Murphy

On 31/01/18 09:37, Arnd Bergmann wrote:

On Wed, Jan 31, 2018 at 8:29 AM, Maxime Ripard
 wrote:

Hi Thierry,

On Tue, Jan 30, 2018 at 11:01:50AM +0100, Thierry Reding wrote:

On Tue, Jan 30, 2018 at 10:59:16AM +0100, Thierry Reding wrote:

On Tue, Jan 30, 2018 at 10:24:48AM +0100, Arnd Bergmann wrote:

On Tue, Jan 30, 2018 at 8:54 AM, Maxime Ripard
 wrote:

On Mon, Jan 29, 2018 at 03:34:02PM +0100, Arnd Bergmann wrote:

On Mon, Jan 29, 2018 at 10:25 AM, Linus Walleij
 wrote:

On Mon, Jan 29, 2018 at 9:25 AM, Maxime Ripard
 wrote:

On Sat, Jan 27, 2018 at 05:14:26PM +0100, Linus Walleij wrote:




At one point we had discussed adding a 'dma-masters' property that
lists all the buses on which a device can be a dma master, and
the respective properties of those masters (iommu, coherency,
offset, ...).

IIRC at the time we decided that we could live without that complexity,
but perhaps we cannot.


Are you talking about this ?
https://elixir.free-electrons.com/linux/latest/source/Documentation/devicetree/bindings/dma/dma.txt#L41

It doesn't seem to be related to that issue to me. And in our
particular cases, all the devices are DMA masters, the RAM is just
mapped to another address.


No, that's not the one I was thinking of. The idea at the time was much
more generic, and not limited to dma engines. I don't recall the details,
but I think that Thierry was either involved or made the proposal at the
time.


Yeah, I vaguely remember discussing something like this before. A quick
search through my inbox yielded these two threads, mostly related to
IOMMU but I think there were some mentions about dma-ranges and so on as
well. I'll have to dig deeper into those threads to refresh my memories,
but I won't get around to it until later today.

If someone wants to read up on this in the meantime, here are the links:

 https://lkml.org/lkml/2014/4/27/346
 http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/257200.html

 From a quick glance the issue of dma-ranges was something that we hand-
waved at the time.


Also found this, which seems to be relevant as well:

   
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/252715.html

Adding Dave.


Thanks for the pointers, I started to read through it.

I guess we have to come up with two solutions here: a short term one
to address the users we already have in the kernel properly, and a
long term one where we could use that discussion as a starting point.

For the short term one, could we just set the device dma_pfn_offset to
PHYS_OFFSET at probe time, and use our dma_addr_t directly later on,
or would this also cause some issues?


That would certainly be an improvement over the current version,
it keeps the hack more localized. That's fine with me. Note that both
PHYS_OFFSET and dma_pfn_offset have architecture specific
meanings and they could in theory change, so ideally we'd do that
fixup somewhere in arch/arm/mach-sunxi/ at boot time before the
driver gets probed, but this wouldn't work on arm64 if we need it
there too.


dma_pfn_offset certainly *shouldn't* be architecture-specific; it's just 
historically not been used consistently everywhere. That should now be 
addressed by Christoph's big dma_direct_ops cleanup (now merged) which 
fills in the places it was being missed in generic code. From quickly 
skimming this thread, it sounds like that is really should be sufficient 
for this particular hardware - if all its DMA goes through an 
interconnect which makes RAM appear at 0, then you give it "dma-ranges = 
<0 PHYS_OFFSET size>;" and things should 'just work' provided the DMA 
API is otherwise being used correctly.


If different devices have differing DMA master address maps (such that 
you can't just place a single dma-ranges on your "soc" node) then the 
trick is to wrap them in intermediate "simple-bus" nodes with a 
straight-through "ranges;" and the appropriate "dma-ranges = ...". Yes, 
it's a bit of a bodge, but then pretending AXI/APB interconnects are 
buses in the sense that DT assumes is already a bit of a bodge.
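The simple-bus wrapping Robin describes might be sketched in DT roughly as follows. Everything here is illustrative: the node names, single address/size cells, and a PHYS_OFFSET of 0x40000000 with 1 GiB of RAM are all made-up assumptions:

```dts
soc {
	compatible = "simple-bus";
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	dma-bus {
		compatible = "simple-bus";
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;		/* CPU view of the devices is unchanged */
		/* Masters behind this node see RAM starting at 0 */
		dma-ranges = <0x0 0x40000000 0x40000000>;

		csi: camera@1cb0000 {
			/* device node exactly as before */
		};
	};
};
```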



For the long term plan, from what I read from the discussion, it's
mostly centered around IOMMU indeed, and we don't have that. What we
would actually need is to break the assumption that the DMA "parent"
bus is the DT node's parent bus.

And I guess this could be done quite easily by adding a dma-parent
property with a phandle to the bus controller, that would have a
dma-ranges property of its own with the proper mapping described
there. It should be simple enough to support, but is there anything
that could prevent something like that to work properly (such as
preventing further IOMMU-related developments that were described in
those mail threads).


I've thought about it a little bit now. A dma-parent property would nicely
solve two problems:

- a device on a memory mapped control bus that is a bus master on
   a different bus. This is the case we are talking about here AFAICT

- a device that is on a 

Re: [PATCH] ARM-SMMU: Delete error messages for a failed memory allocation in three functions

2018-01-22 Thread Robin Murphy

On 20/01/18 14:36, SF Markus Elfring wrote:

From: Markus Elfring 
Date: Sat, 20 Jan 2018 15:30:17 +0100

Omit extra messages for a memory allocation failure in these functions.


Why?

It's your job as patch author to convince reviewers and maintainers why 
your change is a good thing and they should spend their time looking at 
it, much less consider merging it. This may as well be "delete some 
stuff because I feel like it".


Do bear in mind the nature of these drivers; Arm SMMUs are not something 
you find in microcontrollers. On systems using these drivers, it will 
make no difference whatsoever to anyone if the many-megabyte kernel 
image is 47 bytes (or whatever) smaller.



This issue was detected by using the Coccinelle software.


I think I'm going to have to start treating mention of Coccinelle as a 
potential disclaimer saying "I haven't attempted to understand the code 
I'm changing" :(



Signed-off-by: Markus Elfring 
---
  drivers/iommu/arm-smmu-v3.c | 9 +++--
  drivers/iommu/arm-smmu.c| 9 +++--
  2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f122071688fd..5c2a7103d494 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2134,10 +2134,8 @@ static int arm_smmu_init_l1_strtab(struct 
arm_smmu_device *smmu)
void *strtab = smmu->strtab_cfg.strtab;
  
  	cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);

-   if (!cfg->l1_desc) {
-   dev_err(smmu->dev, "failed to allocate l1 stream table desc\n");


OK, I'll stop playing *completely* dumb; I do know you would get a splat 
if kmalloc() ever did fail. But what you're removing isn't 
printk("failed to allocate memory\n"), it's a message which says exactly 
what allocation failed *for which device*. Can you clarify how I'm going 
to diagnose this particular problem from the generic splat when all I 
have is an email from a customer with a dmesg dump?



+   if (!cfg->l1_desc)
return -ENOMEM;
-   }
  
  	for (i = 0; i < cfg->num_l1_ents; ++i) {

arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
@@ -2828,10 +2826,9 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
bool bypass;
  
  	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);

-   if (!smmu) {
-   dev_err(dev, "failed to allocate arm_smmu_device\n");
+   if (!smmu)
return -ENOMEM;
-   }
+
smmu->dev = dev;
  
  	if (dev->of_node) {

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 78d4c6b8f1ba..a4da4a870a2e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2048,10 +2048,9 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
int num_irqs, i, err;
  
  	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);

-   if (!smmu) {
-   dev_err(dev, "failed to allocate arm_smmu_device\n");
+   if (!smmu)
return -ENOMEM;
-   }
+
smmu->dev = dev;
  
  	if (dev->of_node)

@@ -2084,10 +2083,8 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
  
  	smmu->irqs = devm_kzalloc(dev, sizeof(*smmu->irqs) * num_irqs,

  GFP_KERNEL);
-   if (!smmu->irqs) {
-   dev_err(dev, "failed to allocate %d irqs\n", num_irqs);


This more than any other is removing potentially useful information: 
"failed to allocate 37890756 irqs", for instance, would indicate a bug 
which is very much *not* an out-of-memory condition.


Robin.


+   if (!smmu->irqs)
return -ENOMEM;
-   }
  
  	for (i = 0; i < num_irqs; ++i) {

int irq = platform_get_irq(pdev, i);



Re: [PATCH] iommu/of: Only do IOMMU lookup for available ones

2018-01-08 Thread Robin Murphy

Hi Jeffy,

On 03/01/18 06:09, Jeffy Chen wrote:

The for_each_matching_node_and_match() helper returns every matching
node, including unavailable ones.

It's pointless to init unavailable IOMMUs, so add a sanity check to
avoid that.


Even better would be to clean up the last remaining init_fn user and get 
rid of the whole business. With the probe-deferral mechanism, early 
initialisation hooks are no longer needed, and the IOMMU_OF_DECLARE 
section really only remains as a way of detecting builtin drivers long 
before their registration initcalls run.


Robin.


Signed-off-by: Jeffy Chen 
---

  drivers/iommu/of_iommu.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 50947ebb6d17..6f7456caa30d 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -240,6 +240,9 @@ static int __init of_iommu_init(void)
for_each_matching_node_and_match(np, matches, &match) {
const of_iommu_init_fn init_fn = match->data;
  
+		if (!of_device_is_available(np))

+   continue;
+
if (init_fn && init_fn(np))
pr_err("Failed to initialise IOMMU %pOF\n", np);
}



Re: [PATCH] ARM: realview: remove eb-mp clcd IRQ

2018-01-08 Thread Robin Murphy

Hi Linus,

On 21/12/17 22:08, Linus Walleij wrote:

On Thu, Dec 21, 2017 at 10:31 PM, Arnd Bergmann  wrote:


We get a dtc warning about the CLCD interrupt being invalid:

arch/arm/boot/dts/arm-realview-eb-11mp-ctrevb.dtb: Warning 
(interrupts_property): interrupts size is (8), expected multiple of 12 in 
/fpga/charlcd@10008000

According to the datasheet I found and the old board file, this
line is not connected, so I'm removing the respective properties here.

Link: 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0411d/index.html
Signed-off-by: Arnd Bergmann 


There is some confusion here. There is CLCD "Color LCD"
which is just a code name for PrimeCell PL111 and there is the actual
character LCD, which is a hardware thing to talk to a character LCD with
some characters on.


diff --git a/arch/arm/boot/dts/arm-realview-eb-mp.dtsi 
b/arch/arm/boot/dts/arm-realview-eb-mp.dtsi


So this DTS is for the ARM 11 MP core tile which is described in
DUI0318F. It doesn't even list an IRQ for the character LCD.


  &charlcd {
-   interrupt-parent = <&intc>;
-   interrupts = <0  IRQ_TYPE_LEVEL_HIGH>;


This was probably me thinking to go back and fill in the right
IRQ and forgetting to actually do it. Sorry :(


+   /* CLCD is not connected here */


Call it character LCD instead to avoid confusion please.


+   /delete-property/interrupt-parent;
+   /delete-property/interrupts;


I don't understand this delete-property business (first time I see it),
but the top level DTSI (arm-realview-eb.dtsi) does not define any
interrupt, so can't you just delete this whole &charlcd?

I do think the reference design has a character LCD, and I
do think it has an interrupt, it's just undocumented so
someone with this board would have to test it manually
to figure out which line it is. Whoever uses this design
will get to it if ever.


FWIW the EB baseboard is *physically* the same regardless of the CPU, 
it's just flashed with a Core-Tile-specific FPGA bitstream. I've just 
tried firing up an 11MPCore one, and indeed the character LCD does light 
up with the kernel version. I can't convince the recalcitrant beast to 
actually get to userspace, though, so I can't confirm what the 
interrupt's deal is.


The baseboard manual (DUI0303E) says it's interrupt 22 on the 
board-level secondary GICs, and since neither the CT11MP nor its 
corresponding FPGA (AN152) mention any alternate routing direct to the 
Core Tile GIC, I'd guess it probably still is. On the other hand, 
though, it also says this:


  "... However this interrupt signal is reserved for future use and you
   must use a polling routine instead of an interrupt service routine."

So maybe it's appropriate to just remove the interrupt everywhere :/

Robin.

