Re: [RFC] arm64: Ensure proper addressing for ldnp/stnp
On 19/09/16 18:36, Brent DeGraaf wrote:
> According to section 6.3.8 of the ARM Programmer's Guide, non-temporal
> loads and stores do not verify that address dependency is met between a
> load of an address to a register and a subsequent non-temporal load or
> store using that address on the executing PE. Therefore, context switch
> code and subroutine calls that use non-temporally accessed addresses as
> parameters that might depend on a load of an address into an argument
> register must ensure that ordering requirements are met by introducing
> a barrier prior to the successive non-temporal access. Add appropriate
> barriers whereever this specific situation comes into play.
>
> Signed-off-by: Brent DeGraaf
> ---
>  arch/arm64/kernel/entry.S  | 1 +
>  arch/arm64/lib/copy_page.S | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 441420c..982c4d3 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -679,6 +679,7 @@ ENTRY(cpu_switch_to)
>  	ldp	x27, x28, [x8], #16
>  	ldp	x29, x9, [x8], #16
>  	ldr	lr, [x8]
> +	dmb	nshld	// Existence of instructions with loose load-use dependencies (e.g. ldnp/stnp) make this barrier necessary
>  	mov	sp, x9
>  	and	x9, x9, #~(THREAD_SIZE - 1)
>  	msr	sp_el0, x9
> diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
> index 4c1e700..21c6892 100644
> --- a/arch/arm64/lib/copy_page.S
> +++ b/arch/arm64/lib/copy_page.S
> @@ -47,6 +47,8 @@ alternative_endif
>  	ldp	x14, x15, [x1, #96]
>  	ldp	x16, x17, [x1, #112]
>
> +	dmb	nshld	// In case x0 (for stnp) is dependent on a load

The ARMv8 ARM (B2.7.2 in issue j) says that when an address dependency
exists between a load and a subsequent LDNP, *other* observers may
observe those accesses in any order. How's that related to an STNP on
the same CPU?

Robin.

> +
>  	mov	x18, #(PAGE_SIZE - 128)
>  	add	x1, x1, #128
> 1:
Re: [RFC] arm64: Ensure proper addressing for ldnp/stnp
On 19/09/16 19:25, bdegr...@codeaurora.org wrote: > On 2016-09-19 14:01, Robin Murphy wrote: >> On 19/09/16 18:36, Brent DeGraaf wrote: >>> According to section 6.3.8 of the ARM Programmer's Guide, non-temporal >>> loads and stores do not verify that address dependency is met between a >>> load of an address to a register and a subsequent non-temporal load or >>> store using that address on the executing PE. Therefore, context switch >>> code and subroutine calls that use non-temporally accessed addresses as >>> parameters that might depend on a load of an address into an argument >>> register must ensure that ordering requirements are met by introducing >>> a barrier prior to the successive non-temporal access. Add appropriate >>> barriers whereever this specific situation comes into play. >>> >>> Signed-off-by: Brent DeGraaf >>> --- >>> arch/arm64/kernel/entry.S | 1 + >>> arch/arm64/lib/copy_page.S | 2 ++ >>> 2 files changed, 3 insertions(+) >>> >>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S >>> index 441420c..982c4d3 100644 >>> --- a/arch/arm64/kernel/entry.S >>> +++ b/arch/arm64/kernel/entry.S >>> @@ -679,6 +679,7 @@ ENTRY(cpu_switch_to) >>> ldpx27, x28, [x8], #16 >>> ldpx29, x9, [x8], #16 >>> ldrlr, [x8] >>> +dmbnshld// Existence of instructions with loose load-use >>> dependencies (e.g. ldnp/stnp) make this barrier necessary >>> movsp, x9 >>> andx9, x9, #~(THREAD_SIZE - 1) >>> msrsp_el0, x9 >>> diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S >>> index 4c1e700..21c6892 100644 >>> --- a/arch/arm64/lib/copy_page.S >>> +++ b/arch/arm64/lib/copy_page.S >>> @@ -47,6 +47,8 @@ alternative_endif >>> ldpx14, x15, [x1, #96] >>> ldpx16, x17, [x1, #112] >>> >>> +dmbnshld // In case x0 (for stnp) is dependent on a load >> >> The ARMv8 ARM (B2.7.2 in issue j) says that when an address dependency >> exists between a load and a subsequent LDNP, *other* observers may >> observe those accesses in any order. 
>> How's that related to an STNP on the same CPU?
>>
>> Robin.
>>
>>> +
>>> 	mov	x18, #(PAGE_SIZE - 128)
>>> 	add	x1, x1, #128
>>> 1:
>>>
>
> Yes, I have seen the section in the ARM ARM about this. But the
> Programmer's Guide goes further, even providing a concrete example:
>
> "Non-temporal loads and stores relax the memory ordering
> requirements...the LDNP instruction might be observed before the
> preceding LDR instruction, which can result in reading from an
> unpredictable address in X0.
>
> For example:
>   LDR X0, [X3]
>   LDNP X2, X1, [X0]
>
> To correct the above, you need an explicit load barrier:
>   LDR X0, [X3]
>   DMB NSHLD
>   LDNP X2, X1, [X0]"
>
> Did the ARM ARM leave this out? Or is the Programmer's Guide section
> incorrect?

If the ARM ARM and the Programmer's Guide don't agree, then the
Programmer's Guide is wrong (I'll raise a bug against it). All the ARM
ARM says is that in this situation:

    P1                      P2
    STP  x0, x1, [x2]       1: LDR  x0,
    DMB  ISH                   CBZ  x0, 1b
    STR  x2,                   LDNP x1, x2, [x0]

P2's address dependency still very much exists from the point of view
of P2's execution, it just may not guarantee order against the DMB on
P1, so P2's LDNP isn't guaranteed to see the data from P1's STP (as
opposed to how a regular LDP *is*), and may still load older stale
data instead.

Robin.

> Thanks for your comments,
> Brent
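Robin's litmus test is essentially the classic message-passing pattern. The sketch below is an illustrative C11 analogue (an assumption for clarity, not the kernel's code): the producer publishes data and then a pointer with release semantics, and the consumer's dependent load through that pointer observes the data — which is exactly the guarantee a normal LDR/LDP provides and LDNP may not, against another observer's barrier.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;
static _Atomic(int *) mailbox;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                /* like P1's STP          */
    atomic_store_explicit(&mailbox, &payload,
                          memory_order_release); /* like P1's DMB ISH; STR */
    return NULL;
}

static void *consumer(void *arg)
{
    int *p;
    do {                                         /* like P2's 1: LDR; CBZ 1b */
        p = atomic_load_explicit(&mailbox, memory_order_acquire);
    } while (p == NULL);
    *(int *)arg = *p;  /* dependent load: a plain LDR/LDP honors this;
                          LDNP is the documented exception */
    return NULL;
}

/* Run the two observers once and return what the consumer read. */
static int run_message_passing(void)
{
    pthread_t t1, t2;
    int got = 0;

    payload = 0;
    atomic_store(&mailbox, NULL);
    pthread_create(&t1, NULL, producer, NULL);
    pthread_create(&t2, NULL, consumer, &got);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return got;
}
```

With ordinary loads the consumer always reads 42; replace the dependent load with a non-temporal one on arm64 and that guarantee is what the thread above says you lose.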
Re: [PATCH] dma-mapping.h: Preserve unmap info for CONFIG_DMA_API_DEBUG
On 20/09/16 16:58, Andrey Smirnov wrote:
> When CONFIG_DMA_API_DEBUG is enabled we need to preserve the unmapping
> address even if "unmap" is a no-op for our architecture, because we
> need debug_dma_unmap_page() to correctly clean up all of the debug
> bookkeeping. Failing to do so results in false-positive warnings about
> previously mapped areas never being unmapped.

Makes sense, although I guess it might be even clearer to simply have
DMA_API_DEBUG select NEED_DMA_MAP_STATE. Either way, though,

Reviewed-by: Robin Murphy

> Signed-off-by: Andrey Smirnov
> ---
>  include/linux/dma-mapping.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 71c1b21..41c9875 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -678,7 +678,7 @@ static inline int dma_mmap_wc(struct device *dev,
>  #define dma_mmap_writecombine dma_mmap_wc
>  #endif
>
> -#ifdef CONFIG_NEED_DMA_MAP_STATE
> +#if defined(CONFIG_NEED_DMA_MAP_STATE) || defined(CONFIG_DMA_API_DEBUG)
>  #define DEFINE_DMA_UNMAP_ADDR(ADDR_NAME)	dma_addr_t ADDR_NAME
>  #define DEFINE_DMA_UNMAP_LEN(LEN_NAME)	__u32 LEN_NAME
>  #define dma_unmap_addr(PTR, ADDR_NAME)	((PTR)->ADDR_NAME)
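For readers unfamiliar with these macros, the sketch below shows the pattern in self-contained userspace C. `STATE_NEEDED` stands in for the `CONFIG_NEED_DMA_MAP_STATE || CONFIG_DMA_API_DEBUG` condition, and `struct ring_entry` is a made-up driver structure; when the condition is false the macros compile away and drivers pay no per-entry storage cost.

```c
#include <assert.h>

typedef unsigned long long dma_addr_t;

/* Stand-in for CONFIG_NEED_DMA_MAP_STATE || CONFIG_DMA_API_DEBUG */
#define STATE_NEEDED 1

#if STATE_NEEDED
#define DEFINE_DMA_UNMAP_ADDR(ADDR_NAME)        dma_addr_t ADDR_NAME
#define dma_unmap_addr(PTR, ADDR_NAME)          ((PTR)->ADDR_NAME)
#define dma_unmap_addr_set(PTR, ADDR_NAME, VAL) (((PTR)->ADDR_NAME) = (VAL))
#else
/* With state preservation off, the field and accessors vanish entirely */
#define DEFINE_DMA_UNMAP_ADDR(ADDR_NAME)
#define dma_unmap_addr(PTR, ADDR_NAME)          (0)
#define dma_unmap_addr_set(PTR, ADDR_NAME, VAL) do { } while (0)
#endif

/* Hypothetical driver-side structure, e.g. a TX ring entry */
struct ring_entry {
    int len;
    DEFINE_DMA_UNMAP_ADDR(mapping);
};

/* Record a mapping at map time, read it back at unmap time */
static dma_addr_t record_and_read_back(dma_addr_t addr)
{
    struct ring_entry e = { .len = 0 };

    dma_unmap_addr_set(&e, mapping, addr);
    return dma_unmap_addr(&e, mapping);
}
```

The bug being fixed is precisely that with `STATE_NEEDED` effectively 0 (no-op unmap) but DMA debugging on, `debug_dma_unmap_page()` would be handed a zero address instead of the one recorded at map time.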
Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape platforms
Hi Alison, On 23/09/16 03:19, Alison Wang wrote: > The ARMv8 architecture supports: > 1. 64-bit execution state, AArch64. > 2. 32-bit execution state, AArch32, that is compatible with previous > versions of the ARM architecture. > > LayerScape platforms are compliant with ARMv8 architecture. This patch > is to support running 32-bit Linux kernel for LayerScape platforms. > > Verified on LayerScape LS1043ARDB, LS1012ARDB, LS1046ARDB boards. > > Signed-off-by: Ebony Zhu > Signed-off-by: Alison Wang > --- > arch/arm/Kconfig| 9 + > arch/arm/mach-imx/Kconfig | 14 ++ > arch/arm/mach-imx/Makefile | 4 +++- > arch/arm/mach-imx/mach-layerscape.c | 23 +++ > 4 files changed, 49 insertions(+), 1 deletion(-) > create mode 100644 arch/arm/mach-imx/mach-layerscape.c > > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig > index f0c8068..e8d470e 100644 > --- a/arch/arm/Kconfig > +++ b/arch/arm/Kconfig > @@ -294,6 +294,15 @@ config PGTABLE_LEVELS > default 3 if ARM_LPAE > default 2 > > +config ARCH_AARCH32_ES_SUPPORT > + def_bool n > + help > + The ARMv8 architecture supports 64-bit execution state, AArch64 > + and 32-bit execution state, AArch32, that is compatible with > + previous versions of the ARM architecture. > + > + Enable AArch32 execution state support for ARMv8 architecture. What's this supposed to do, exactly? I've been running 32-bit kernels on my Juno with very little issue (beyond a couple of DT tweaks, and some firmware hacks with a corresponding bit of A64 assembly tacked on the front of the zImage to switch into AArch32 state). Robin. > + > source "init/Kconfig" > > source "kernel/Kconfig.freezer" > diff --git a/arch/arm/mach-imx/Kconfig b/arch/arm/mach-imx/Kconfig > index 0ac05a0..fda4f5f 100644 > --- a/arch/arm/mach-imx/Kconfig > +++ b/arch/arm/mach-imx/Kconfig > @@ -549,6 +549,20 @@ config SOC_LS1021A > help > This enables support for Freescale LS1021A processor. 
> > +config ARCH_LAYERSCAPE_AARCH32 > + bool "Freescale Layerscape SoC AArch32 ES support" > + select ARCH_AARCH32_ES_SUPPORT > + select ARM_AMBA > + select ARM_GIC > + select ARM_ARCH_TIMER > + select ARCH_DMA_ADDR_T_64BIT if ARM_LPAE > + select PCI_LAYERSCAPE if PCI > + select PCI_DOMAINS if PCI > + > + help > + This enables support for Freescale Layerscape SoC family in > + in AArch32 execution state. > + > endif > > comment "Cortex-A/Cortex-M asymmetric multiprocessing platforms" > diff --git a/arch/arm/mach-imx/Makefile b/arch/arm/mach-imx/Makefile > index 737450f..7ded4fa 100644 > --- a/arch/arm/mach-imx/Makefile > +++ b/arch/arm/mach-imx/Makefile > @@ -69,7 +69,7 @@ obj-$(CONFIG_HAVE_IMX_ANATOP) += anatop.o > obj-$(CONFIG_HAVE_IMX_GPC) += gpc.o > obj-$(CONFIG_HAVE_IMX_MMDC) += mmdc.o > obj-$(CONFIG_HAVE_IMX_SRC) += src.o > -ifneq ($(CONFIG_SOC_IMX6)$(CONFIG_SOC_LS1021A),) > +ifneq > ($(CONFIG_SOC_IMX6)$(CONFIG_SOC_LS1021A)$(CONFIG_ARCH_LAYERSCAPE_AARCH32),) > AFLAGS_headsmp.o :=-Wa,-march=armv7-a > obj-$(CONFIG_SMP) += headsmp.o platsmp.o > obj-$(CONFIG_HOTPLUG_CPU) += hotplug.o > @@ -96,4 +96,6 @@ obj-$(CONFIG_SOC_VF610) += mach-vf610.o > > obj-$(CONFIG_SOC_LS1021A) += mach-ls1021a.o > > +obj-$(CONFIG_ARCH_LAYERSCAPE_AARCH32) += mach-layerscape.o > + > obj-y += devices/ > diff --git a/arch/arm/mach-imx/mach-layerscape.c > b/arch/arm/mach-imx/mach-layerscape.c > new file mode 100644 > index 000..acfb2a2 > --- /dev/null > +++ b/arch/arm/mach-imx/mach-layerscape.c > @@ -0,0 +1,23 @@ > +/* > + * Copyright 2015-2016 Freescale Semiconductor, Inc. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. 
> + */ > + > +#include > + > +#include "common.h" > + > +static const char * const layerscape_dt_compat[] __initconst = { > + "fsl,ls1012a", > + "fsl,ls1043a", > + "fsl,ls1046a", > + NULL, > +}; > + > +DT_MACHINE_START(LAYERSCAPE_AARCH32, "Freescale LAYERSCAPE") > + .dt_compat = layerscape_dt_compat, > +MACHINE_END >
Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape platforms
On 23/09/16 14:13, Stuart Yoder wrote: > > >> -Original Message----- >> From: Robin Murphy [mailto:robin.mur...@arm.com] >> Sent: Friday, September 23, 2016 7:17 AM >> To: Alison Wang ; shawn...@kernel.org; >> ker...@pengutronix.de; Fabio Estevam >> Estevam ; li...@armlinux.org.uk; >> linux-arm-ker...@lists.infradead.org; linux- >> ker...@vger.kernel.org; Scott Wood ; Stuart Yoder >> ; Leo Li >> >> Cc: Jason Jin >> Subject: Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape >> platforms >> >> Hi Alison, >> >> On 23/09/16 03:19, Alison Wang wrote: >>> The ARMv8 architecture supports: >>> 1. 64-bit execution state, AArch64. >>> 2. 32-bit execution state, AArch32, that is compatible with previous >>> versions of the ARM architecture. >>> >>> LayerScape platforms are compliant with ARMv8 architecture. This patch >>> is to support running 32-bit Linux kernel for LayerScape platforms. >>> >>> Verified on LayerScape LS1043ARDB, LS1012ARDB, LS1046ARDB boards. >>> >>> Signed-off-by: Ebony Zhu >>> Signed-off-by: Alison Wang >>> --- >>> arch/arm/Kconfig| 9 + >>> arch/arm/mach-imx/Kconfig | 14 ++ >>> arch/arm/mach-imx/Makefile | 4 +++- >>> arch/arm/mach-imx/mach-layerscape.c | 23 +++ >>> 4 files changed, 49 insertions(+), 1 deletion(-) >>> create mode 100644 arch/arm/mach-imx/mach-layerscape.c >>> >>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig >>> index f0c8068..e8d470e 100644 >>> --- a/arch/arm/Kconfig >>> +++ b/arch/arm/Kconfig >>> @@ -294,6 +294,15 @@ config PGTABLE_LEVELS >>> default 3 if ARM_LPAE >>> default 2 >>> >>> +config ARCH_AARCH32_ES_SUPPORT >>> + def_bool n >>> + help >>> +The ARMv8 architecture supports 64-bit execution state, AArch64 >>> +and 32-bit execution state, AArch32, that is compatible with >>> +previous versions of the ARM architecture. >>> + >>> +Enable AArch32 execution state support for ARMv8 architecture. >> >> What's this supposed to do, exactly? 
I've been running 32-bit kernels on >> my Juno with very little issue (beyond a couple of DT tweaks, and some >> firmware hacks with a corresponding bit of A64 assembly tacked on the >> front of the zImage to switch into AArch32 state). > > Which arch/arm/mach-* platform are you using for Juno? I don't even know! :) I just start with a multi_v7_defconfig plus a few extra bits (LPAE, KVM, sil24, sky2, etc.) and it works. I guess it's the combination of mach-vexpress and mach-virt. Robin. > > Thanks, > Stuart >
Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape platforms
On 23/09/16 15:01, Stuart Yoder wrote: > > >> -Original Message----- >> From: Robin Murphy [mailto:robin.mur...@arm.com] >> Sent: Friday, September 23, 2016 8:19 AM >> To: Stuart Yoder ; Alison Wang ; >> shawn...@kernel.org; >> ker...@pengutronix.de; Fabio Estevam Estevam ; >> li...@armlinux.org.uk; linux-arm- >> ker...@lists.infradead.org; linux-kernel@vger.kernel.org; Scott Wood >> ; Leo Li >> >> Cc: Jason Jin >> Subject: Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape >> platforms >> >> On 23/09/16 14:13, Stuart Yoder wrote: >>> >>> >>>> -Original Message- >>>> From: Robin Murphy [mailto:robin.mur...@arm.com] >>>> Sent: Friday, September 23, 2016 7:17 AM >>>> To: Alison Wang ; shawn...@kernel.org; >>>> ker...@pengutronix.de; Fabio Estevam >>>> Estevam ; li...@armlinux.org.uk; >>>> linux-arm-ker...@lists.infradead.org; linux- >>>> ker...@vger.kernel.org; Scott Wood ; Stuart Yoder >>>> ; Leo Li >>>> >>>> Cc: Jason Jin >>>> Subject: Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for >>>> LayerScape platforms >>>> >>>> Hi Alison, >>>> >>>> On 23/09/16 03:19, Alison Wang wrote: >>>>> The ARMv8 architecture supports: >>>>> 1. 64-bit execution state, AArch64. >>>>> 2. 32-bit execution state, AArch32, that is compatible with previous >>>>> versions of the ARM architecture. >>>>> >>>>> LayerScape platforms are compliant with ARMv8 architecture. This patch >>>>> is to support running 32-bit Linux kernel for LayerScape platforms. >>>>> >>>>> Verified on LayerScape LS1043ARDB, LS1012ARDB, LS1046ARDB boards. 
>>>>> >>>>> Signed-off-by: Ebony Zhu >>>>> Signed-off-by: Alison Wang >>>>> --- >>>>> arch/arm/Kconfig| 9 + >>>>> arch/arm/mach-imx/Kconfig | 14 ++ >>>>> arch/arm/mach-imx/Makefile | 4 +++- >>>>> arch/arm/mach-imx/mach-layerscape.c | 23 +++ >>>>> 4 files changed, 49 insertions(+), 1 deletion(-) >>>>> create mode 100644 arch/arm/mach-imx/mach-layerscape.c >>>>> >>>>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig >>>>> index f0c8068..e8d470e 100644 >>>>> --- a/arch/arm/Kconfig >>>>> +++ b/arch/arm/Kconfig >>>>> @@ -294,6 +294,15 @@ config PGTABLE_LEVELS >>>>> default 3 if ARM_LPAE >>>>> default 2 >>>>> >>>>> +config ARCH_AARCH32_ES_SUPPORT >>>>> + def_bool n >>>>> + help >>>>> + The ARMv8 architecture supports 64-bit execution state, AArch64 >>>>> + and 32-bit execution state, AArch32, that is compatible with >>>>> + previous versions of the ARM architecture. >>>>> + >>>>> + Enable AArch32 execution state support for ARMv8 architecture. >>>> >>>> What's this supposed to do, exactly? I've been running 32-bit kernels on >>>> my Juno with very little issue (beyond a couple of DT tweaks, and some >>>> firmware hacks with a corresponding bit of A64 assembly tacked on the >>>> front of the zImage to switch into AArch32 state). >>> >>> Which arch/arm/mach-* platform are you using for Juno? >> >> I don't even know! :) I just start with a multi_v7_defconfig plus a few >> extra bits (LPAE, KVM, sil24, sky2, etc.) and it works. I guess it's the >> combination of mach-vexpress and mach-virt. > > Thanks. A question about the switch into aarch32 state... our assumption > was that the kernel starts at EL2. In this proof of concept we're doing the > switch to aarch32/EL2 in firmware. And what I'm being told is that the > firmware aarch64 EL2 code cannot switch to aarch32 EL2 without some > assistance from EL3 firmware. This is leading us to invent a new > SMC call to do this. > > Did you face this? 
Yes, the only way to enter in Hyp is to have the firmware twiddle SCR_EL3.RW (I simply stuck a disgusting hack directly in ATF's exception handler, which my dodgy 64-bit header then calls). Otherwise you can always simply run your own shim at EL2 to drive an AArch32 EL1 (it'll need to trap and translate subsequent SMC calls for e.g. PSCI). > If there is such a requirement, it's something begging for standardization. > Doesn't make sense for multiple divergent approaches for switching from > aarch64/EL2 to aarch32/EL2. Perhaps - I did briefly look into how hard it would be to write a proper SMC service handler to do this (since ATF does have a framework for such things), but concluded it would be more than 10 minutes' work and just cheated instead. It's certainly something which could be raised with the firmware folks. Robin. > > Thanks, > Stuart >
Re: [PATCH 1/2] armv8: aarch32: Execute 32-bit Linux for LayerScape platforms
On 23/09/16 15:44, Arnd Bergmann wrote: > On Friday, September 23, 2016 3:24:12 PM CEST Robin Murphy wrote: >> On 23/09/16 15:01, Stuart Yoder wrote: >> Otherwise you can >> always simply run your own shim at EL2 to drive an AArch32 EL1 (it'll >> need to trap and translate subsequent SMC calls for e.g. PSCI). >> >>> If there is such a requirement, it's something begging for standardization. >>> Doesn't make sense for multiple divergent approaches for switching from >>> aarch64/EL2 to aarch32/EL2. >> >> Perhaps - I did briefly look into how hard it would be to write a proper >> SMC service handler to do this (since ATF does have a framework for such >> things), but concluded it would be more than 10 minutes' work and just >> cheated instead. It's certainly something which could be raised with the >> firmware folks. > > If we end up allowing all arm64 platforms to be enabled in arch/arm, > we could perhaps create a generic implementation that does both of > those things, i.e. > > - Take the arm32 kernel Image or zImage file, wrap it inside of a binary > that implements the arm64 boot protocol. > - When that starts up, try to use the new PSCI call to jump into > the arm32 kernel > - If PSCI failed and we are running in EL2, implement the shim > and start the arm32 kernel in EL1 mode Really, though, the firmware call thing is an incredibly niche use-case. Beyond development, the only real benefit of starting an AArch32 kernel in Hyp is that you can run AArch32 KVM guests, which you can do equally well (if not better) under an AArch64 kernel. The standalone EL2 shim probably has legs, though, if the one currently found in arch/arm64 is too heavyweight ;) My first version was essentially that, albeit implemented by a load of .inst directives in head.S. Robin. > > Arnd >
Re: [PATCH] ARM: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
On 27/09/16 13:16, Srinivas Ramana wrote: > Hi Robin, Sorry! This one had slipped my mind already... > On 09/13/2016 08:22 PM, Srinivas Ramana wrote: >> On 09/12/2016 11:21 PM, Robin Murphy wrote: >>> On 12/09/16 07:57, Srinivas Ramana wrote: >>>> If the bootloader uses the long descriptor format and jumps to >>>> kernel decompressor code, TTBCR may not be in a right state. >>>> Before enabling the MMU, it is required to clear the TTBCR.PD0 >>>> field to use TTBR0 for translation table walks. >>>> >>>> The 'commit dbece45894d3a ("ARM: 7501/1: decompressor: >>>> reset ttbcr for VMSA ARMv7 cores")' does the reset of TTBCR.N, but >>>> doesn't consider all the bits for the size of TTBCR.N. >>>> >>>> Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to >>>> indicate the use of TTBR0 and the correct base address width. >>>> >>>> Signed-off-by: Srinivas Ramana >>>> --- >>>> arch/arm/boot/compressed/head.S | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/arch/arm/boot/compressed/head.S >>>> b/arch/arm/boot/compressed/head.S >>>> index af11c2f8f3b7..fc6d541549a2 100644 >>>> --- a/arch/arm/boot/compressed/head.S >>>> +++ b/arch/arm/boot/compressed/head.S >>>> @@ -779,7 +779,7 @@ __armv7_mmu_cache_on: >>>> orrner0, r0, #1@ MMU enabled >>>> movner1, #0xfffd@ domain 0 = client >>>> bic r6, r6, #1 << 31@ 32-bit translation system >>> >>> Hmm, if TTBCR.EAE _was_ actually set... >>> >>>> -bic r6, r6, #3 << 0 @ use only ttbr0 >>>> +bic r6, r6, #(7 << 0) | (1 << 4)@ use only ttbr0 >>>> mcrnep15, 0, r3, c2, c0, 0@ load page table pointer >>>> mcrnep15, 0, r1, c3, c0, 0@ load domain access >>>> control >>>> mcrne p15, 0, r6, c2, c0, 2 @ load ttb control >>> >>> ...then strictly the TLBIALL needs to happen after the ISB following >>> this update. Otherwise per B3.10.2 of DDI406C.c I think we might be into >>> unpredictable territory - i.e. 
if the TLB happens to treat long- and >>> short-descriptor entries differently then the TLBI beforehand (with EAE >>> set) may be at liberty to only discard long-descriptor entries and leave >>> bogus short-descriptor entries sitting around. >> Yes, it seems this has to be taken care of, along with resetting >> TTBCR.PD0 and TTBCR.N. Do you say that this needs to be done in the same >> patch or a different one? >>> >>> In other words, something like (completely untested): >>> >>> ---8<--- >>> diff --git a/arch/arm/boot/compressed/head.S >>> b/arch/arm/boot/compressed/head.S >>> index af11c2f8f3b7..536b7781024a 100644 >>> --- a/arch/arm/boot/compressed/head.S >>> +++ b/arch/arm/boot/compressed/head.S >>> @@ -764,7 +764,6 @@ __armv7_mmu_cache_on: >>> mov r0, #0 >>> mcr p15, 0, r0, c7, c10, 4 @ drain write buffer >>> tst r11, #0xf @ VMSA >>> - mcrne p15, 0, r0, c8, c7, 0 @ flush I,D TLBs >> >> Shouldn't this be still there for the same reason you explained above? I >> mean to discard the long descriptor entries when EAE was 1 (before we >> reset it). >>> #endif >>> mrc p15, 0, r0, c1, c0, 0 @ read control reg >>> bic r0, r0, #1 << 28@ clear SCTLR.TRE >>> @@ -783,8 +782,11 @@ __armv7_mmu_cache_on: >>> mcrne p15, 0, r3, c2, c0, 0 @ load page table >>> pointer >>> mcrne p15, 0, r1, c3, c0, 0 @ load domain access >>> control >>> mcrne p15, 0, r6, c2, c0, 2 @ load ttb control >>> -#endif >>> mcr p15, 0, r0, c7, c5, 4 @ ISB >>> + mcrne p15, 0, r0, c8, c7, 0 @ flush I,D TLBs >>> +#else >>> + mcr p15, 0, r0, c7, c5, 4 @ ISB >>> +#endif >>> mcr p15, 0, r0, c1, c0, 0 @ load control register >>> mrc p15, 0, r0, c1, c0, 0 @ and read it back >>> ---8<--- >>> >>> Robin. >>> >> i have tested this change (flush I, D, TLBs after TTB control is >> written) and don't see any issue. But on my setup decompression is >> successful even without this (probably not hitting the case in >> discussion). >> >> >> Thanks, >> -- Srinivas R >> > > Would like your feedback on the above. 
Can we get the TTBCR fix merged > first?(will send final patch with Russell Kings comments fixed) > > For testing the TLB flush change we may have to check if we can create a > failure case. Yeah, the TLBI being in the wrong place is a separate, pre-existing problem; as far as this patch goes, it does what it claims to do, and matches what the ARMv7 (and ARMv6) docs say, so: Acked-by: Robin Murphy > > Thanks, > -- Srinivas R >
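For reference, the register arithmetic the patch above performs can be mimicked in plain C. The bit positions follow the ARMv7 short-descriptor TTBCR layout (N in bits [2:0], PD0 at bit 4, EAE at bit 31); the function name is just for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TTBCR_N    (7u << 0)   /* TTBR0 table base size          */
#define TTBCR_PD0  (1u << 4)   /* disable TTBR0 table walks      */
#define TTBCR_EAE  (1u << 31)  /* long-descriptor format enabled */

/* What the decompressor's instructions do to the register value:
 * the existing `bic r6, r6, #1 << 31` clears EAE, and the patched
 * `bic r6, r6, #(7 << 0) | (1 << 4)` clears all three bits of N plus
 * PD0, so translation walks use TTBR0 with the full input range. */
static uint32_t ttbcr_for_ttbr0(uint32_t ttbcr)
{
    ttbcr &= ~TTBCR_EAE;             /* 32-bit translation system */
    ttbcr &= ~(TTBCR_N | TTBCR_PD0); /* use only ttbr0 */
    return ttbcr;
}
```

Note that the original code's `bic r6, r6, #3 << 0` only cleared two of TTBCR.N's three bits, which is exactly the gap the patch closes.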
Re: [PATCH v2] arm64: mm: convert __dma_* routines to use start, size
On 28/07/16 01:08, kwangwoo@sk.com wrote: >> -Original Message- >> From: Robin Murphy [mailto:robin.mur...@arm.com] >> Sent: Wednesday, July 27, 2016 6:56 PM >> To: 이광우(LEE KWANGWOO) MS SW; Russell King - ARM Linux; Catalin Marinas; Will >> Deacon; Mark Rutland; >> linux-arm-ker...@lists.infradead.org >> Cc: 김현철(KIM HYUNCHUL) MS SW; linux-kernel@vger.kernel.org; 정우석(CHUNG WOO >> SUK) MS SW >> Subject: Re: [PATCH v2] arm64: mm: convert __dma_* routines to use start, >> size >> >> On 27/07/16 02:55, kwangwoo@sk.com wrote: >> [...] >>>>> /* >>>>> - * __dma_clean_range(start, end) >>>>> + * __dma_clean_area(start, size) >>>>> * - start - virtual start address of region >>>>> - * - end - virtual end address of region >>>>> + * - size- size in question >>>>> */ >>>>> -__dma_clean_range: >>>>> - dcache_line_size x2, x3 >>>>> - sub x3, x2, #1 >>>>> - bic x0, x0, x3 >>>>> -1: >>>>> +__dma_clean_area: >>>>> alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE >>>>> - dc cvac, x0 >>>>> + dcache_by_line_op cvac, sy, x0, x1, x2, x3 >>>>> alternative_else >>>>> - dc civac, x0 >>>>> + dcache_by_line_op civac, sy, x0, x1, x2, x3 >>>> >>>> dcache_by_line_op is a relatively large macro - is there any way we can >>>> still apply the alternative to just the one instruction which needs it, >>>> as opposed to having to patch the entire mostly-identical routine? >>> >>> I agree with your opinion. Then, how do you think about using CONFIG_* >>> options >>> like below? I think that alternative_* macros seems to keep the space for >>> unused instruction. Is it necessary? Please, share your thought about the >>> space. Thanks! 
>>> >>> +__dma_clean_area: >>> +#ifdefined(CONFIG_ARM64_ERRATUM_826319) || \ >>> + defined(CONFIG_ARM64_ERRATUM_827319) || \ >>> + defined(CONFIG_ARM64_ERRATUM_824069) || \ >>> + defined(CONFIG_ARM64_ERRATUM_819472) >>> + dcache_by_line_op civac, sy, x0, x1, x2, x3 >>> +#else >>> + dcache_by_line_op cvac, sy, x0, x1, x2, x3 >>> +#endif >> >> That's not ideal, because we still only really want to use the >> workaround if we detect a CPU which needs it, rather than baking it in >> at compile time. I was thinking more along the lines of pushing the >> alternative down into dcache_by_line_op, something like the idea below >> (compile-tested only, may not actually be viable). > > OK. Using the capability of CPU features seems to be preferred. > >> Robin. >> >> -8<- >> diff --git a/arch/arm64/include/asm/assembler.h >> b/arch/arm64/include/asm/assembler.h >> index 10b017c4bdd8..1c005c90387e 100644 >> --- a/arch/arm64/include/asm/assembler.h >> +++ b/arch/arm64/include/asm/assembler.h >> @@ -261,7 +261,16 @@ lr .reqx30 // link register >> add \size, \kaddr, \size >> sub \tmp2, \tmp1, #1 >> bic \kaddr, \kaddr, \tmp2 >> -9998: dc \op, \kaddr >> +9998: >> +.ifeqs "\op", "cvac" >> +alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE >> +dc cvac, \kaddr >> +alternative_else >> +dc civac, \kaddr >> +alternative_endif >> +.else >> +dc \op, \kaddr >> +.endif >> add \kaddr, \kaddr, \tmp1 >> cmp \kaddr, \size >> b.lo9998b > > I agree that it looks not viable because it makes the macro bigger and > conditional specifically with CVAC op. Actually, having had a poke around in the resulting disassembly, it looks like this does work correctly. I can't think of a viable reason for the whole dcache_by_line_op to ever be wrapped in yet another alternative (which almost certainly would go horribly wrong), and it would mean that any other future users are automatically covered for free. It's just horrible to look at at the source level. Robin. > > Then.. 
if the number of the usage of alternative_* macros for erratum is > few (just one in this case for cache clean), I think only small change like > below seems to be optimal and there is no need to create a variant macro of > dcache_cache_by_line_op. How do you think about it? > > /* > - * __dma_clean_range(start,
Re: [PATCH v2] arm64: mm: convert __dma_* routines to use start, size
On 01/08/16 00:45, kwangwoo@sk.com wrote: [...] -8<- diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 10b017c4bdd8..1c005c90387e 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -261,7 +261,16 @@ lr.reqx30 // link register add \size, \kaddr, \size sub \tmp2, \tmp1, #1 bic \kaddr, \kaddr, \tmp2 -9998: dc \op, \kaddr +9998: + .ifeqs "\op", "cvac" +alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE + dc cvac, \kaddr +alternative_else + dc civac, \kaddr +alternative_endif + .else + dc \op, \kaddr + .endif add \kaddr, \kaddr, \tmp1 cmp \kaddr, \size b.lo9998b >>> >>> I agree that it looks not viable because it makes the macro bigger and >>> conditional specifically with CVAC op. >> >> Actually, having had a poke around in the resulting disassembly, it >> looks like this does work correctly. I can't think of a viable reason >> for the whole dcache_by_line_op to ever be wrapped in yet another >> alternative (which almost certainly would go horribly wrong), and it >> would mean that any other future users are automatically covered for >> free. It's just horrible to look at at the source level. > > Then, Are you going to send a patch for this? Or should I include this change? I'll do a bit more testing just to make sure, then spin a separate patch (and try to remember to keep you on CC..) Robin.
Re: [PATCH v2] arm64: mm: convert __dma_* routines to use start, size
On 01/08/16 14:36, Robin Murphy wrote: > On 01/08/16 00:45, kwangwoo@sk.com wrote: > [...] >>>>> -8<- >>>>> diff --git a/arch/arm64/include/asm/assembler.h >>>>> b/arch/arm64/include/asm/assembler.h >>>>> index 10b017c4bdd8..1c005c90387e 100644 >>>>> --- a/arch/arm64/include/asm/assembler.h >>>>> +++ b/arch/arm64/include/asm/assembler.h >>>>> @@ -261,7 +261,16 @@ lr .reqx30 // link register >>>>> add \size, \kaddr, \size >>>>> sub \tmp2, \tmp1, #1 >>>>> bic \kaddr, \kaddr, \tmp2 >>>>> -9998:dc \op, \kaddr >>>>> +9998: >>>>> + .ifeqs "\op", "cvac" >>>>> +alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE >>>>> + dc cvac, \kaddr >>>>> +alternative_else >>>>> + dc civac, \kaddr >>>>> +alternative_endif >>>>> + .else >>>>> + dc \op, \kaddr >>>>> + .endif >>>>> add \kaddr, \kaddr, \tmp1 >>>>> cmp \kaddr, \size >>>>> b.lo9998b >>>> >>>> I agree that it looks not viable because it makes the macro bigger and >>>> conditional specifically with CVAC op. >>> >>> Actually, having had a poke around in the resulting disassembly, it >>> looks like this does work correctly. I can't think of a viable reason >>> for the whole dcache_by_line_op to ever be wrapped in yet another >>> alternative (which almost certainly would go horribly wrong), and it >>> would mean that any other future users are automatically covered for >>> free. It's just horrible to look at at the source level. >> >> Then, Are you going to send a patch for this? Or should I include this >> change? > > I'll do a bit more testing just to make sure, then spin a separate patch > (and try to remember to keep you on CC..) ...and said patch turns out to conflict with 823066d9edcd, since I hadn't realised it's already been fixed! So you can go ahead with the dcache_by_line_op cleanup as well, just rebase onto arm64/for-next/core (or linux/master, since it's been pulled already). Robin.
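For anyone following along, the address arithmetic of the `dcache_by_line_op` loop quoted in this thread can be mimicked in C. `lines_touched` is a hypothetical helper that just counts the `dc` operations the assembly would issue; `line_size` stands in for the value `dcache_line_size` reads from CTR_EL0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the macro's loop: align the start address down to a
 * cache-line boundary, then step one line at a time to the end. */
static size_t lines_touched(uintptr_t kaddr, size_t size, size_t line_size)
{
    uintptr_t end = kaddr + size;   /* add  \size, \kaddr, \size   */
    size_t n = 0;

    kaddr &= ~(line_size - 1);      /* bic  \kaddr, \kaddr, \tmp2  */
    while (kaddr < end) {           /* cmp/b.lo back to 9998:      */
        /* here the real macro issues `dc <op>, \kaddr` */
        n++;
        kaddr += line_size;         /* add  \kaddr, \kaddr, \tmp1  */
    }
    return n;
}
```

The initial down-alignment is why an unaligned buffer that merely grazes two cache lines still gets both lines maintained — and it is the single `dc` at the heart of this loop that the alternatives patching swaps between `cvac` and `civac`.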
[PATCH v6.1] iommu/dma: Add support for mapping MSIs
When an MSI doorbell is located downstream of an IOMMU, attaching
devices to a DMA ops domain and switching on translation leads to a
rude shock when their attempt to write to the physical address
returned by the irqchip driver faults (or worse, writes into some
already-mapped buffer) and no interrupt is forthcoming.

Address this by adding a hook for relevant irqchip drivers to call
from their compose_msi_msg() callback, to swizzle the physical address
with an appropriately-mapped IOVA for any device attached to one of
our DMA ops domains.

CC: Thomas Gleixner
CC: Jason Cooper
CC: Marc Zyngier
CC: linux-kernel@vger.kernel.org
Signed-off-by: Robin Murphy
---
- Rework map_page() helper function plus insane lookup logic into
  straightforward get_page() helper
- Use phys_addr_t to further simplify address matching
- Fix the bit where I neglected to actually round the doorbell address
  to a page boundary (oops!)
- Make the locking hardirq-safe to satisfy lockdep
---
 drivers/iommu/dma-iommu.c        | 136 ++-
 drivers/irqchip/irq-gic-v2m.c    |   3 +
 drivers/irqchip/irq-gic-v3-its.c |   3 +
 include/linux/dma-iommu.h        |   9 +++
 4 files changed, 136 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 00c8a08d56e7..4329d18080cf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -25,10 +25,28 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 
+struct iommu_dma_msi_page {
+	struct list_head	list;
+	dma_addr_t		iova;
+	phys_addr_t		phys;
+};
+
+struct iommu_dma_cookie {
+	struct iova_domain	iovad;
+	struct list_head	msi_page_list;
+	spinlock_t		msi_lock;
+};
+
+static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
+{
+	return &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad;
+}
+
 int iommu_dma_init(void)
 {
 	return iova_cache_get();
@@ -43,15 +61,19 @@ int iommu_dma_init(void)
  */
 int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
-	struct iova_domain *iovad;
+	struct iommu_dma_cookie *cookie;
 
 	if (domain->iova_cookie)
 		return -EEXIST;
 
-	iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
-	domain->iova_cookie = iovad;
+	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+	if (!cookie)
+		return -ENOMEM;
 
-	return iovad ? 0 : -ENOMEM;
+	spin_lock_init(&cookie->msi_lock);
+	INIT_LIST_HEAD(&cookie->msi_page_list);
+	domain->iova_cookie = cookie;
+	return 0;
 }
 EXPORT_SYMBOL(iommu_get_dma_cookie);
 
@@ -63,14 +85,20 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
  */
 void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
-	struct iova_domain *iovad = domain->iova_cookie;
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iommu_dma_msi_page *msi, *tmp;
 
-	if (!iovad)
+	if (!cookie)
 		return;
 
-	if (iovad->granule)
-		put_iova_domain(iovad);
-	kfree(iovad);
+	if (cookie->iovad.granule)
+		put_iova_domain(&cookie->iovad);
+
+	list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
+		list_del(&msi->list);
+		kfree(msi);
+	}
+	kfree(cookie);
 	domain->iova_cookie = NULL;
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
@@ -88,7 +116,7 @@ EXPORT_SYMBOL(iommu_put_dma_cookie);
  */
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
 {
-	struct iova_domain *iovad = domain->iova_cookie;
+	struct iova_domain *iovad = cookie_iovad(domain);
 	unsigned long order, base_pfn, end_pfn;
 
 	if (!iovad)
@@ -155,7 +183,7 @@ int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
 static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size,
 		dma_addr_t dma_limit)
 {
-	struct iova_domain *iovad = domain->iova_cookie;
+	struct iova_domain *iovad = cookie_iovad(domain);
 	unsigned long shift = iova_shift(iovad);
 	unsigned long length = iova_align(iovad, size) >> shift;
 
@@ -171,7 +199,7 @@ static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size,
 /* The IOVA allocator knows what we mapped, so just unmap whatever that was */
 static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
 {
-	struct iova_domain *iovad = domain->iova_cookie;
+	struct iova_domain *iovad = cookie_iovad(domain);
 	unsigned long shift = iova_shift(iovad);
 	unsigned long pfn = dma_addr >> shift;
 	struct iova *iova = find_iova(iovad, pfn);
@@ -294,7 +322,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
Re: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev
On 07/09/16 10:55, Peter Chen wrote: [...] >> Regarding the DMA configuration that you mention in ci_hdrc_add_device(), >> I think we should replace >> >> pdev->dev.dma_mask = dev->dma_mask; >> pdev->dev.dma_parms = dev->dma_parms; >> dma_set_coherent_mask(&pdev->dev, dev->coherent_dma_mask); >> >> with of_dma_configure(), which has the chance to configure more than >> just those three, as the dma API might look into different aspects: >> >> - iommu specific configuration >> - cache coherency information >> - bus type >> - dma offset >> - dma_map_ops pointer >> >> We try to handle everything in of_dma_configure() at configuration >> time, and that would be the place to add anything else that we might >> need in the future. >> > > Yes, I agree with you, but just like Felipe mentioned, we also need to > consider PCI device, can we do something like gpiod_get_index does? Are > there any similar APIs like of_dma_configure for ACPI? Not yet, but Lorenzo has one in progress[1], primarily for the sake of abstracting away the IOMMU configuration. Robin. [1]:http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1209911.html
Re: [PATCH] pcie: qcom: add support to msm8996 PCIE controller
On 07/09/16 12:06, Srinivas Kandagatla wrote: > This patch adds support to msm8996/apq8096 pcie, MSM8996 supports > Gen 1/2, One lane, 3 pcie root-complex with support to MSI and > legacy interrupts and it conforms to PCI Express Base 2.1 specification. > > This patch adds post_init callback to qcom_pcie_ops, as this is pcie > pipe clocks and smmu configuration are only setup after the phy is > powered on. It also adds ltssm_enable callback as it is very much > different to other supported SOCs in the driver. > > Signed-off-by: Srinivas Kandagatla > --- > > I tested this patch along with phy driver patch on DB820c based on > APQ8096 on port A and port B using sata and ethernet controllers. > > Thanks > srini > > .../devicetree/bindings/pci/qcom,pcie.txt | 88 +++- > drivers/pci/host/pcie-qcom.c | 237 > - > 2 files changed, 318 insertions(+), 7 deletions(-) > > diff --git a/Documentation/devicetree/bindings/pci/qcom,pcie.txt > b/Documentation/devicetree/bindings/pci/qcom,pcie.txt > index 4059a6f..1ddfcb4 100644 > --- a/Documentation/devicetree/bindings/pci/qcom,pcie.txt > +++ b/Documentation/devicetree/bindings/pci/qcom,pcie.txt > @@ -92,6 +92,18 @@ > - "aux" Auxiliary (AUX) clock > - "bus_master" Master AXI clock > - "bus_slave" Slave AXI clock > + > +- clock-names: > + Usage: required for msm8996/apq8096 > + Value type: > + Definition: Should contain the following entries > + - "aux" Auxiliary (AUX) clock > + - "bus_master" Master AXI clock > + - "bus_slave" Slave AXI clock > + - "pipe"Pipe Clock driving internal logic. > + - "cfg" Configuration clk. 
> + - "snoc_axi"SNOC AXI clk > + - "cnoc_ahb"CNOC AHB clk > - resets: > Usage: required > Value type: > @@ -114,8 +126,15 @@ > Definition: Should contain the following entries > - "core" Core reset > > +- iommus: > + Usage: required for msm8996/apq8096 > + Value type: > + Definition: Should point to the respective IOMMU block with master port > + as argument, see Documentation/devicetree/bindings/iommu/ > + mediatek,iommu.txt for details. Qualcomm have cross-licensed the M4U!? :P In seriousness, though, it seems a bit odd to list "iommus" as a required property - unless the IOMMU is actually an integral part of the PCIe root complex being described (in which case it wants its own binding documenting as well), then the relationship is merely a circumstance of integration of those particular SoCs. > + > - power-domains: > - Usage: required for apq8084 > + Usage: required for apq8084 and msm8996/apq8096 > Value type: > Definition: A phandle and power domain specifier pair to the > power domain which is responsible for collapsing > @@ -137,6 +156,11 @@ > Definition: A phandle to the analog power supply for IC which generates > reference clock > > +- vdda_1p8-supply: > + Usage: required for msm8996/apq8096 > + Value type: > + Definition: A phandle to the analog power supply for PCIE_1P8 > + > - phys: > Usage: required for apq8084 > Value type: > @@ -231,3 +255,65 @@ > pinctrl-0 = <&pcie0_pins_default>; > pinctrl-names = "default"; > }; > + > +* Example for apq8096: > + pcie1: qcom,pcie@00608000 { > + compatible = "qcom,pcie-msm8996", "snps,dw-pcie"; > + power-domains = <&gcc PCIE1_GDSC>; > + bus-range = <0x00 0xff>; > + num-lanes = <1>; > + > + status = "disabled"; > + > + reg = <0x00608000 0x2000>, > + <0x0d00 0xf1d>, > + <0x0d000f20 0xa8>, > + <0x0d10 0x10>; > + > + reg-names = "parf", "dbi", "elbi","config"; > + > + phys = <&pcie_phy 1>; > + phy-names = "pciephy"; > + > + #address-cells = <3>; > + #size-cells = <2>; > + ranges = <0x0100 0x0 0x0d20 0x0d20 0x0 0x10>, > 
+ <0x0200 0x0 0x0d30 0x0d30 0x0 0xd0>; > + > + interrupts = ; > + interrupt-names = "msi"; > + #interrupt-cells = <1>; > + interrupt-map-mask = <0 0 0 0x7>; > + interrupt-map = <0 0 0 1 &intc 0 272 IRQ_TYPE_LEVEL_HIGH>, /* > int_a */ > + <0 0 0 2 &intc 0 273 IRQ_TYPE_LEVEL_HIGH>, /* > int_b */ > + <0 0 0 3 &intc 0 274 IRQ_TYPE_LEVEL_HIGH>, /* > int_c */ > + <0 0 0 4 &intc 0 275 IRQ_TYPE_LEVEL_HIGH>; /* > int_d */ > + > + pinctrl-names = "d
Re: [PATCH v5 05/14] drivers: iommu: make iommu_fwspec OF agnostic
Hi Lorenzo, On 09/09/16 15:23, Lorenzo Pieralisi wrote: > The iommu_fwspec structure, used to hold per device iommu configuration > data is not OF specific and therefore can be moved to a generic > and OF independent compilation unit. > > In particular, the iommu_fwspec handling hinges on the device_node > pointer to identify the IOMMU device associated with the iommu_fwspec > structure, that is easily converted to a more generic fwnode_handle > pointer that can cater for OF and non-OF (ie ACPI) systems. > > Create the files and related Kconfig entry to decouple iommu_fwspec > structure from the OF iommu kernel layer. > > Given that the current iommu_fwspec implementation relies on > the arch specific struct device.archdata.iommu field in its > implementation, by making the code standalone and independent > of the OF layer this patch makes sure that the iommu_fwspec > kernel code can be selected only on arches implementing the > struct device.archdata.iommu field by adding an explicit > arch dependency in its config entry. > > Current drivers using the iommu_fwspec for streamid translation > are converted to the new iommu_fwspec API by simply converting > the device_node to its fwnode_handle pointer. 
> > Signed-off-by: Lorenzo Pieralisi > Cc: Will Deacon > Cc: Hanjun Guo > Cc: Robin Murphy > Cc: Joerg Roedel > --- > drivers/iommu/Kconfig| 4 ++ > drivers/iommu/Makefile | 1 + > drivers/iommu/arm-smmu-v3.c | 16 -- > drivers/iommu/arm-smmu.c | 17 +++--- > drivers/iommu/iommu-fwspec.c | 126 > +++ > drivers/iommu/of_iommu.c | 93 > include/linux/iommu-fwspec.h | 70 > include/linux/of_iommu.h | 38 - > 8 files changed, 234 insertions(+), 131 deletions(-) > create mode 100644 drivers/iommu/iommu-fwspec.c > create mode 100644 include/linux/iommu-fwspec.h > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 101cb17..873bd41 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -70,6 +70,10 @@ config OF_IOMMU > config HAVE_IOMMU_FWSPEC > bool > > +config IOMMU_FWSPEC > + def_bool y > + depends on IOMMU_API > + > # IOMMU-agnostic DMA-mapping layer > config IOMMU_DMA > bool > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index 195f7b9..bbbc6d6 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -6,6 +6,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o > obj-$(CONFIG_IOMMU_IOVA) += iova.o > +obj-$(CONFIG_IOMMU_FWSPEC) += iommu-fwspec.o > obj-$(CONFIG_OF_IOMMU) += of_iommu.o > obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o > obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index be293b5..a7e9de9 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -25,6 +25,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -1720,13 +1721,18 @@ static struct platform_driver arm_smmu_driver; > > static int arm_smmu_match_node(struct device *dev, void *data) > { > - return dev->of_node == data; > + struct fwnode_handle *fwnode; > + > + fwnode = dev->of_node ? 
&dev->of_node->fwnode : dev->fwnode; > + > + return fwnode == data; > } Maybe we should hoist the dev_fwnode() helper from property.c up to property.h so we can just have "return dev_fwnode(dev) == data;" here? > > -static struct arm_smmu_device *arm_smmu_get_by_node(struct device_node *np) > +static struct arm_smmu_device * > +arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode) > { > struct device *dev = driver_find_device(&arm_smmu_driver.driver, NULL, > - np, arm_smmu_match_node); > + fwnode, arm_smmu_match_node); > put_device(dev); > return dev ? dev_get_drvdata(dev) : NULL; > } > @@ -1762,7 +1768,7 @@ static int arm_smmu_add_device(struct device *dev) > master = fwspec->iommu_priv; > smmu = master->smmu; > } else { > - smmu = arm_smmu_get_by_node(fwspec->iommu_np); > + smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode); > if (!smmu) > return -ENODEV; > master = kzalloc(sizeof(*master), GFP_KERNEL); > @@
Re: [PATCH v5 06/14] drivers: acpi: implement acpi_dma_configure
On 09/09/16 15:23, Lorenzo Pieralisi wrote: > On DT based systems, the of_dma_configure() API implements DMA > configuration for a given device. On ACPI systems an API equivalent to > of_dma_configure() is missing which implies that it is currently not > possible to set-up DMA operations for devices through the ACPI generic > kernel layer. > > This patch fills the gap by introducing acpi_dma_configure/deconfigure() > calls that for now are just wrappers around arch_setup_dma_ops() and > arch_teardown_dma_ops() and also updates ACPI and PCI core code to use > the newly introduced acpi_dma_configure/acpi_dma_deconfigure functions. > > The DMA range size passed to arch_setup_dma_ops() is sized according > to the device coherent_dma_mask (starting at address 0x0), mirroring the > DT probing path behaviour when a dma-ranges property is not provided > for the device being probed; this changes the current arch_setup_dma_ops() > call parameters in the ACPI probing case, but since arch_setup_dma_ops() > is a NOP on all architectures but ARM/ARM64 this patch does not change > the current kernel behaviour on them. > > Signed-off-by: Lorenzo Pieralisi > Acked-by: Bjorn Helgaas [pci] > Cc: Bjorn Helgaas > Cc: Robin Murphy > Cc: Tomasz Nowicki > Cc: Joerg Roedel > Cc: "Rafael J. 
Wysocki" > --- > drivers/acpi/glue.c | 4 ++-- > drivers/acpi/scan.c | 24 > drivers/pci/probe.c | 3 +-- > include/acpi/acpi_bus.h | 2 ++ > include/linux/acpi.h| 5 + > 5 files changed, 34 insertions(+), 4 deletions(-) > > diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c > index 5ea5dc2..f8d6564 100644 > --- a/drivers/acpi/glue.c > +++ b/drivers/acpi/glue.c > @@ -227,8 +227,7 @@ int acpi_bind_one(struct device *dev, struct acpi_device > *acpi_dev) > > attr = acpi_get_dma_attr(acpi_dev); > if (attr != DEV_DMA_NOT_SUPPORTED) > - arch_setup_dma_ops(dev, 0, 0, NULL, > -attr == DEV_DMA_COHERENT); > + acpi_dma_configure(dev, attr); > > acpi_physnode_link_name(physical_node_name, node_id); > retval = sysfs_create_link(&acpi_dev->dev.kobj, &dev->kobj, > @@ -251,6 +250,7 @@ int acpi_bind_one(struct device *dev, struct acpi_device > *acpi_dev) > return 0; > > err: > + acpi_dma_deconfigure(dev); > ACPI_COMPANION_SET(dev, NULL); > put_device(dev); > put_device(&acpi_dev->dev); > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > index e878fc7..9614232 100644 > --- a/drivers/acpi/scan.c > +++ b/drivers/acpi/scan.c > @@ -1370,6 +1370,30 @@ enum dev_dma_attr acpi_get_dma_attr(struct acpi_device > *adev) > return DEV_DMA_NON_COHERENT; > } > > +/** > + * acpi_dma_configure - Set-up DMA configuration for the device. > + * @dev: The pointer to the device > + * @attr: device dma attributes > + */ > +void acpi_dma_configure(struct device *dev, enum dev_dma_attr attr) > +{ > + /* > + * Assume dma valid range starts at 0 and covers the whole > + * coherent_dma_mask. > + */ > + arch_setup_dma_ops(dev, 0, dev->coherent_dma_mask + 1, NULL, > +attr == DEV_DMA_COHERENT); This looks a bit hairy - if we're setting up the DMA configuration at this point can we really always rely on the device already having a valid DMA mask? I think it would make sense to at least check, and apply the standard default 32-bit mask if not. 
> +} > + > +/** > + * acpi_dma_deconfigure - Tear-down DMA configuration for the device. > + * @dev: The pointer to the device > + */ > +void acpi_dma_deconfigure(struct device *dev) > +{ > + arch_teardown_dma_ops(dev); > +} As touched upon in the dwc-usb3 thread, is it worth exporting these for the benefit of modular bus/glue code, to match of_dma_configure()? > + > static void acpi_init_coherency(struct acpi_device *adev) > { > unsigned long long cca = 0; > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 93f280d..e96d482 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1734,8 +1734,7 @@ static void pci_dma_configure(struct pci_dev *dev) > if (attr == DEV_DMA_NOT_SUPPORTED) > dev_warn(&dev->dev, "DMA not supported.\n"); > else > - arch_setup_dma_ops(&dev->dev, 0, 0, NULL, > -attr == DEV_DMA_COHERENT); > + acpi_dma_configure(&dev->dev, attr); > } What about non-PCI stuff? I see there's at least an acpi_create_platform_device() wh
Re: [PATCH v5 07/14] drivers: acpi: iort: add support for ARM SMMU platform devices creation
On 09/09/16 15:23, Lorenzo Pieralisi wrote: > In ARM ACPI systems, IOMMU components are specified through static > IORT table entries. In order to create platform devices for the > corresponding ARM SMMU components, IORT kernel code should be made > able to parse IORT table entries and create platform devices > dynamically. > > This patch adds the generic IORT infrastructure required to create > platform devices for ARM SMMUs. > > ARM SMMU versions have different resources requirement therefore this > patch also introduces an IORT specific structure (ie iort_iommu_config) > that contains hooks (to be defined when the corresponding ARM SMMU > driver support is added to the kernel) to be used to define the > platform devices names, init the IOMMUs, count their resources and > finally initialize them. > > Signed-off-by: Lorenzo Pieralisi > Cc: Hanjun Guo > Cc: Tomasz Nowicki > Cc: "Rafael J. Wysocki" > --- > drivers/acpi/arm64/iort.c | 131 > ++ > 1 file changed, 131 insertions(+) > > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > index b89b3d3..e0a9b16 100644 > --- a/drivers/acpi/arm64/iort.c > +++ b/drivers/acpi/arm64/iort.c > @@ -22,6 +22,7 @@ > #include > #include > #include > +#include > #include > > struct iort_its_msi_chip { > @@ -424,6 +425,135 @@ struct irq_domain *iort_get_device_domain(struct device > *dev, u32 req_id) > return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI); > } > > +struct iort_iommu_config { > + const char *name; > + int (*iommu_init)(struct acpi_iort_node *node); > + bool (*iommu_is_coherent)(struct acpi_iort_node *node); > + int (*iommu_count_resources)(struct acpi_iort_node *node); > + void (*iommu_init_resources)(struct resource *res, > + struct acpi_iort_node *node); > +}; > + > +static __init > +const struct iort_iommu_config *iort_get_iommu_cfg(struct acpi_iort_node > *node) > +{ > + return NULL; > +} > + > +/** > + * iort_add_smmu_platform_device() - Allocate a platform device for SMMU > + * @fwnode: 
IORT node associated fwnode handle > + * @node: Pointer to SMMU ACPI IORT node > + * > + * Returns: 0 on success, <0 failure > + */ > +static int __init iort_add_smmu_platform_device(struct fwnode_handle *fwnode, > + struct acpi_iort_node *node) > +{ > + struct platform_device *pdev; > + struct resource *r; > + enum dev_dma_attr attr; > + int ret, count; > + const struct iort_iommu_config *ops = iort_get_iommu_cfg(node); > + > + if (!ops) > + return -ENODEV; > + > + pdev = platform_device_alloc(ops->name, PLATFORM_DEVID_AUTO); > + if (!pdev) > + return PTR_ERR(pdev); > + > + count = ops->iommu_count_resources(node); > + > + r = kcalloc(count, sizeof(*r), GFP_KERNEL); > + if (!r) { > + ret = -ENOMEM; > + goto dev_put; > + } > + > + ops->iommu_init_resources(r, node); > + > + ret = platform_device_add_resources(pdev, r, count); > + /* > + * Resources are duplicated in platform_device_add_resources, > + * free their allocated memory > + */ > + kfree(r); > + > + if (ret) > + goto dev_put; > + > + /* > + * Add a copy of IORT node pointer to platform_data to > + * be used to retrieve IORT data information. > + */ > + ret = platform_device_add_data(pdev, &node, sizeof(node)); > + if (ret) > + goto dev_put; > + > + pdev->dev.dma_mask = kmalloc(sizeof(*pdev->dev.dma_mask), GFP_KERNEL); > + if (!pdev->dev.dma_mask) { > + ret = -ENOMEM; > + goto dev_put; > + } Since this is exclusively for creating SMMUs, and we know they should never have weird shenanigans going on requiring different masks, I'd be inclined to just point dev.dma_mask at dev.coherent_dma_mask and be done with it. > + > + pdev->dev.fwnode = fwnode; > + > + /* > + * Set default dma mask value for the table walker, > + * to be overridden on probing with correct value. > + */ > + *pdev->dev.dma_mask = DMA_BIT_MASK(32); > + pdev->dev.coherent_dma_mask = *pdev->dev.dma_mask; > + > + attr = ops->iommu_is_coherent(node) ? 
> + DEV_DMA_COHERENT : DEV_DMA_NON_COHERENT; > + > + /* Configure DMA for the page table walker */ > + acpi_dma_configure(&pdev->dev, attr); Oh look, some more code which would be simpler if acpi_dma_configure() set the default mask itself ;) Robin. > + > + ret = platform_device_add(pdev); > + if (ret) > + goto dma_deconfigure; > + > + return 0; > + > +dma_deconfigure: > + acpi_dma_deconfigure(&pdev->dev); > + kfree(pdev->dev.dma_mask); > + > +dev_put: > + platform_device_put(pdev); > + > + return ret; > +} > + > +static acpi_status __init iort_match_iommu_callback(struct acpi_iort_n
Re: [PATCH v5 09/14] drivers: iommu: arm-smmu-v3: add IORT configuration
On 09/09/16 15:23, Lorenzo Pieralisi wrote: > In ACPI bases systems, in order to be able to create platform > devices and initialize them for ARM SMMU v3 components, the IORT > kernel implementation requires a set of static functions to be > used by the IORT kernel layer to configure platform devices for > ARM SMMU v3 components. > > Add static configuration functions to the IORT kernel layer for > the ARM SMMU v3 components, so that the ARM SMMU v3 driver can > initialize its respective platform device by relying on the IORT > kernel infrastructure and by adding a corresponding ACPI device > early probe section entry. > > Signed-off-by: Lorenzo Pieralisi > Cc: Will Deacon > Cc: Robin Murphy > Cc: Joerg Roedel > --- > drivers/acpi/arm64/iort.c | 103 > +++- > drivers/iommu/arm-smmu-v3.c | 95 +++- > 2 files changed, 195 insertions(+), 3 deletions(-) > > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > index e0a9b16..a2ad102 100644 > --- a/drivers/acpi/arm64/iort.c > +++ b/drivers/acpi/arm64/iort.c > @@ -425,6 +425,95 @@ struct irq_domain *iort_get_device_domain(struct device > *dev, u32 req_id) > return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI); > } > > +static void __init acpi_iort_register_irq(int hwirq, const char *name, > + int trigger, > + struct resource *res) > +{ > + int irq = acpi_register_gsi(NULL, hwirq, trigger, > + ACPI_ACTIVE_HIGH); > + > + if (irq < 0) { irq <= 0 ? 
> + pr_err("could not register gsi hwirq %d name [%s]\n", hwirq, > + name); > + return; > + } > + > + res->start = irq; > + res->end = irq; > + res->flags = IORESOURCE_IRQ; > + res->name = name; > +} > + > +static int __init arm_smmu_v3_count_resources(struct acpi_iort_node *node) > +{ > + struct acpi_iort_smmu_v3 *smmu; > + /* Always present mem resource */ > + int num_res = 1; > + > + /* Retrieve SMMUv3 specific data */ > + smmu = (struct acpi_iort_smmu_v3 *)node->node_data; > + > + if (smmu->event_gsiv) > + num_res++; > + > + if (smmu->pri_gsiv) > + num_res++; > + > + if (smmu->gerr_gsiv) > + num_res++; > + > + if (smmu->sync_gsiv) > + num_res++; > + > + return num_res; > +} > + > +static void __init arm_smmu_v3_init_resources(struct resource *res, > + struct acpi_iort_node *node) > +{ > + struct acpi_iort_smmu_v3 *smmu; > + int num_res = 0; > + > + /* Retrieve SMMUv3 specific data */ > + smmu = (struct acpi_iort_smmu_v3 *)node->node_data; > + > + res[num_res].start = smmu->base_address; > + res[num_res].end = smmu->base_address + SZ_128K - 1; > + res[num_res].flags = IORESOURCE_MEM; > + > + num_res++; > + > + if (smmu->event_gsiv) > + acpi_iort_register_irq(smmu->event_gsiv, "eventq", > +ACPI_EDGE_SENSITIVE, > +&res[num_res++]); > + > + if (smmu->pri_gsiv) > + acpi_iort_register_irq(smmu->pri_gsiv, "priq", > +ACPI_EDGE_SENSITIVE, > +&res[num_res++]); > + > + if (smmu->gerr_gsiv) > + acpi_iort_register_irq(smmu->gerr_gsiv, "gerror", > +ACPI_EDGE_SENSITIVE, > +&res[num_res++]); > + > + if (smmu->sync_gsiv) > + acpi_iort_register_irq(smmu->sync_gsiv, "cmdq-sync", > +ACPI_EDGE_SENSITIVE, > +&res[num_res++]); > +} > + > +static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node) > +{ > + struct acpi_iort_smmu_v3 *smmu; > + > + /* Retrieve SMMUv3 specific data */ > + smmu = (struct acpi_iort_smmu_v3 *)node->node_data; > + > + return smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE; > +} > + > struct iort_iommu_config { > const char *name; > int 
(*iommu_init)(struct acpi_iort_node *node); > @@ -434,10 +523,22 @@ struct iort_iommu_config { >struct acpi_iort_node *node); > }; > > +stati
[PATCH] of/platform: Initialise dev->fwnode appropriately
Whilst we're some of the way towards a universal firmware property interface, drivers which deal with both OF and ACPI probing end up having to do things like this: dev->of_node ? &dev->of_node->fwnode : dev->fwnode This seems unnecessary, when the OF code could instead simply fill in the device's fwnode when binding the of_node, and let the drivers use dev->fwnode either way. Let's give it a go and see what falls out. Signed-off-by: Robin Murphy --- drivers/of/platform.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/of/platform.c b/drivers/of/platform.c index f39ccd5aa701..f811d2796437 100644 --- a/drivers/of/platform.c +++ b/drivers/of/platform.c @@ -142,6 +142,7 @@ struct platform_device *of_device_alloc(struct device_node *np, } dev->dev.of_node = of_node_get(np); + dev->dev.fwnode = &np->fwnode; dev->dev.parent = parent ? : &platform_bus; if (bus_id) @@ -241,6 +242,7 @@ static struct amba_device *of_amba_device_create(struct device_node *node, /* setup generic device info */ dev->dev.of_node = of_node_get(node); + dev->dev.fwnode = &node->fwnode; dev->dev.parent = parent ? : &platform_bus; dev->dev.platform_data = platform_data; if (bus_id) -- 2.8.1.dirty
Re: [PATCH] usb: dwc3: host: inherit dma configuration from parent dev
On 02/09/16 11:53, Felipe Balbi wrote: > > Hi, > > Arnd Bergmann writes: >> On Thursday, September 1, 2016 5:14:28 PM CEST Leo Li wrote: >>> >>> Hi Felipe and Arnd, >>> >>> It has been a while since the last response to this discussion, but we >>> haven't reached an agreement yet! Can we get to a conclusion on if it >>> is valid to create child platform device for abstraction purpose? If >>> yes, can this child device do DMA by itself? >> >> I'd say it's no problem for a driver to create child devices in order >> to represent different aspects of a device, but you should not rely on >> those devices working when used with the dma-mapping interfaces. > > heh, that looks like an excuse to me :-) > > This will always be a problem for e.g. MFD, for example. Are you saying > MFD child-devices shouldn't be allowed to do DMA? It becomes silly when > you read it that way, right? > >> This used to be simpler back when we could configure the kernel for >> only one SoC platform at a time, and the platforms could provide their >> own overrides for the dma-mapping interfaces. These days, we rely on > > right, so we have a very old regression that just took a complex driver > such as dwc3 to trigger ;-) > >> firmware or bootloader to describe various aspects of how DMA is done, > > there's no DMA description in DT. Every OF device gets the same 32-bit > DMA mask and that is, itself, wrong for several devices. Huh? There's only no DMA description in DT if the device can be assumed to be happy with the defaults. Anything else should be using "dma-ranges", "dma-coherent", etc. to describe non-default integration aspects. For devices with an inherent fixed addressing capability !=32 bits, then it's down to the driver to call dma_set_mask() appropriately to override the default 32-bit mask (which is not unique to OF-probed devices either). Sure, it's by no means a perfect API, but you're railing against untruths here. Robin. 
>> so you can't assume that passing a device without an of_node pointer >> or ACPI data into those functions will do the right thing. > > That's not the problem, however. We can very easily pass along > ACPI_COMPANION() to any platform_device we want, but that's not enough > because DMA-related bits are passed along with archdata; but archdata > isn't generic in any way. Some arches (like x86) _do_ use it for DMA, > but some don't. > > > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
Re: [PATCH 10/11] soc: ti: knav_qmss_queue: use of_property_read_bool
Hi Julia, On 05/08/16 09:56, Julia Lawall wrote: > Use of_property_read_bool to check for the existence of a property. This caught my eye since Rob told me off for doing the same recently[1]. > The semantic patch that makes this change is as follows: > (http://coccinelle.lip6.fr/) > > // > @@ > expression e1,e2; > statement S2,S1; > @@ > - if (of_get_property(e1,e2,NULL)) > + if (of_property_read_bool(e1,e2)) > S1 else S2 > // > > Signed-off-by: Julia Lawall > > --- > drivers/soc/ti/knav_qmss_queue.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/soc/ti/knav_qmss_queue.c > b/drivers/soc/ti/knav_qmss_queue.c > index b73e353..56b5d7c 100644 > --- a/drivers/soc/ti/knav_qmss_queue.c > +++ b/drivers/soc/ti/knav_qmss_queue.c > @@ -1240,7 +1240,7 @@ static int knav_setup_queue_range(struct knav_device > *kdev, > if (of_get_property(node, "qalloc-by-id", NULL)) According to the binding, "qalloc-by-id" _is_ a boolean property, so this one really does deserve to be of_property_read_bool()... > range->flags |= RANGE_RESERVED; > > - if (of_get_property(node, "accumulator", NULL)) { > + if (of_property_read_bool(node, "accumulator")) { ...whereas "accumulator" must have a value, so this isn't technically appropriate. In general, most of these "if the property exists, read the property and do stuff" checks are probably a sign of code that could be simplified by refactoring the "do stuff" step to just specifically handle the "read the property" step returning -EINVAL when it's not present. Robin. [1]:https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg13375.html > ret = knav_init_acc_range(kdev, node, range); > if (ret < 0) { > devm_kfree(dev, range); > >
[PATCH v5 19/19] iommu/dma: Add support for mapping MSIs
When an MSI doorbell is located downstream of an IOMMU, attaching devices to a DMA ops domain and switching on translation leads to a rude shock when their attempt to write to the physical address returned by the irqchip driver faults (or worse, writes into some already-mapped buffer) and no interrupt is forthcoming. Address this by adding a hook for relevant irqchip drivers to call from their compose_msi_msg() callback, to swizzle the physical address with an appropriately-mapped IOVA for any device attached to one of our DMA ops domains. CC: Thomas Gleixner CC: Jason Cooper CC: Marc Zyngier CC: linux-kernel@vger.kernel.org Signed-off-by: Robin Murphy --- drivers/iommu/dma-iommu.c | 141 ++- drivers/irqchip/irq-gic-v2m.c | 3 + drivers/irqchip/irq-gic-v3-its.c | 3 + include/linux/dma-iommu.h | 9 +++ 4 files changed, 141 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 00c8a08d56e7..330cce60cad9 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -25,10 +25,29 @@ #include #include #include +#include #include #include #include +struct iommu_dma_msi_page { + struct list_head list; + dma_addr_t iova; + u32 phys_lo; + u32 phys_hi; +}; + +struct iommu_dma_cookie { + struct iova_domain iovad; + struct list_head msi_page_list; + spinlock_t msi_lock; +}; + +static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain) +{ + return &((struct iommu_dma_cookie *)domain->iova_cookie)->iovad; +} + int iommu_dma_init(void) { return iova_cache_get(); @@ -43,15 +62,19 @@ int iommu_dma_init(void) */ int iommu_get_dma_cookie(struct iommu_domain *domain) { - struct iova_domain *iovad; + struct iommu_dma_cookie *cookie; if (domain->iova_cookie) return -EEXIST; - iovad = kzalloc(sizeof(*iovad), GFP_KERNEL); - domain->iova_cookie = iovad; + cookie = kzalloc(sizeof(*cookie), GFP_KERNEL); + if (!cookie) + return -ENOMEM; - return iovad ? 
0 : -ENOMEM; + spin_lock_init(&cookie->msi_lock); + INIT_LIST_HEAD(&cookie->msi_page_list); + domain->iova_cookie = cookie; + return 0; } EXPORT_SYMBOL(iommu_get_dma_cookie); @@ -63,14 +86,20 @@ EXPORT_SYMBOL(iommu_get_dma_cookie); */ void iommu_put_dma_cookie(struct iommu_domain *domain) { - struct iova_domain *iovad = domain->iova_cookie; + struct iommu_dma_cookie *cookie = domain->iova_cookie; + struct iommu_dma_msi_page *msi, *tmp; - if (!iovad) + if (!cookie) return; - if (iovad->granule) - put_iova_domain(iovad); - kfree(iovad); + if (cookie->iovad.granule) + put_iova_domain(&cookie->iovad); + + list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) { + list_del(&msi->list); + kfree(msi); + } + kfree(cookie); domain->iova_cookie = NULL; } EXPORT_SYMBOL(iommu_put_dma_cookie); @@ -88,7 +117,7 @@ EXPORT_SYMBOL(iommu_put_dma_cookie); */ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size) { - struct iova_domain *iovad = domain->iova_cookie; + struct iova_domain *iovad = cookie_iovad(domain); unsigned long order, base_pfn, end_pfn; if (!iovad) @@ -155,7 +184,7 @@ int dma_direction_to_prot(enum dma_data_direction dir, bool coherent) static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size, dma_addr_t dma_limit) { - struct iova_domain *iovad = domain->iova_cookie; + struct iova_domain *iovad = cookie_iovad(domain); unsigned long shift = iova_shift(iovad); unsigned long length = iova_align(iovad, size) >> shift; @@ -171,7 +200,7 @@ static struct iova *__alloc_iova(struct iommu_domain *domain, size_t size, /* The IOVA allocator knows what we mapped, so just unmap whatever that was */ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr) { - struct iova_domain *iovad = domain->iova_cookie; + struct iova_domain *iovad = cookie_iovad(domain); unsigned long shift = iova_shift(iovad); unsigned long pfn = dma_addr >> shift; struct iova *iova = find_iova(iovad, pfn); @@ -294,7 +323,7 @@ struct 
page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp, void (*flush_page)(struct device *, const void *, phys_addr_t)) { struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - struct iova_domain *iovad = domain->iova_cookie; + struct iova_domain *iovad = cookie_iovad(domain);
Re: [PATCH v5 19/19] iommu/dma: Add support for mapping MSIs
On 24/08/16 09:16, Thomas Gleixner wrote: > On Tue, 23 Aug 2016, Robin Murphy wrote: >> +cookie = domain->iova_cookie; >> +iovad = &cookie->iovad; >> + >> +spin_lock(&cookie->msi_lock); >> +list_for_each_entry(msi_page, &cookie->msi_page_list, list) >> +if (msi_page->phys_hi == msg->address_hi && >> +msi_page->phys_lo - msg->address_lo < iovad->granule) >> +goto unlock; >> + >> +ret = __iommu_dma_map_msi_page(dev, msg, domain, &msi_page); >> +unlock: >> +spin_unlock(&cookie->msi_lock); >> + >> +if (!ret) { >> +msg->address_hi = upper_32_bits(msi_page->iova); >> +msg->address_lo &= iova_mask(iovad); >> +msg->address_lo += lower_32_bits(msi_page->iova); >> +} else { >> +/* >> + * We're called from a void callback, so the best we can do is >> + * 'fail' by filling the message with obviously bogus values. >> + * Since we got this far due to an IOMMU being present, it's >> + * not like the existing address would have worked anyway... >> + */ >> +msg->address_hi = ~0U; >> +msg->address_lo = ~0U; >> +msg->data = ~0U; >> +} > > The above is really horrible to parse. I had to read it five times to > understand the logic. Yeah, on reflection it is needlessly hideous. I think we should take this as a clear lesson that whenever you find yourself thinking "Man, I wish I had Python's for...else construct here", you're doing it wrong ;) > static struct iommu_dma_msi_page * > find_or_map_msi_page(struct iommu_dma_cookie *cookie, struct msi_msg *msg) > { > struct iova_domain *iovad = &cookie->iovad; > struct iommu_dma_msi_page *page; > > list_for_each_entry(*page, &cookie->msi_page_list, list) { > if (page->phys_hi == msg->address_hi && > page->phys_lo - msg->address_lo < iovad->granule) > return page; > } > > /* >* FIXME: __iommu_dma_map_msi_page() should return a page or NULL. >* The integer return value is pretty pointless. If seperate error >* codes are required that's what ERR_PTR() is for >*/ > ret = __iommu_dma_map_msi_page(dev, msg, domain, &page); > return ret ? 
ERR_PTR(ret) : page; > } > > So now the code in iommu_dma_map_msi_msg() becomes: > > spin_lock(&cookie->msi_lock); > msi_page = find_or_map_msi_page(cookie, msg); > spin_unlock(&cookie->msi_lock); > > if (!IS_ERR_OR_NULL(msi_page)) { > msg->address_hi = upper_32_bits(msi_page->iova); > msg->address_lo &= iova_mask(iovad); > msg->address_lo += lower_32_bits(msi_page->iova); > } else { > /* >* We're called from a void callback, so the best we can do is >* 'fail' by filling the message with obviously bogus values. >* Since we got this far due to an IOMMU being present, it's >* not like the existing address would have worked anyway... >*/ > msg->address_hi = ~0U; > msg->address_lo = ~0U; > msg->data = ~0U; > } > > Hmm? OK, I've turned map_msi_page into get_msi_page (returning a page) and just hoisted the list lookup into that, which leads to knock-on simplifications throughout and is _much_ nicer. I now can't imagine why I didn't get that far in the first place - thanks for the reality check! Robin. > > Thanks, > > tglx >
Re: [PATCH v5 19/19] iommu/dma: Add support for mapping MSIs
Hi Eric, On Fri, 26 Aug 2016 00:25:34 +0200 Auger Eric wrote: > Hi Robin, > > On 23/08/2016 21:05, Robin Murphy wrote: > > When an MSI doorbell is located downstream of an IOMMU, attaching > > devices to a DMA ops domain and switching on translation leads to a > > rude shock when their attempt to write to the physical address > > returned by the irqchip driver faults (or worse, writes into some > > already-mapped buffer) and no interrupt is forthcoming. > > > > Address this by adding a hook for relevant irqchip drivers to call > > from their compose_msi_msg() callback, to swizzle the physical > > address with an appropriatly-mapped IOVA for any device attached to > > one of our DMA ops domains. > > > > CC: Thomas Gleixner > > CC: Jason Cooper > > CC: Marc Zyngier > > CC: linux-kernel@vger.kernel.org > > Signed-off-by: Robin Murphy > > --- > > drivers/iommu/dma-iommu.c| 141 > > ++- > > drivers/irqchip/irq-gic-v2m.c| 3 + > > drivers/irqchip/irq-gic-v3-its.c | 3 + > > include/linux/dma-iommu.h| 9 +++ 4 files changed, 141 > > insertions(+), 15 deletions(-) > > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > > index 00c8a08d56e7..330cce60cad9 100644 > > --- a/drivers/iommu/dma-iommu.c > > +++ b/drivers/iommu/dma-iommu.c > > @@ -25,10 +25,29 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > > > +struct iommu_dma_msi_page { > > + struct list_headlist; > > + dma_addr_t iova; > > + u32 phys_lo; > > + u32 phys_hi; > > +}; > > + > > +struct iommu_dma_cookie { > > + struct iova_domain iovad; > > + struct list_headmsi_page_list; > > + spinlock_t msi_lock; > > +}; > > + > > +static inline struct iova_domain *cookie_iovad(struct iommu_domain > > *domain) +{ > > + return &((struct iommu_dma_cookie > > *)domain->iova_cookie)->iovad; +} > > + > > int iommu_dma_init(void) > > { > > return iova_cache_get(); > > @@ -43,15 +62,19 @@ int iommu_dma_init(void) > > */ > > int iommu_get_dma_cookie(struct iommu_domain 
*domain) > > { > > - struct iova_domain *iovad; > > + struct iommu_dma_cookie *cookie; > > > > if (domain->iova_cookie) > > return -EEXIST; > > > > - iovad = kzalloc(sizeof(*iovad), GFP_KERNEL); > > - domain->iova_cookie = iovad; > > + cookie = kzalloc(sizeof(*cookie), GFP_KERNEL); > > + if (!cookie) > > + return -ENOMEM; > > > > - return iovad ? 0 : -ENOMEM; > > + spin_lock_init(&cookie->msi_lock); > > + INIT_LIST_HEAD(&cookie->msi_page_list); > > + domain->iova_cookie = cookie; > > + return 0; > > } > > EXPORT_SYMBOL(iommu_get_dma_cookie); > > > > @@ -63,14 +86,20 @@ EXPORT_SYMBOL(iommu_get_dma_cookie); > > */ > > void iommu_put_dma_cookie(struct iommu_domain *domain) > > { > > - struct iova_domain *iovad = domain->iova_cookie; > > + struct iommu_dma_cookie *cookie = domain->iova_cookie; > > + struct iommu_dma_msi_page *msi, *tmp; > > > > - if (!iovad) > > + if (!cookie) > > return; > > > > - if (iovad->granule) > > - put_iova_domain(iovad); > > - kfree(iovad); > > + if (cookie->iovad.granule) > > + put_iova_domain(&cookie->iovad); > > + > > + list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, > > list) { > > + list_del(&msi->list); > > + kfree(msi); > > + } > > + kfree(cookie); > > domain->iova_cookie = NULL; > > } > > EXPORT_SYMBOL(iommu_put_dma_cookie); > > @@ -88,7 +117,7 @@ EXPORT_SYMBOL(iommu_put_dma_cookie); > > */ > > int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t > > base, u64 size) { > > - struct iova_domain *iovad = domain->iova_cookie; > > + struct iova_domain *iovad = cookie_iovad(domain); > > unsigned long order, base_pfn, end_pfn; > > > > if (!iovad) > > @@ -155,7 +184,7 @@ int dma_direction_to_prot(enum > > dma_data_direction dir, bool coherent) static struct iova > > *__alloc_iova(struct iommu_domain *domain, size_t size, dma_addr_t > > dma_limi
Re: [PATCH] irqchip/gic-{v2m, v3-its}: check iommu capable before doing iommu map
On 17/08/17 10:22, Marc Zyngier wrote: > On 17/08/17 10:01, Shawn Lin wrote: >> Hi Marc >> >> On 2017/8/17 16:52, Marc Zyngier wrote: >>> On 17/08/17 09:28, Shawn Lin wrote: If a PCIe RC use gic-v2m or gic-v3-its as a msi domain but doesn't have iommu support, we don't need to do iommu_dma_map_msi_msg to get mapped iommu address as all we need is the physical address. Otherwise we see the oops shown below as we can't get a iommu_group for a device without iommu capable. Unable to handle kernel NULL pointer dereference at virtual address 00d0 [00d0] user address but active_mm is swapper Internal error: Oops: 9604 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc5-next-20170815-1-g0744890-dirty #53 Hardware name: Firefly-RK3399 Board (DT) task: 80007bc7 task.stack: 80007bc6c000 PC is at iommu_get_domain_for_dev+0x38/0x50 LR is at iommu_dma_map_msi_msg+0x3c/0x1b8 pc : [] lr : [] pstate: a045 ... [] iommu_get_domain_for_dev+0x38/0x50 [] iommu_dma_map_msi_msg+0x3c/0x1b8 [] its_irq_compose_msi_msg+0x44/0x50 [] irq_chip_compose_msi_msg+0x40/0x58 [] msi_domain_activate+0x1c/0x48 [] __irq_domain_activate_irq+0x40/0x58 [] irq_domain_activate_irq+0x24/0x40 [] msi_domain_alloc_irqs+0x104/0x190 [] pci_msi_setup_msi_irqs+0x3c/0x48 [] __pci_enable_msi_range+0x21c/0x408 [] pci_alloc_irq_vectors_affinity+0x104/0x168 [] pcie_port_device_register+0x200/0x488 [] pcie_portdrv_probe+0x34/0xd0 [] local_pci_probe+0x3c/0xb8 [] pci_device_probe+0x138/0x170 [] driver_probe_device+0x21c/0x2d8 [] __device_attach_driver+0x9c/0xf8 [] bus_for_each_drv+0x5c/0x98 [] __device_attach+0xc4/0x138 [] device_attach+0x10/0x18 [] pci_bus_add_device+0x4c/0xa8 [] pci_bus_add_devices+0x44/0x90 [] rockchip_pcie_probe+0xc70/0xcc8 [] platform_drv_probe+0x58/0xc0 [] driver_probe_device+0x21c/0x2d8 [] __driver_attach+0xac/0xb0 [] bus_for_each_dev+0x64/0xa0 [] driver_attach+0x20/0x28 [] bus_add_driver+0x110/0x230 [] driver_register+0x60/0xf8 [] 
__platform_driver_register+0x40/0x48 [] rockchip_pcie_driver_init+0x18/0x20 [] do_one_initcall+0xb0/0x120 [] kernel_init_freeable+0x184/0x224 [] kernel_init+0x10/0x100 [] ret_from_fork+0x10/0x18 This patch is to fix the problem exposed by the commit mentioned below. Before this commit, iommu has a work around method to fix this but now it doesn't. So we could fix this in gic code but maybe still need a fixes tag here. Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory") Signed-off-by: Shawn Lin >>> >>> That looks pretty horrible. Please see the patch I posted in response to >>> your initial report: >>> >>> http://marc.info/?l=linux-pci&m=150295809430903&w=2 >> >> Aha, I saw it but I didn't get your point of "I'm not convinced that >> playing with the group refcount in an irqchip is the right approach..." >> >> So, I'm not sure whether you prefer your patch? > I really dislike both: > > - yours: because it reinvent the wheel (parsing DT one more time on each > MSI message creation), and is DT specific while the rest of the code is > completely firmware agnostic (what about ACPI?). > > - mine: because it relies on an intimate knowledge of the internals of > the iommu stuff, which is not what was originally intended. > > My comment was an invitation to Joerg to speak his mind. > > The original intent of iommu_dma_map_msi_msg() was that it could silently > fail without exploding in your face. We can change that, but a minimum of > coordination wouldn't hurt. > > My personal preference would be to address this in the iommu layer, > restoring a non-exploding iommu_dma_map_msi_msg(): Indeed, but that still doesn't fix other breakages that simply haven't been found yet ;) Thanks for the reports - I see exactly what I've done here. Gimme a moment to write it up convincingly... Robin. 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 44eca1e3243f..5440eae21bea 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -883,12 +883,18 @@ static struct iommu_dma_msi_page > *iommu_dma_get_msi_page(struct device *dev, > void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg) > { > struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq)); > - struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > + struct iommu_domain *domain; > + struct iommu_group *group; > struct iommu_dma_cookie *cookie; > struct iommu_dma_msi_page *msi_page; > phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo; > unsigned long flags; > > + group = iommu_group_get(dev); > + if (!group) > +
[PATCH] iommu: Avoid NULL group dereference
The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
have been a little misleading, since that check is still worthwhile even
when groups *are* universal. We have a few IOMMU-aware drivers which
only care whether their device is already attached to an existing domain
or not, for which the previous behaviour of iommu_get_domain_for_dev()
was ideal, and who now crash if their device does not have an IOMMU.

With IOMMU groups now serving as a reliable indicator of whether a
device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
mode), drivers could arguably do this:

	group = iommu_group_get(dev);
	if (group) {
		domain = iommu_get_domain_for_dev(dev);
		iommu_group_put(group);
	}

However, rather than duplicate that code across multiple callsites,
particularly when it's still only the domain they care about, let's skip
straight to the next step and factor out the check into the common place
it applies - in iommu_get_domain_for_dev() itself. Sure, it ends up
looking rather familiar, but now it's backed by the reasoning of having
a robust API able to do the expected thing for all devices regardless.

Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory")
Reported-by: Shawn Lin
Signed-off-by: Robin Murphy
---
As well as dma-iommu, there are at least the Cavium ThunderX and
Freescale DPAA2 ethernet drivers expecting this to work too.

 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index af69bf7e035a..5499a0387349 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1352,6 +1352,8 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 	struct iommu_group *group;

 	group = iommu_group_get(dev);
+	if (!group)
+		return NULL;

 	domain = group->domain;
--
2.13.4.dirty
Re: [PATCH] iommu: Avoid NULL group dereference
On 17/08/17 16:41, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 11:40:08AM +0100, Robin Murphy wrote:
>> The recently-removed FIXME in iommu_get_domain_for_dev() turns out to
>> have been a little misleading, since that check is still worthwhile even
>> when groups *are* universal. We have a few IOMMU-aware drivers which
>> only care whether their device is already attached to an existing domain
>> or not, for which the previous behaviour of iommu_get_domain_for_dev()
>> was ideal, and who now crash if their device does not have an IOMMU.
>>
>> With IOMMU groups now serving as a reliable indicator of whether a
>> device has an IOMMU or not (barring false-positives from VFIO no-IOMMU
>> mode), drivers could arguably do this:
>>
>>	group = iommu_group_get(dev);
>>	if (group) {
>>		domain = iommu_get_domain_for_dev(dev);
>>		iommu_group_put(group);
>>	}
>
> Okay, so just to check I got it right: Drivers do the above to check
> whether a device is managed by an IOMMU, and that crashes now because
> the 'group == NULL' check was removed?

Indeed - the typical context is network descriptors that don't have
space to store the CPU virtual address of the buffer, so when a packet
arrives the driver has to work backwards from the DMA address, in this
sort of pattern:

	addr = desc[idx]->addr;
	domain = iommu_get_domain_for_dev(dev);
	if (domain)
		addr = iommu_iova_to_phys(domain, addr)
	buf = phys_to_virt(addr)

(the GIC thing is similar but in reverse, with a physical address which
may or may not need replacing with an IOVA).

Unless we were to change the interface to be iommu_get_domain_for_group(),
I think it makes sense for it to remain valid to call for any device.

Robin.
Re: [PATCH] PCI/MSI: Improve MSI alias detection
On 19/06/17 22:52, Bjorn Helgaas wrote: > [+cc David, Alex] > > On Wed, May 31, 2017 at 06:52:28PM +0100, Robin Murphy wrote: >> Currently, we consider all DMA aliases when calculating MSI requester >> IDs. This turns out to be the wrong thing to do in the face of pure DMA >> quirks like those of Marvell SATA cards, where we can end up configuring >> the MSI doorbell to expect the phantom function alias, such that it then >> goes on to happily ignore actual MSI writes coming from the card on the >> real RID. > > I guess you're making the assumption that DMA might use phantom > function aliases (aliases on the same bus as the actual device), but > MSI writes will not? Indeed - that's certainly the case for the Marvell 88SE9128 that I have here (and it was a report involving a similar Marvell SATA card that first brought this to our attention). I appreciate that there's not necessarily a general right answer, but the generic MSI infrastructure is rather built around the notion of a device having a single ID, so this is really a case of picking the most likely compromise to cover the greatest number of cases. >> Improve the alias walk to only account for the topological aliases that >> matter, based on the logic from the Intel IRQ remapping code. > > Can you include a specific function reference here? It sounds like > it'd be useful to compare this with the Intel IRQ remapping code. set_msi_sid() in drivers/iommu/intel_irq_remapping.c - one small difference from VT-d is that here we don't have an equivalent of the "verify bus" option, only of considering the full ID. 
>> Signed-off-by: Robin Murphy >> --- >> drivers/pci/msi.c | 18 +- >> 1 file changed, 13 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c >> index ba44fdfda66b..7b34c434970e 100644 >> --- a/drivers/pci/msi.c >> +++ b/drivers/pci/msi.c >> @@ -1468,13 +1468,21 @@ struct irq_domain *pci_msi_create_irq_domain(struct >> fwnode_handle *fwnode, >> } >> EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain); >> >> +/* >> + * If the alias device/RID is on a different bus, it's a topological alias >> + * we should care about; otherwise, it's a DMA quirk and we don't. >> + */ >> static int get_msi_id_cb(struct pci_dev *pdev, u16 alias, void *data) >> { >> u32 *pa = data; >> +u8 bus = PCI_BUS_NUM(*pa); >> + >> +if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) >> +*pa = alias; >> >> -*pa = alias; > > Aliases can arise both because of device defects, e.g., the phantom > function quirks, and because of bridges, e.g., all the stuff in > pci_for_each_dma_alias(). > > Why is it that the users of get_msi_id_cb() can get away with > collapsing any of these bridge aliases into the last one that > pci_for_each_dma_alias() finds? I guess this is another question > about exactly why DMA and MSI are different here. As before, it's mostly just a case of trying to pick the least-worst option to fit the constraints of the generic MSI layer (assuming a bridge in the PCI->PCIe direction that is always going to introduce an alias) - based on the assumption that the VT-d code has presumably worked well enough in practice doing a similar thing - even if that doesn't quite seem to match my reading of the bridge spec (AFAICS it may be true for reads, but writes are somewhat murkier). From the DMA angle[1], we do have the ability to simply consider every alias for translation at the IOMMU, since the IOMMU layer already has provisions for devices having multiple IDs. 
Indeed in the Marvell case it is certainly necessary to consider both the real and phantom functions so that both DMA and MSIs even get through in the first place. Robin. [1]:https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg17979.html >> return 0; >> } >> + >> /** >> * pci_msi_domain_get_msi_rid - Get the MSI requester id (RID) >> * @domain: The interrupt domain >> @@ -1488,7 +1496,7 @@ static int get_msi_id_cb(struct pci_dev *pdev, u16 >> alias, void *data) >> u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev >> *pdev) >> { >> struct device_node *of_node; >> -u32 rid = 0; >> +u32 rid = PCI_DEVID(pdev->bus->number, pdev->devfn); >> >> pci_for_each_dma_alias(pdev, get_msi_id_cb, &rid); >> >> @@ -1504,14 +1512,14 @@ u32 pci_msi_domain_get_msi_rid(struct irq_domain >> *domain, struct pci_dev *pdev) >> * @pdev: The PCI device >> * >> * Use the firmwa
Re: [PATCH 1/1] iommu/arm-smmu-v3: replace writel with writel_relaxed in queue_inc_prod
On 20/06/17 12:04, Zhen Lei wrote:
> This function is protected by spinlock, and the latter will do memory
> barrier implicitly. So that we can safely use writel_relaxed. In fact, the
> dmb operation will lengthen the time protected by lock, which indirectly
> increase the locking confliction in the stress scene.

If you remove the DSB between writing the commands (to Normal memory)
and writing the pointer (to Device memory), how can you guarantee that
the complete command is visible to the SMMU and it isn't going to try
to consume stale memory contents?

The spinlock is irrelevant since it's taken *before* the command is
written.

Robin.

> Signed-off-by: Zhen Lei
> ---
>  drivers/iommu/arm-smmu-v3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 380969a..d2fbee3 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -728,7 +728,7 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>  	u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
>
>  	q->prod = Q_OVF(q, q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
> -	writel(q->prod, q->prod_reg);
> +	writel_relaxed(q->prod, q->prod_reg);
> }
>
> /*
> --
> 2.5.0
>
>
Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2
Hi Christoph, On 20/06/17 13:41, Christoph Hellwig wrote: > On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote: >> I plan to create a new dma-mapping tree to collect all this work. >> Any volunteers for co-maintainers, especially from the iommu gang? > > Ok, I've created the new tree: > >git://git.infradead.org/users/hch/dma-mapping.git for-next > > Gitweb: > > > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next > > And below is the patch to add the MAINTAINERS entry, additions welcome. I'm happy to be a reviewer, since I've been working in this area for some time, particularly with the dma-iommu code and arm64 DMA ops. Robin. > Stephen, can you add this to linux-next? > > --- > From 335979c41912e6c101a20b719862b2d837370df1 Mon Sep 17 00:00:00 2001 > From: Christoph Hellwig > Date: Tue, 20 Jun 2017 11:17:30 +0200 > Subject: MAINTAINERS: add entry for dma mapping helpers > > This code has been spread between getting in through arch trees, the iommu > tree, -mm and the drivers tree. There will be a lot of work in this area, > including consolidating various arch implementations into more common > code, so ensure we have a proper git tree that facilitates cooperation with > the architecture maintainers. 
> > Signed-off-by: Christoph Hellwig > --- > MAINTAINERS | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 09b5ab6a8a5c..56859d53a424 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -2595,6 +2595,19 @@ S: Maintained > F: net/bluetooth/ > F: include/net/bluetooth/ > > +DMA MAPPING HELPERS > +M: Christoph Hellwig > +L: linux-kernel@vger.kernel.org > +T: git git://git.infradead.org/users/hch/dma-mapping.git > +W: http://git.infradead.org/users/hch/dma-mapping.git > +S: Supported > +F: lib/dma-debug.c > +F: lib/dma-noop.c > +F: lib/dma-virt.c > +F: drivers/base/dma-mapping.c > +F: drivers/base/dma-coherent.c > +F: include/linux/dma-mapping.h > + > BONDING DRIVER > M: Jay Vosburgh > M: Veaceslav Falico >
Re: [PATCH v5 3/7] drivers: dma-coherent: Account dma_pfn_offset when used with device tree
On 20/06/17 14:42, Christoph Hellwig wrote:
> Wouldn't the smal patch below solve the same issue in a simple way?

That assumes that all devices accessing a shared pool will have the
same dma_pfn_offset as the first one, which cannot strictly be relied
upon (even if it is highly likely in practice).

Robin.

> diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
> index 640a7e63c453..e8f8447d705b 100644
> --- a/drivers/base/dma-coherent.c
> +++ b/drivers/base/dma-coherent.c
> @@ -290,9 +290,11 @@ EXPORT_SYMBOL(dma_mmap_from_coherent);
>  static int rmem_dma_device_init(struct reserved_mem *rmem, struct device *dev)
>  {
>  	struct dma_coherent_mem *mem = rmem->priv;
> +	dma_addr_t dev_addr = ((rmem->base >> PAGE_SHIFT) -
> +			dev->dma_pfn_offset) << PAGE_SHIFT;
>
>  	if (!mem &&
> -	    !dma_init_coherent_memory(rmem->base, rmem->base, rmem->size,
> +	    !dma_init_coherent_memory(rmem->base, dev_addr, rmem->size,
>  			DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE,
>  			&mem)) {
>  		pr_err("Reserved memory: failed to init DMA memory pool at %pa, size %ld MiB\n",
>
Re: [PATCH v5 4/7] drivers: dma-coherent: Introduce default DMA pool
On 20/06/17 14:49, Christoph Hellwig wrote: > On Wed, May 24, 2017 at 11:24:29AM +0100, Vladimir Murzin wrote: >> This patch introduces default coherent DMA pool similar to default CMA >> area concept. To keep other users safe code kept under CONFIG_ARM. > > I don't see a CONFIG_ARM in the code, although parts of it are added > under CONFIG_OF_RESERVED_MEM. It's in rmem_dma_setup() (line 325) currently enforcing no-map for ARM. > But overall this code look a bit odd to me. As far as I can tell > the dma-coherent.c code is for the case where we have a special > piece of coherent memory close to a device. True, but the case here is where we need a special piece of coherent memory for *all* devices, and it was more complicated *not* to reuse the existing infrastructure. This would already be achievable by specifying a separate rmem carveout per device, but the shared pool just makes life easier, and mirrors the functionality dma-contiguous already supports. > If you're allocating out of the global allocator the memory should > come from the normal dma_ops ->alloc allocator - and also take > the attrs into account (e.g. for DMA_ATTR_NON_CONSISTENT or > DMA_ATTR_NO_KERNEL_MAPPING requests you don't need coherent memory) The context here is noMMU but with caches - the problem being that the normal allocator will give back kernel memory, and there's no way to make that coherent with devices short of not enabling the caches in the first place, which is obviously undesirable. The trick is that RAM is aliased (in hardware) at two addresses, one of which makes CPU accesses non-cacheable, so by only ever accessing the RAM set aside for the coherent DMA pool using the non-cacheable alias (represented by the dma_pfn_offset) we can achieve DMA coherency. It perhaps seems a bit backwards, but we do actually end up honouring DMA_ATTR_NON_CONSISTENT to a degree in patch #5, as such requests are the only ones allowed to fall back to the normal dma_ops allocator. Robin.
Re: CONFIG_DMA_NOOP_OPS breaks ARM arch
On 16/10/17 09:12, Vladimir Murzin wrote: > + Robin and Christoph > > On 16/10/17 06:27, Marian Mihailescu wrote: >> I am using 4.14-rc4 with a patch on top that includes >> arch/arm/include/asm/dma-mapping.h in a module. >> >> I have MMU enabled, so >> select DMA_NOOP_OPS if !MMU >> does nothing for me, and I get a compile error because dma_noop_ops is >> unknown. > > Can you post an error message here, please? > >> >> Maybe I should include linux/dma-mapping.h? That is the *only* dma-mapping header you should be referring to. Including the arch/*/asm/ version directly is just broken and asking for trouble. Robin. > Where to include? In your driver or what? > >> >> Thanks for the quick reply. > > with CONFIG_MMU compiler should optimise out dma_noop_ops in: > > return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_noop_ops; > > What toolchain are you using? > > Cheers > Vladimir > >> >> >> On Mon, Oct 16, 2017 at 2:28 PM, Randy Dunlap wrote: >>> On 10/15/17 20:29, Randy Dunlap wrote: On 10/15/17 20:27, Randy Dunlap wrote: > On 10/15/17 19:27, Marian Mihailescu wrote: >> After commit 7844572c633964c864d9f32dc3f2a8ffe5d70371, dma_noop_ops >> are built only for architectures that use it. >> >> For ARM architecture, CONFIG_DMA_NOOP_OPS is not selected, and cannot >> be selected. >>> >>> What kernel version are you looking at? >>> I see that it is selected: >>> >>> --- a/arch/arm/Kconfig >>> +++ b/arch/arm/Kconfig >>> @@ -22,6 +22,7 @@ config ARM >>> select CLONE_BACKWARDS >>> select CPU_PM if (SUSPEND || CPU_IDLE) >>> select DCACHE_WORD_ACCESS if HAVE_EFFICIENT_UNALIGNED_ACCESS >>> + select DMA_NOOP_OPS if !MMU >>> select EDAC_SUPPORT >>> select EDAC_ATOMIC_SCRUB >>> select GENERIC_ALLOCATOR >>> >>> >>> That's in commit ID 1c51c429f30ea10428337f3a33c12059ba59f668 from May 24, >>> 2017. 
>>> >> However, arch/arm/include/asm/dma-mapping.h is referencing dma_noop_ops: >> >> static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type >> *bus) >> { >> return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_noop_ops; >> } >> >> I will let a maintainer suggest the best resolution for this :) >> > > add Bart and iommu mailing list. > and add Vladimir. >>> >>> >>> -- >>> ~Randy >
Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")
On 16/10/17 14:48, Mark Rutland wrote: > Hi Leo, > > On Mon, Oct 16, 2017 at 09:17:23AM +0800, Leo Yan wrote: >> On Tue, Oct 10, 2017 at 05:03:44PM +0100, Robin Murphy wrote: >>> On 10/10/17 16:45, Mark Rutland wrote: >>>> On Tue, Oct 10, 2017 at 10:27:25PM +0800, Leo Yan wrote: >>>>> I work mainline kernel on Hikey620 board, I find it's easily to >>>>> introduce the panic and report the log as below. So I bisect the kernel >>>>> and finally narrow down the commit e3067861ba66 ("arm64: add basic >>>>> VMAP_STACK support") which introduce this issue. >>>>> >>>>> I tried to remove 'select HAVE_ARCH_VMAP_STACK' from >>>>> arch/arm64/Kconfig, then I can see the panic issue will dismiss. So >>>>> could you check this and have insight for this issue? >>>> >>>> Given the stuff in the backtrace, my suspicion is something is trying to >>>> perform DMA to/from the stack, getting junk addresses form the attempted >>>> virt<->phys conversions. >>>> >>>> Could you try enabling both VMAP_STACK and CONFIG_DEBUG_VIRTUAL? >>> >>> CONFIG_DMA_API_DEBUG should scream about drivers trying to use stack >>> addresses either way, too. >> >> Thanks for suggestions, Mark & Robin. >> >> I enabled these debugging configs but cannot get clue from it; but >> occasionally found this issue is quite likely related with CA53 errata, >> especialy ERRATA_A53_855873 is the relative one. So I changed to use >> ARM-TF mainline code with ERRATA fixing, this issue can be dismissed. > > Thanks for the update. > > Just to confirm, with the updated firmware you no longer see the issue? > > I can't immediately see how that would be related. Cores up to r0p2 have the other errata to which ARM64_WORKAROUND_CLEAN_CACHE also applies anyway; r3p0+ have an ACTLR bit to do the CVAC->CIVAC upgrade in hardware, and our policy is that we expect firmware to enable such hardware workarounds where possible. I assume that's why we don't explicitly document 855873 anywhere in Linux. Robin.
Re: ARM64: Regression with commit e3067861ba66 ("arm64: add basic VMAP_STACK support")
On 16/10/17 15:26, Mark Rutland wrote: > On Mon, Oct 16, 2017 at 03:12:45PM +0100, Robin Murphy wrote: >> On 16/10/17 14:48, Mark Rutland wrote: >>> Hi Leo, >>> >>> On Mon, Oct 16, 2017 at 09:17:23AM +0800, Leo Yan wrote: >>>> On Tue, Oct 10, 2017 at 05:03:44PM +0100, Robin Murphy wrote: >>>>> On 10/10/17 16:45, Mark Rutland wrote: >>>>>> On Tue, Oct 10, 2017 at 10:27:25PM +0800, Leo Yan wrote: >>>>>>> I work mainline kernel on Hikey620 board, I find it's easily to >>>>>>> introduce the panic and report the log as below. So I bisect the kernel >>>>>>> and finally narrow down the commit e3067861ba66 ("arm64: add basic >>>>>>> VMAP_STACK support") which introduce this issue. >>>>>>> >>>>>>> I tried to remove 'select HAVE_ARCH_VMAP_STACK' from >>>>>>> arch/arm64/Kconfig, then I can see the panic issue will dismiss. So >>>>>>> could you check this and have insight for this issue? >>>>>> >>>>>> Given the stuff in the backtrace, my suspicion is something is trying to >>>>>> perform DMA to/from the stack, getting junk addresses form the attempted >>>>>> virt<->phys conversions. >>>>>> >>>>>> Could you try enabling both VMAP_STACK and CONFIG_DEBUG_VIRTUAL? >>>>> >>>>> CONFIG_DMA_API_DEBUG should scream about drivers trying to use stack >>>>> addresses either way, too. >>>> >>>> Thanks for suggestions, Mark & Robin. >>>> >>>> I enabled these debugging configs but cannot get clue from it; but >>>> occasionally found this issue is quite likely related with CA53 errata, >>>> especialy ERRATA_A53_855873 is the relative one. So I changed to use >>>> ARM-TF mainline code with ERRATA fixing, this issue can be dismissed. >>> >>> Thanks for the update. >>> >>> Just to confirm, with the updated firmware you no longer see the issue? >>> >>> I can't immediately see how that would be related. 
>> >> Cores up to r0p2 have the other errata to which >> ARM64_WORKAROUND_CLEAN_CACHE also applies anyway; r3p0+ have an ACTLR >> bit to do the CVAC->CIVAC upgrade in hardware, and our policy is that >> we expect firmware to enable such hardware workarounds where possible. I >> assume that's why we don't explicitly document 855873 anywhere in Linux. > > Sure, I also looked it up. ;) > > I meant that I couldn't immediately see why VMAP'd stacks were likely to > tickle issues with that more reliably. Ah, right - in context, "that" appeared to refer to "updated firmware", not "VMAP_STACK". Sorry. I guess the vmap addresses might tickle the "same L2 set" condition differently to when both stack and DMA buffer are linear map addresses. Robin.
Re: [PATCH V2] ARM: dts: rockchip: add sd card support for firefly reload board
On 26/05/17 08:07, Eddie Cai wrote: > firefly reload board not support sd card yet. so support it. I'm confused... According to pictures and the schematic the microSD socket and vcc_sd supply are on the baseboard, not the core module, and these nodes already exist in rk3288-firefly-reload.dts :/ Robin. > Signed-off-by: Eddie Cai > Reviewed-by: Shawn Lin > --- > arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi | 31 > +++ > 1 file changed, 31 insertions(+) > > diff --git a/arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi > b/arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi > index 413b61f..2f41209 100644 > --- a/arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi > +++ b/arch/arm/boot/dts/rk3288-firefly-reload-core.dtsi > @@ -57,6 +57,17 @@ > clock-output-names = "ext_gmac"; > }; > > + vcc_sd: sdmmc-regulator { > + compatible = "regulator-fixed"; > + gpio = <&gpio7 RK_PB3 GPIO_ACTIVE_LOW>; > + pinctrl-names = "default"; > + pinctrl-0 = <&sdmmc_pwr>; > + regulator-name = "vcc_sd"; > + regulator-min-microvolt = <330>; > + regulator-max-microvolt = <330>; > + startup-delay-us = <10>; > + vin-supply = <&vcc_io>; > + }; > > vcc_flash: flash-regulator { > compatible = "regulator-fixed"; > @@ -281,6 +292,26 @@ > rockchip,pins = <4 8 RK_FUNC_GPIO &pcfg_output_high>; > }; > }; > + > + sdmmc { > + sdmmc_pwr: sdmmc-pwr { > + rockchip,pins = <7 11 RK_FUNC_GPIO &pcfg_pull_none>; > + }; > + }; > +}; > + > +&sdmmc { > + bus-width = <4>; > + cap-mmc-highspeed; > + cap-sd-highspeed; > + card-detect-delay = <200>; > + disable-wp; > + num-slots = <1>; > + pinctrl-names = "default"; > + pinctrl-0 = <&sdmmc_clk>, <&sdmmc_cmd>, <&sdmmc_cd>, <&sdmmc_bus4>; > + vmmc-supply = <&vcc_sd>; > + vqmmc-supply = <&vccio_sd>; > + status = "okay"; > }; > > &tsadc { >
Re: Device address specific mapping of arm,mmu-500
On 30/05/17 17:49, Ray Jui wrote: > Hi Will, > > On 5/30/17 8:14 AM, Will Deacon wrote: >> On Mon, May 29, 2017 at 06:18:45PM -0700, Ray Jui wrote: >>> I'm writing to check with you to see if the latest arm-smmu.c driver in >>> v4.12-rc Linux for smmu-500 can support mapping that is only specific to >>> a particular physical address range while leaving the rest still to be >>> handled by the client device. I believe this can already be supported by >>> the device tree binding of the generic IOMMU framework; however, it is >>> not clear to me whether or not the arm-smmu.c driver can support it. >>> >>> To give you some background information: >>> >>> We have a SoC that has PCIe root complex that has a built-in logic block >>> to forward MSI writes to ARM GICv3 ITS. Unfortunately, this logic block >>> has a HW bug that causes the MSI writes not parsed properly and can >>> potentially corrupt data in the internal FIFO. A workaround is to have >>> ARM MMU-500 take care of all inbound transactions. I found that is >>> working after hooking up our PCIe root complex to MMU-500; however, even >>> with this optimized arm-smmu driver in v4.12, I'm still seeing a >>> significant Ethernet throughput drop in both the TX and RX directions. >>> The throughput drop is very significant at around 50% (but is already >>> much improved compared to other prior kernel versions at 70~90%). >> >> Did Robin's experiments help at all with this? >> >> http://www.linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/perf >> > > It looks like these are new optimizations that have not yet been merged > in v4.12? I'm going to give it a try. Actually, most of the stuff there did land in 4.12 - only the iommu/pgtable part is experimental stuff which hasn't been on the list yet (but hopefully should be soon).
>>> One alternative is to only use MMU-500 for MSI writes towards >>> GITS_TRANSLATER register in the GICv3, i.e., if I can define a specific >>> region of physical address that I want MMU-500 to act on and leave the >>> rest of inbound transactions to be handled directly by our PCIe >>> controller, it can potentially work around the HW bug we have and at the >>> same time achieve optimal throughput. >> >> I don't think you can bypass the SMMU for MSIs unless you give them their >> own StreamIDs, which is likely to break things horribly in the kernel. You >> could try to create an identity mapping, but you'll still have the >> translation overhead and you'd probably end up having to supply your own DMA >> ops to manage the address space. I'm assuming that you need to prevent the >> physical address of the ITS from being allocated as an IOVA? > > Will, is that a HW limitation that the SMMU cannot be used, only for MSI > writes, in which case, the physical address range is very specific in > our ASIC that falls in the device memory region (e.g., below 0x8000)? Yes, either translation is enabled or it isn't - we don't have GART-style apertures. To segregate by address the best you can do is set up the page tables to identity-map all of the "untranslated" address space. As Will mentioned, if MSI writes could be distinguished from DMA writes by Stream ID, rather than by address, then there would be more options, but in the PCI case at least that's not generally possible. Robin. > In fact, what I need in this case is a static mapping from IOMMU on the > physical address of the GITS_TRANSLATER of the GICv3 ITS, which is the > address that MSI writes go to. This is to bypass the MSI forwarding > logic in our PCIe controller. At the same time, I can leave the rest of > inbound transactions to be handled by our PCIe controller without going > through the MMU. > >> >>> Any feedback from you is greatly appreciated! 
>> >> Fix the hardware ;) > > Indeed that has to happen with the next revision of the ASIC. But as you > can see I'm getting quite desperate here trying to find an interim solution. > >> >> Will >> > > Thanks for the help! > > Ray >
Re: [PATCH v1 0/3] manually add Error Synchronization Barrier at exception handler entry and exit
On 01/11/17 19:14, Dongjiu Geng wrote: > Some hardware platform can support RAS Extension, but not support IESB, > such as Huawei's platform, so software need to insert Synchronization Barrier > operations at exception handler entry. > > This series patches are based on James's series patches "SError rework + > RAS&IESB for firmware first support". In Huawei's platform, we do not > support IESB, so software needs to insert that. > > > Dongjiu Geng (3): > arm64: add a macro for SError synchronization > arm64: add error synchronization barrier in kernel_entry/kernel_exit > KVM: arm64: add ESB in exception handler entry and exit. > > James Morse (18): > arm64: explicitly mask all exceptions > arm64: introduce an order for exceptions > arm64: Move the async/fiq helpers to explicitly set process context > flags > arm64: Mask all exceptions during kernel_exit > arm64: entry.S: Remove disable_dbg > arm64: entry.S: convert el1_sync > arm64: entry.S convert el0_sync > arm64: entry.S: convert elX_irq > KVM: arm/arm64: mask/unmask daif around VHE guests > arm64: kernel: Survive corrected RAS errors notified by SError > arm64: cpufeature: Enable IESB on exception entry/return for > firmware-first > arm64: kernel: Prepare for a DISR user > KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2. 
> KVM: arm64: Save/Restore guest DISR_EL1 > KVM: arm64: Save ESR_EL2 on guest SError > KVM: arm64: Handle RAS SErrors from EL1 on guest exit > KVM: arm64: Handle RAS SErrors from EL2 on guest exit > KVM: arm64: Take any host SError before entering the guest > > Xie XiuQi (2): > arm64: entry.S: move SError handling into a C function for future > expansion > arm64: cpufeature: Detect CPU RAS Extentions > > arch/arm64/Kconfig | 33 +- > arch/arm64/include/asm/assembler.h | 59 +--- > arch/arm64/include/asm/barrier.h | 1 + > arch/arm64/include/asm/cpucaps.h | 4 +- > arch/arm64/include/asm/daifflags.h | 61 + > arch/arm64/include/asm/esr.h | 17 +++ > arch/arm64/include/asm/exception.h | 14 ++ > arch/arm64/include/asm/irqflags.h| 40 ++-- > arch/arm64/include/asm/kvm_emulate.h | 10 > arch/arm64/include/asm/kvm_host.h| 16 +++ > arch/arm64/include/asm/processor.h | 2 + > arch/arm64/include/asm/sysreg.h | 6 +++ > arch/arm64/include/asm/traps.h | 36 +++ > arch/arm64/kernel/asm-offsets.c | 1 + > arch/arm64/kernel/cpufeature.c | 43 ++ > arch/arm64/kernel/debug-monitors.c | 5 +- > arch/arm64/kernel/entry.S| 88 > +--- > arch/arm64/kernel/hibernate.c| 5 +- > arch/arm64/kernel/machine_kexec.c| 4 +- > arch/arm64/kernel/process.c | 3 ++ > arch/arm64/kernel/setup.c| 8 ++-- > arch/arm64/kernel/signal.c | 8 +++- > arch/arm64/kernel/smp.c | 12 ++--- > arch/arm64/kernel/suspend.c | 7 +-- > arch/arm64/kernel/traps.c| 64 +- > arch/arm64/kvm/handle_exit.c | 19 +++- > arch/arm64/kvm/hyp-init.S| 3 ++ > arch/arm64/kvm/hyp/entry.S | 15 ++ > arch/arm64/kvm/hyp/hyp-entry.S | 1 + > arch/arm64/kvm/hyp/switch.c | 19 ++-- > arch/arm64/kvm/hyp/sysreg-sr.c | 6 +++ > arch/arm64/kvm/inject_fault.c| 13 +- > arch/arm64/kvm/sys_regs.c| 1 + > arch/arm64/mm/proc.S | 14 -- > virt/kvm/arm/arm.c | 4 ++ > 35 files changed, 527 insertions(+), 115 deletions(-) > create mode 100644 arch/arm64/include/asm/daifflags.h If you're sending a patch series, please have the cover letter describe *those patches*, rather than 
dozens of other patches that aren't part of it. This diffstat and summary is very confusing. Robin.
Re: [PATCH v1 1/3] arm64: add a macro for SError synchronization
On 01/11/17 19:14, Dongjiu Geng wrote: > ARMv8.2 adds a control bit to each SCTLR_ELx to insert implicit > Error Synchronization Barrier(IESB) operations at exception handler entry > and exit. But not all hardware platforms which support the RAS Extension > can support IESB. So for this case, software needs to manually insert > Error Synchronization Barrier(ESB) operations. > > In this macro, if the system supports the RAS Extension instead of IESB, > it will insert an ESB instruction. > > Signed-off-by: Dongjiu Geng > --- > arch/arm64/include/asm/assembler.h | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/arch/arm64/include/asm/assembler.h > b/arch/arm64/include/asm/assembler.h > index d4c0adf..e6c79c4 100644 > --- a/arch/arm64/include/asm/assembler.h > +++ b/arch/arm64/include/asm/assembler.h > @@ -517,4 +517,13 @@ > #endif > .endm > > + .macro error_synchronize > +alternative_if ARM64_HAS_IESB > + b 1f > +alternative_else_nop_endif > +alternative_if ARM64_HAS_RAS_EXTN > + esb > +alternative_else_nop_endif > +1: > + .endm Having a branch in here is pretty horrible, and furthermore using label number 1 has a pretty high chance of subtly breaking code where this macro is inserted. Can we not somehow nest or combine the alternative conditions here? Robin. > #endif /* __ASM_ASSEMBLER_H */ >
Re: [PATCH for-next 2/4] RDMA/hns: Add IOMMU enable support in hip08
On 01/11/17 07:46, Wei Hu (Xavier) wrote: > > > On 2017/10/12 20:59, Robin Murphy wrote: >> On 12/10/17 13:31, Wei Hu (Xavier) wrote: >>> >>> On 2017/10/1 0:10, Leon Romanovsky wrote: >>>> On Sat, Sep 30, 2017 at 05:28:59PM +0800, Wei Hu (Xavier) wrote: >>>>> If the IOMMU is enabled, the length of sg obtained from >>>>> __iommu_map_sg_attrs is not 4kB. When the IOVA is set with the sg >>>>> dma address, the IOVA will not be page continuous. and the VA >>>>> returned from dma_alloc_coherent is a vmalloc address. However, >>>>> the VA obtained by the page_address is a discontinuous VA. Under >>>>> these circumstances, the IOVA should be calculated based on the >>>>> sg length, and record the VA returned from dma_alloc_coherent >>>>> in the struct of hem. >>>>> >>>>> Signed-off-by: Wei Hu (Xavier) >>>>> Signed-off-by: Shaobo Xu >>>>> Signed-off-by: Lijun Ou >>>>> --- >>>> Doug, >>>> >>>> I didn't invest time in reviewing it, but having "is_vmalloc_addr" in >>>> driver code to deal with dma_alloc_coherent is most probably wrong. >>>> >>>> Thanks >>> Hi, Leon & Doug >>> We refered the function named __ttm_dma_alloc_page in the kernel >>> code as below: >>> And there are similar methods in bch_bio_map and mem_to_page >>> functions in current 4.14-rcx. >>> >>> static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) >>> { >>> struct dma_page *d_page; >>> >>> d_page = kmalloc(sizeof(struct dma_page), GFP_KERNEL); >>> if (!d_page) >>> return NULL; >>> >>> d_page->vaddr = dma_alloc_coherent(pool->dev, pool->size, >>>&d_page->dma, >>>pool->gfp_flags); >>> if (d_page->vaddr) { >>> if (is_vmalloc_addr(d_page->vaddr)) >>> d_page->p = vmalloc_to_page(d_page->vaddr); >>> else >>> d_page->p = virt_to_page(d_page->vaddr); >> There are cases on various architectures where neither of those is >> right. Whether those actually intersect with TTM or RDMA use-cases is >> another matter, of course. 
>> >> What definitely is a problem is if you ever take that page and end up >> accessing it through any virtual address other than the one explicitly >> returned by dma_alloc_coherent(). That can blow the coherency wide open >> and invite data loss, right up to killing the whole system with a >> machine check on certain architectures. >> >> Robin. > Hi, Robin > Thanks for your comment. > > We have one problem and the related code as below. > 1. call dma_alloc_coherent function several times to alloc memory. > 2. vmap the allocated memory pages. > 3. software access memory by using the return virt addr of vmap > and hardware using the dma addr of dma_alloc_coherent. The simple answer is "don't do that". Seriously. dma_alloc_coherent() gives you a CPU virtual address and a DMA address with which to access your buffer, and that is the limit of what you may infer about it. You have no guarantee that the virtual address is either in the linear map or vmalloc, and not some other special place. You have no guarantee that the underlying memory even has an associated struct page at all. > When IOMMU is disabled in ARM64 architecture, we use virt_to_page() > before vmap(), it works. And when IOMMU is enabled using > virt_to_page() will cause calltrace later, we found the return > addr of dma_alloc_coherent is vmalloc addr, so we add the > condition judgement statement as below, it works. > for (i = 0; i < buf->nbufs; ++i) > pages[i] = > is_vmalloc_addr(buf->page_list[i].buf) ? > vmalloc_to_page(buf->page_list[i].buf) : > virt_to_page(buf->page_list[i].buf); > Can you give us suggestion? better method? Oh my goodness, having now taken a closer look at this driver, I'm lost for words in disbelief. To pick just one example: u32 bits_per_long = BITS_PER_LONG; ... if (bits_per_long == 64) { /* memory mapping nonsense */ } WTF does the size of a long have to do with DMA buffer management!? Of cou
Re: [PATCH v1 1/3] arm64: add a macro for SError synchronization
On 01/11/17 12:54, gengdongjiu wrote: > Hi Robin, > > On 2017/11/1 19:24, Robin Murphy wrote: >>> + esb >>> +alternative_else_nop_endif >>> +1: >>> + .endm >> Having a branch in here is pretty horrible, and furthermore using label >> number 1 has a pretty high chance of subtly breaking code where this >> macro is inserted. >> >> Can we not somehow nest or combine the alternative conditions here? > > I found it will report error if combine the alternative conditions here. > > For example: > > + .macro error_synchronize > +alternative_if ARM64_HAS_IESB > +alternative_if ARM64_HAS_RAS_EXTN > + esb > +alternative_else_nop_endif > +alternative_else_nop_endif > + .endm > > And even using b.eq/cbz instruction in the alternative instruction in > arch/arm64/kernel/entry.S, > it will report Error. > > For example below > > alternative_if ARM64_HAS_PAN > > b.eqx > alternative_else_nop_endif > > I do not dig it deeply, do you know the reason about it or good suggestion > about that? > Thanks a lot in advance. Actually, on second look ARM64_HAS_RAS_EXTN doesn't even matter - ESB is a hint, so if the CPU doesn't have RAS it should behave as a NOP anyway. On which note, since I don't see one here - are any of those other patches defining an "esb" assembly macro similar to the inline asm case? If not then this isn't going to build with older toolchains - perhaps we should just use the raw hint syntax directly. Robin.
Re: [PATCH] arm64: write_sysreg asm illegal for aarch32
On 01/11/17 16:58, Mark Salyzyn wrote: > Cross compiling to aarch32 (for vdso32) using clang correctly > identifies that (the unused) write_sysreg inline asm directive is > illegal in that architectural context: > > arch/arm64/include/asm/arch_timer.h: error: invalid input constraint 'rZ' in > asm > write_sysreg(cntkctl, cntkctl_el1); > ^ > arch/arm64/include/asm/sysreg.h: note: expanded from macro 'write_sysreg' > : : "rZ" (__val)); > ^ > > GCC normally checks for correctness everywhere. But uniquely for > unused asm, will optimize out and suppress the error report. It sounds more like some paths are wrong in the compat vDSO build if it's pulling in this header in the first place - nothing in this file is relevant to AArch32. Robin. > Signed-off-by: Mark Salyzyn > Cc: Catalin Marinas > Cc: Will Deacon > Cc: Christoffer Dall > Cc: Mark Rutland > Cc: Marc Zyngier > Cc: Suzuki K Poulose > Cc: Stefan Traby > Cc: Dave Martin > Cc: linux-arm-ker...@lists.infradead.org > Cc: linux-kernel@vger.kernel.org > --- > arch/arm64/include/asm/sysreg.h | 4 > 1 file changed, 4 insertions(+) > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index f707fed5886f..a7b61c9327db 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -492,11 +492,15 @@ asm( > * The "Z" constraint normally means a zero immediate, but when combined with > * the "%x0" template means XZR. > */ > +#if defined(__aarch64__) > #define write_sysreg(v, r) do { \ > u64 __val = (u64)(v); \ > asm volatile("msr " __stringify(r) ", %x0" \ >: : "rZ" (__val)); \ > } while (0) > +#else > +#define write_sysreg(v, r) BUG() > +#endif > > /* > * For registers without architectural names, or simply unsupported by >
Re: [PATCH] arm64: mm: Set MAX_PHYSMEM_BITS based on ARM64_VA_BITS
On 13/11/17 10:32, Suzuki K Poulose wrote: On 12/11/17 17:55, Jerome Glisse wrote: On Fri, Nov 10, 2017 at 03:11:15PM +, Robin Murphy wrote: On 09/11/17 22:58, Krishna Reddy wrote: MAX_PHYSMEM_BITS greater than ARM64_VA_BITS is causing memory access fault, when HMM_DMIRROR test is enabled. In the failing case, ARM64_VA_BITS=39 and MAX_PHYSMEM_BITS=48. HMM_DMIRROR test selects phys memory range from end based on MAX_PHYSMEM_BITS and gets mapped into VA space linearly. As VA space is 39-bit and phys space is 48-bit, this has caused incorrect mapping and leads to memory access fault. Limiting the MAX_PHYSMEM_BITS to ARM64_VA_BITS fixes the issue and is the right thing instead of hard coding it as 48-bit always. [ 3.378655] Unable to handle kernel paging request at virtual address 3befd00 [ 3.378662] pgd = ff800a04b000 [ 3.378900] [3befd00] *pgd=81fa3003, *pud=81fa3003, *pmd=006268200711 [ 3.378933] Internal error: Oops: 9644 [#1] PREEMPT SMP [ 3.378938] Modules linked in: [ 3.378948] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.52-tegra-g91402fdc013b-dirty #51 [ 3.378950] Hardware name: quill (DT) [ 3.378954] task: ffc1ebac task.stack: ffc1eba64000 [ 3.378967] PC is at __memset+0x1ac/0x1d0 [ 3.378976] LR is at sparse_add_one_section+0xf8/0x174 [ 3.378981] pc : [] lr : [] pstate: 404000c5 [ 3.378983] sp : ffc1eba67a40 [ 3.378993] x29: ffc1eba67a40 x28: [ 3.378999] x27: 0003 x26: 0040 [ 3.379005] x25: 03ff x24: ffc1e9f6cf80 [ 3.379010] x23: ff8009ecb2d4 x22: 03befd00 [ 3.379015] x21: ffc1e9923ff0 x20: 0003 [ 3.379020] x19: ffef x18: [ 3.379025] x17: 24d7 x16: [ 3.379030] x15: ff8009cd8690 x14: ffc1e9f6c70c [ 3.379035] x13: ffc1e9f6c70b x12: 0030 [ 3.379039] x11: 0040 x10: 0101010101010101 [ 3.379044] x9 : x8 : 03befd00 [ 3.379049] x7 : x6 : 003f [ 3.379053] x5 : 0040 x4 : [ 3.379058] x3 : 0004 x2 : 00c0 [ 3.379063] x1 : x0 : 03befd00 [ 3.379064] [ 3.379069] Process swapper/0 (pid: 1, stack limit = 0xffc1eba64028) [ 3.379071] Call trace: [ 3.379079] [] 
__memset+0x1ac/0x1d0 What's the deal with this memset? AFAICS we're in __add_pages() from hmm_devmem_pages_create() calling add_pages() for private memory which it does not expect to be in the linear map anyway :/ FWIW I did keep looking, and I now see that, thanks to confusing inlining, this is probably the clearing of the vmemmap section in sparse_add_one_section(), rather than any touching of the new memory itself. The fact that the commit message doesn't even try to explain the real problem (seemingly that the index has overflowed the vmemmap area and wrapped past the top of the address space) only emphasises my concern that this is a bit of a hack, though. There appears to be a more fundamental problem being papered over here. Following some discussion with Suzuki and Catalin, there does seem to be a more general issue of the interaction between vmemmap and memory hotplug. Of course, arm64 doesn't support memory hotplug in mainline (it's something I've started looking at), nor other HMM dependencies, so there's already more going on here than meets the eye. Yes i think the dummy driver is use badly, if you want to test CDM memory with dummy driver you need to steal regular memory to act as CDM memory. You can take a look at following 2 patches: https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-cdm-next&id=fcc1e94027dbee9525f75b2a9ad88b2e6279558a https://cgit.freedesktop.org/~glisse/linux/commit/?h=hmm-cdm-next&id=84204c5be742186236b371ea2f7ad39bf1770fe6 Note that this is only if your device have its own memory that is not reported as regular ram to kernel resource and if that memory is accessible by CPU in cache coherent way. For tegra platform i don't think you have any such memory. Thus you do not need to register any memory to use HMM. But we can talk about your platform in private mail under NDA if it is not the case. Note that no matter what i still think it make sense to properly define MAX_PHYSMEM_BITS like on x86 or powerpc. 
It's a bit tricky on arm64. Essentially the VMEMMAP can cover a region of (1 << (VA_BITS - 1)), the size of the linear map. However, it does allow the physical memory to be above VA_BITS. e.g., one can boot a 39bit VA kernel on a system where the RAM is at 40bits. We adjust the "vmemmap" to make sure that it points to the first "valid" pfn (based on the start address of the memory). If we reduce the MAX_PHYSMEM_BITS to VA_BITS, that could
Re: [PATCH 3/3] arm64: Add software workaround for Falkor erratum 1041
On 03/11/17 03:27, Shanker Donthineni wrote: > The ARM architecture defines the memory locations that are permitted > to be accessed as the result of a speculative instruction fetch from > an exception level for which all stages of translation are disabled. > Specifically, the core is permitted to speculatively fetch from the > 4KB region containing the current program counter and next 4KB. > > When translation is changed from enabled to disabled for the running > exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the > Falkor core may errantly speculatively access memory locations outside > of the 4KB region permitted by the architecture. The errant memory > access may lead to one of the following unexpected behaviors. > > 1) A System Error Interrupt (SEI) being raised by the Falkor core due >to the errant memory access attempting to access a region of memory >that is protected by a slave-side memory protection unit. > 2) Unpredictable device behavior due to a speculative read from device >memory. This behavior may only occur if the instruction cache is >disabled prior to or coincident with translation being changed from >enabled to disabled. > > To avoid the errant behavior, software must execute an ISB immediately > prior to executing the MSR that will change SCTLR_ELn[M] from 1 to 0. > > Signed-off-by: Shanker Donthineni > --- > Documentation/arm64/silicon-errata.txt | 1 + > arch/arm64/Kconfig | 10 ++ > arch/arm64/include/asm/assembler.h | 17 + > arch/arm64/include/asm/cpucaps.h | 3 ++- > arch/arm64/kernel/cpu_errata.c | 16 > arch/arm64/kernel/efi-entry.S | 4 ++-- > arch/arm64/kernel/head.S | 4 ++-- > 7 files changed, 50 insertions(+), 5 deletions(-) > > diff --git a/Documentation/arm64/silicon-errata.txt > b/Documentation/arm64/silicon-errata.txt > index 66e8ce1..704770c0 100644 > --- a/Documentation/arm64/silicon-errata.txt > +++ b/Documentation/arm64/silicon-errata.txt > @@ -74,3 +74,4 @@ stable kernels. > | Qualcomm Tech. 
| Falkor v1 | E1003 | > QCOM_FALKOR_ERRATUM_1003| > | Qualcomm Tech. | Falkor v1 | E1009 | > QCOM_FALKOR_ERRATUM_1009| > | Qualcomm Tech. | QDF2400 ITS | E0065 | > QCOM_QDF2400_ERRATUM_0065 | > +| Qualcomm Tech. | Falkor v{1,2} | E1041 | > QCOM_FALKOR_ERRATUM_1041| > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 0df64a6..7e933fb 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -539,6 +539,16 @@ config QCOM_QDF2400_ERRATUM_0065 > > If unsure, say Y. > > +config QCOM_FALKOR_ERRATUM_1041 > + bool "Falkor E1041: Speculative instruction fetches might cause errant > memory access" > + default y > + help > + Falkor CPU may speculatively fetch instructions from an improper > + memory location when MMU translation is changed from SCTLR_ELn[M]=1 > + to SCTLR_ELn[M]=0. Prefix an ISB instruction to fix the problem. > + > + If unsure, say Y. > + > endmenu > > > diff --git a/arch/arm64/include/asm/assembler.h > b/arch/arm64/include/asm/assembler.h > index b6dfb4f..4c91efb 100644 > --- a/arch/arm64/include/asm/assembler.h > +++ b/arch/arm64/include/asm/assembler.h > @@ -30,6 +30,7 @@ > #include > #include > #include > +#include > > /* > * Enable and disable interrupts. > @@ -514,6 +515,22 @@ > * reg: the value to be written. > */ > .macro write_sctlr, eln, reg > +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041 > +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1041 > + tbnz\reg, #0, 8000f // enable MMU? Do we really need the branch here? It's not like enabling the MMU is something we do on the syscall fastpath, and I can't imagine an extra ISB hurts much (and is probably comparable to a mispredicted branch anyway). In fact, is there any noticeable hit on other microarchitectures if we save the alternative bother and just do it unconditionally always? Robin. 
> + isb > +8000: > +alternative_else_nop_endif > +#endif > + msr sctlr_\eln, \reg > + .endm > + > + .macro early_write_sctlr, eln, reg > +#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1041 > + tbnz\reg, #0, 8000f // enable MMU? > + isb > +8000: > +#endif > msr sctlr_\eln, \reg > .endm > > diff --git a/arch/arm64/include/asm/cpucaps.h > b/arch/arm64/include/asm/cpucaps.h > index 8da6216..7f7a59d 100644 > --- a/arch/arm64/include/asm/cpucaps.h > +++ b/arch/arm64/include/asm/cpucaps.h > @@ -40,7 +40,8 @@ > #define ARM64_WORKAROUND_858921 19 > #define ARM64_WORKAROUND_CAVIUM_3011520 > #define ARM64_HAS_DCPOP 21 > +#define ARM64_WORKAROUND_QCOM_FALKOR_E1041 22 > > -#define ARM64_NCAPS 22 > +#define ARM64_NCAPS 23 > > #endif /* __A
Re: [PATCH v2] iommu/iova: Use raw_cpu_ptr() instead of get_cpu_ptr() for ->fq
On 06/11/17 15:43, Alex Williamson wrote: > [cc +robin] > > On Thu, 2 Nov 2017 18:33:50 +0100 > Sebastian Andrzej Siewior wrote: > >> On 2017-09-21 17:21:40 [+0200], Sebastian Andrzej Siewior wrote: >>> get_cpu_ptr() disables preemption and returns the ->fq object of the >>> current CPU. raw_cpu_ptr() does the same except that it does not disable >>> preemption, which means the scheduler can move it to another CPU after it >>> obtained the per-CPU object. >>> In this case this is not bad because the data structure itself is >>> protected with a spin_lock. This change shouldn't matter; however, on RT >>> it does because the sleeping lock can't be accessed with disabled >>> preemption. >> >> Did this make it to your tree Jörg? > > Hi Sebastian, > > Joerg is out on paternity leave through the end of the year, I'm > filling in in the interim. I hadn't looked for patches this far back, > so thanks for pointing it out. Robin, any comments? Thanks, The reasoning seems sound - assuming we can't get preempted once the lock is actually taken, the worst side-effect of getting moved to another CPU in that slim window looks to be a little bit of lock contention. Operating on "someone else's" queue should have no correctness impact, and the cmpxchg after the lock is released isn't even working on percpu data so doesn't really care anyway. AFAICS it's more or less the direct equivalent of aaffaa8a3b59 ("iommu/iova: Don't disable preempt around this_cpu_ptr()"), which seems to have been working out OK. Robin.
>>> >>> v1…v2: move the change from amd_iommu.c to iova.c >>> >>> drivers/iommu/iova.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c >>> index 33edfa794ae9..b30900025c62 100644 >>> --- a/drivers/iommu/iova.c >>> +++ b/drivers/iommu/iova.c >>> @@ -570,7 +570,7 @@ void queue_iova(struct iova_domain *iovad, >>> unsigned long pfn, unsigned long pages, >>> unsigned long data) >>> { >>> - struct iova_fq *fq = get_cpu_ptr(iovad->fq); >>> + struct iova_fq *fq = raw_cpu_ptr(iovad->fq); >>> unsigned long flags; >>> unsigned idx; >>> >>> @@ -600,8 +600,6 @@ void queue_iova(struct iova_domain *iovad, >>> if (atomic_cmpxchg(&iovad->fq_timer_on, 0, 1) == 0) >>> mod_timer(&iovad->fq_timer, >>> jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT)); >>> - >>> - put_cpu_ptr(iovad->fq); >>> } >>> EXPORT_SYMBOL_GPL(queue_iova); >>> >>> -- >>> 2.14.1 >>> >> >> Sebastian >> ___ >> iommu mailing list >> io...@lists.linux-foundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/3] dmaengine: qcom_hidma: add support for the new revision
On 06/11/17 17:26, Sinan Kaya wrote: > Add support for probing the newer HW. > > Signed-off-by: Sinan Kaya > --- > drivers/dma/qcom/hidma.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c > index e366985..29d6aaa 100644 > --- a/drivers/dma/qcom/hidma.c > +++ b/drivers/dma/qcom/hidma.c > @@ -749,9 +749,13 @@ static bool hidma_msi_capable(struct device *dev) > return false; > > ret = strcmp(of_compat, "qcom,hidma-1.1"); > + if (ret) > + ret = strcmp(of_compat, "qcom,hidma-1.2"); > } else { > #ifdef CONFIG_ACPI > ret = strcmp(acpi_device_hid(adev), "QCOM8062"); > + if (ret) > + ret = strcmp(acpi_device_hid(adev), "QCOM8063"); This string-juggling looks to have already hit the point at which it doesn't scale well - it would be a lot nicer to make use of of_device_get_match_data() and the ACPI equivalent to abstract the version-specific data appropriately. Robin. > #endif > } > return ret == 0; > @@ -954,6 +958,7 @@ static int hidma_remove(struct platform_device *pdev) > static const struct acpi_device_id hidma_acpi_ids[] = { > {"QCOM8061"}, > {"QCOM8062"}, > + {"QCOM8063"}, > {}, > }; > MODULE_DEVICE_TABLE(acpi, hidma_acpi_ids); > @@ -962,6 +967,7 @@ static int hidma_remove(struct platform_device *pdev) > static const struct of_device_id hidma_match[] = { > {.compatible = "qcom,hidma-1.0",}, > {.compatible = "qcom,hidma-1.1",}, > + {.compatible = "qcom,hidma-1.2",}, > {}, > }; > MODULE_DEVICE_TABLE(of, hidma_match); >
Re: [PATCH 3/4] iommu/arm-smmu-v3: Use NUMA memory allocations for stream tables and command queues
On 04/10/17 14:53, Ganapatrao Kulkarni wrote: > Hi Robin, > > > On Thu, Sep 21, 2017 at 5:28 PM, Robin Murphy wrote: >> [+Christoph and Marek] >> >> On 21/09/17 09:59, Ganapatrao Kulkarni wrote: >>> Introduce smmu_alloc_coherent and smmu_free_coherent functions to >>> allocate/free dma coherent memory from the NUMA node associated with the SMMU. >>> Replace all calls of dmam_alloc_coherent with smmu_alloc_coherent >>> for SMMU stream tables and command queues. >> >> This doesn't work - not only do you lose the 'managed' aspect and risk >> leaking various tables on probe failure or device removal, but more >> importantly, unless you add DMA syncs around all the CPU accesses to the >> tables, you lose the critical 'coherent' aspect, and that's a horribly >> invasive change that I really don't want to make. > > this implementation is similar to the function used to allocate memory for > translation tables. The concept is similar, yes, and would work if implemented *correctly* with the aforementioned comprehensive and hugely invasive changes. The implementation as presented in this patch, however, is incomplete and badly broken. By way of comparison, the io-pgtable implementations contain all the necessary dma_sync_* calls, never relied on devres, and only have one DMA direction to worry about (hint: the queues don't all work identically). There are also a couple of practical reasons for using streaming mappings with the DMA == phys restriction there - tracking both the CPU and DMA addresses for each table would significantly increase the memory overhead, and using the cacheable linear map address in all cases sidesteps any potential problems with the atomic PTE updates. Neither of those concerns applies to the SMMUv3 data structures, which are textbook coherent DMA allocations (being tied to the lifetime of the device, rather than transient). > why do you see it affecting stream tables and not page tables? > at runtime, both tables are accessed by the SMMU only. 
> > As said in the cover letter, having the stream table from the respective NUMA node > is yielding > around a 30% performance improvement! > please suggest, if there is any better way to address this issue? I fully agree that NUMA-aware allocations are a worthwhile thing that we want. I just don't like the idea of going around individual drivers replacing coherent API usage with bodged-up streaming mappings - I really think it's worth making the effort to tackle it once, in the proper place, in a way that benefits all users together. Robin. >> >> Christoph, Marek; how reasonable do you think it is to expect >> dma_alloc_coherent() to be inherently NUMA-aware on NUMA-capable >> systems? SWIOTLB looks fairly straightforward to fix up (for the simple >> allocation case; I'm not sure it's even worth it for bounce-buffering), >> but the likes of CMA might be a little trickier... >> >> Robin. >> >>> Signed-off-by: Ganapatrao Kulkarni >>> --- >>> drivers/iommu/arm-smmu-v3.c | 57 >>> - >>> 1 file changed, 51 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >>> index e67ba6c..bc4ba1f 100644 >>> --- a/drivers/iommu/arm-smmu-v3.c >>> +++ b/drivers/iommu/arm-smmu-v3.c >>> @@ -1158,6 +1158,50 @@ static void arm_smmu_init_bypass_stes(u64 *strtab, >>> unsigned int nent) >>> } >>> } >>> >>> +static void *smmu_alloc_coherent(struct arm_smmu_device *smmu, size_t size, >>> + dma_addr_t *dma_handle, gfp_t gfp) >>> +{ >>> + struct device *dev = smmu->dev; >>> + void *pages; >>> + dma_addr_t dma; >>> + int numa_node = dev_to_node(dev); >>> + >>> + pages = alloc_pages_exact_nid(numa_node, size, gfp | __GFP_ZERO); >>> + if (!pages) >>> + return NULL; >>> + >>> + if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY)) { >>> + dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE); >>> + if (dma_mapping_error(dev, dma)) >>> + goto out_free; >>> + /* >>> + * We depend on the SMMU being able to work with any physical >>> + * address directly, so if the DMA layer 
suggests otherwise by >>> + * translating or truncating them, that bodes very badly... >>> + */ >>> + if (dma != virt_to_phys(pages)) >>> + goto out_unmap; >
Re: [PATCH 4/4] dma-debug: Allow poisoning nonzero allocations
On 29/09/15 22:27, Andrew Morton wrote: [...] If I'm understanding things correctly, some allocators zero the memory by default and others do not. And we have an unknown number of drivers which are assuming that the memory is zeroed. Correct? That's precisely the motivation here, yes. If so, our options are a) audit all callers, find the ones which expect zeroed memory but aren't passing __GFP_ZERO and fix them. b) convert all allocators to zero the memory by default. Obviously, a) is better. How big a job is it? This I'm not so sure of, hence the very tentative first step. For a very crude guess at an upper bound: $ git grep -E '(dma|pci)_alloc_co(her|nsist)ent' drivers/ | wc -l 1148 vs. $ git grep -E '(dma|pci)_zalloc_co(her|nsist)ent' drivers/ | wc -l 234 noting that the vast majority of the former are still probably benign, but picking out those which aren't from the code alone without knowledge of and/or access to the hardware might be non-trivial. This patch will help the process, if people use it. + if (IS_ENABLED(CONFIG_DMA_API_DEBUG_POISON) && !(flags & __GFP_ZERO)) + memset(virt, DMA_ALLOC_POISON, size); + This is likely to be slow in the case of non-cached memory and large allocations. The config option should come with a warning. It depends on DMA_API_DEBUG, which already has a stern performance warning, is additionally hidden behind EXPERT, and carries a slightly flippant yet largely truthful warning that actually using it could break pretty much every driver in your system; is that not enough? It might be helpful to provide a runtime knob as well - having to rebuild & reinstall just to enable/disable this feature is a bit painful. Good point - there's always the global DMA debug disable knob, but this particular feature probably does warrant finer-grained control to be really practical. 
Having thought about it some more, it's also probably wrong that this doesn't respect the dma_debug_driver filter, given that it is actually invasive; in fixing that, how about if it also *only* applied when a specific driver is filtered? Then there would be no problematic "break anything and everything" mode, and the existing debugfs controls should suffice. Robin.
Re: [PATCH V2 2/3] Add iommu driver for hi6220 SoC platform
On 20/10/15 09:45, Chen Feng wrote: iommu/hisilicon: Add hi6220-SoC smmu driver A brief description of the smmu and what capabilities it provides wouldn't go amiss here. Signed-off-by: Chen Feng Signed-off-by: Yu Dongbin --- drivers/iommu/Kconfig| 8 + drivers/iommu/Makefile | 1 + drivers/iommu/hi6220_iommu.c | 482 +++ 3 files changed, 491 insertions(+) create mode 100644 drivers/iommu/hi6220_iommu.c diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 4664c2a..9b41708 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -343,6 +343,14 @@ config SPAPR_TCE_IOMMU Enables bits of IOMMU API required by VFIO. The iommu_ops is not implemented as it is not necessary for VFIO. +config HI6220_IOMMU + bool "Hi6220 IOMMU Support" + depends on ARM64 + select IOMMU_API + select IOMMU_IOVA + help + Enable IOMMU Driver for hi6220 SoC. + # ARM IOMMU support config ARM_SMMU bool "ARM Ltd. System MMU (SMMU) Support" diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index c6dcc51..ee4dfef 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -23,3 +23,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o obj-$(CONFIG_SHMOBILE_IOMMU) += shmobile-iommu.o obj-$(CONFIG_SHMOBILE_IPMMU) += shmobile-ipmmu.o obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o +obj-$(CONFIG_HI6220_IOMMU) += hi6220_iommu.o It would be nice to insert this before CONFIG_INTEL_IOMMU to try to maintain some semblance of order. diff --git a/drivers/iommu/hi6220_iommu.c b/drivers/iommu/hi6220_iommu.c new file mode 100644 index 000..6c5198d --- /dev/null +++ b/drivers/iommu/hi6220_iommu.c @@ -0,0 +1,482 @@ +/* + * Hisilicon Hi6220 IOMMU driver + * + * Copyright (c) 2015 Hisilicon Limited. + * + * Author: Chen Feng + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#define pr_fmt(fmt) "IOMMU: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include Those last 3 spoil the ordering here. Plus, do you really need linux/interrupt.h twice? +#define SMMU_CTRL_OFFSET (0x) +#define SMMU_ENABLE_OFFSET (0x0004) +#define SMMU_PTBR_OFFSET (0x0008) +#define SMMU_START_OFFSET(0x000C) +#define SMMU_END_OFFSET (0x0010) +#define SMMU_INTMASK_OFFSET (0x0014) +#define SMMU_RINTSTS_OFFSET (0x0018) +#define SMMU_MINTSTS_OFFSET (0x001C) +#define SMMU_INTCLR_OFFSET (0x0020) +#define SMMU_STATUS_OFFSET (0x0024) +#define SMMU_AXIID_OFFSET(0x0028) +#define SMMU_CNTCTRL_OFFSET (0x002C) +#define SMMU_TRANSCNT_OFFSET (0x0030) +#define SMMU_L0TLBHITCNT_OFFSET (0x0034) +#define SMMU_L1TLBHITCNT_OFFSET (0x0038) +#define SMMU_WRAPCNT_OFFSET (0x003C) +#define SMMU_SEC_START_OFFSET(0x0040) +#define SMMU_SEC_END_OFFSET (0x0044) +#define SMMU_VERSION_OFFSET (0x0048) +#define SMMU_IPTSRC_OFFSET (0x004C) +#define SMMU_IPTPA_OFFSET(0x0050) +#define SMMU_TRBA_OFFSET (0x0054) +#define SMMU_BYS_START_OFFSET(0x0058) +#define SMMU_BYS_END_OFFSET (0x005C) +#define SMMU_RAM_OFFSET (0x1000) +#define SMMU_REGS_MAX(15) This is confusing; I count 24 registers listed above, what is 15 the maximum of? 
+#define SMMU_REGS_SGMT_END (0x60) +#define SMMU_CHIP_ID_V100(1) +#define SMMU_CHIP_ID_V200(2) + +#define SMMU_REGS_OPS_SEGMT_START(0xf00) +#define SMMU_REGS_OPS_SEGMT_NUMB (8) +#define SMMU_REGS_AXI_SEGMT_START(0xf80) +#define SMMU_REGS_AXI_SEGMT_NUMB (8) + +#define SMMU_INIT(0x1) +#define SMMU_RUNNING (0x2) +#define SMMU_SUSPEND (0x3) +#define SMMU_STOP(0x4) +#define SMMU_CTRL_INVALID(BIT(10)) +#define PAGE_ENTRY_VALID (0x1) + +#define IOVA_START_PFN (1) +#define IOPAGE_SHIFT (12) +#define IOVA_PFN(addr) ((addr) >> IOPAGE_SHIFT) +#define IOVA_PAGE_SZ (SZ_4K) +#define IOVA_START (0x2000) This doesn't match up - surely either IOVA_START_PFN should be 2, or IOVA_START should be 0x1000. +#define IOVA_END (0x8000) + +struct hi6220_smmu { + unsigned int irq; + irq_handler_t smmu_isr; + void __iomem *reg_base; + struct clk *smmu_peri_clk; + struct clk *smmu_clk; + struct clk *media_sc_clk; + size_t page_size; + spinlock_t spinlock; /*spinlock for tlb inv
Re: framebuffer corruption due to overlapping stp instructions on arm64
On 06/08/18 11:25, Mikulas Patocka wrote: [...] None of this explains why some transactions fail to make it across entirely. The overlapping writes in question write the same data to the memory locations that are covered by both, and so the ordering in which the transactions are received should not affect the outcome. You're right that the corruption couldn't be explained just by reordering writes. My hypothesis is that the PCIe controller tries to disambiguate the overlapping writes, but the disambiguation logic was not tested and it is buggy. If there's a barrier between the overlapping writes, the PCIe controller won't see any overlapping writes, so it won't trigger the faulty disambiguation logic and it works. Could the ARM engineers look if there's some chicken bit in Cortex-A72 that could insert barriers between non-cached writes automatically? I don't think there is, and even if there was I imagine it would have a pretty hideous effect on non-coherent DMA buffers and the various other places in which we have Normal-NC mappings of actual system RAM. I observe these kinds of corruptions: - failing to write a few bytes That could potentially be explained by the reordering/atomicity issues Matt mentioned, i.e. the load is observing part of the store, before the store has fully completed. - writing a few bytes that were written 16 bytes before - writing a few bytes that were written 16 bytes after Those sound more like the interconnect or root complex ignoring the byte strobes on an unaligned burst, of which I think the simplistic view would be "it's broken". FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and it's still happily flickering pixels in the corner of the console after nearly an hour (in parallel with some iperf3 just to ensure plenty of PCIe traffic). 
I would strongly suspect this issue is particular to Armada 8k, so it's probably one for the Marvell folks to take a closer look at - I believe some previous interconnect issues on those SoCs were actually fixable in firmware. Robin.
Re: [PATCH] arm64: Trap WFI executed in userspace
On 07/08/18 11:30, Dave Martin wrote: On Tue, Aug 07, 2018 at 11:24:34AM +0100, Marc Zyngier wrote: On 07/08/18 11:05, Dave Martin wrote: On Tue, Aug 07, 2018 at 10:33:26AM +0100, Marc Zyngier wrote: It recently came to light that userspace can execute WFI, and that the arm64 kernel doesn't trap this event. This sounds rather benign, but the kernel should decide when it wants to wait for an interrupt, and not userspace. Let's trap WFI and treat it as a way to yield the CPU to another process. This doesn't amount to a justification. If the power controller is unexpectedly left in a bad state so that WFI will do something nasty to a cpu that may enter userspace, then we probably have bigger problems. So, maybe it really is pretty harmless to let userspace execute this. Or not. It is also a very good way for userspace to find out when an interrupt gets delivered and start doing all kinds of probing on the kernel. The less userspace knows about that, the better I feel. Possibly. I suspect there are other ways to guess pretty accurately when an interrupt occurs, but WFI allows greater precision. ...unless you're running in a VM and it traps to KVM anyway ;) I can't think of a legitimate reason for userspace to execute WFI however. Userspace doesn't have interrupts under Linux, so it makes no sense to wait for one. Have we seen anybody using WFI in userspace? It may be cleaner to map this to SIGILL rather than be permissive and regret it later. I couldn't find any user, and I'm happy to just send userspace to hell in that case. But it could also be said that since it was never prevented, it is a de-facto ABI. Agreed. I wonder whether it's sufficient to have this mapping to SIGILL in -next for a while and see whether anybody complains. I think we'd have to avoid that for compat, though, since v7 code would have the expectation that WFI can't be trapped by the kernel at all. 
Personally I'm in favour of this patch as-is, since the architectural intent of the instruction is essentially "I've got nothing better to do right now than wait for something to happen", so treating it as a poor man's sched_yield() stays close to that while mitigating the potentially nefarious and/or minor DoS implications of letting it architecturally execute. Robin.
Re: [PATCH v14 4/4] iommu/arm-smmu: Add support for qcom,smmu-v2 variant
On 27/07/18 08:02, Vivek Gautam wrote: qcom,smmu-v2 is an arm,smmu-v2 implementation with specific clock and power requirements. This smmu core is used with multiple masters on msm8996, viz. mdss, video, etc. Add bindings for the same. Signed-off-by: Vivek Gautam Reviewed-by: Rob Herring Reviewed-by: Tomasz Figa --- Change since v13: - No change. .../devicetree/bindings/iommu/arm,smmu.txt | 42 ++ drivers/iommu/arm-smmu.c | 13 +++ 2 files changed, 55 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 8a6ffce12af5..7c71a6ed465a 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -17,10 +17,19 @@ conditions. "arm,mmu-401" "arm,mmu-500" "cavium,smmu-v2" +"qcom,-smmu-v2", "qcom,smmu-v2" depending on the particular implementation and/or the version of the architecture implemented. + A number of Qcom SoCs use qcom,smmu-v2 version of the IP. + "qcom,-smmu-v2" represents a soc specific compatible + string that should be present along with the "qcom,smmu-v2" + to facilitate SoC specific clocks/power connections and to + address specific bug fixes. As demonstrated in the GPU thread, this proves a bit too vague for a useful binding. Provided Qcom folks can reach a consensus on what a given SoC is actually called, I'd rather just unambiguously list whatever sets of fully-defined strings we need. Robin. + An example string would be - + "qcom,msm8996-smmu-v2", "qcom,smmu-v2". + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the @@ -71,6 +80,22 @@ conditions. or using stream matching with #iommu-cells = <2>, and may be ignored if present in such cases. +- clock-names:List of the names of clocks input to the device. 
The + required list depends on particular implementation and + is as follows: + - for "qcom,smmu-v2": +- "bus": clock required for downstream bus access and + for the smmu ptw, +- "iface": clock required to access smmu's registers + through the TCU's programming interface. + - unspecified for other implementations. + +- clocks: Specifiers for all clocks listed in the clock-names property, + as per generic clock bindings. + +- power-domains: Specifiers for power domains required to be powered on for + the SMMU to operate, as per generic power domain bindings. + ** Deprecated properties: - mmu-masters (deprecated in favour of the generic "iommus" binding) : @@ -137,3 +162,20 @@ conditions. iommu-map = <0 &smmu3 0 0x400>; ... }; + + /* Qcom's arm,smmu-v2 implementation */ + smmu4: iommu { + compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2"; + reg = <0xd0 0x1>; + + #global-interrupts = <1>; + interrupts = , +, +; + #iommu-cells = <1>; + power-domains = <&mmcc MDSS_GDSC>; + + clocks = <&mmcc SMMU_MDP_AXI_CLK>, +<&mmcc SMMU_MDP_AHB_CLK>; + clock-names = "bus", "iface"; + }; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e558abf1ecfc..2b4edba188a5 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -119,6 +119,7 @@ enum arm_smmu_implementation { GENERIC_SMMU, ARM_MMU500, CAVIUM_SMMUV2, + QCOM_SMMUV2, }; struct arm_smmu_s2cr { @@ -1971,6 +1972,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); +static const char * const qcom_smmuv2_clks[] = { + "bus", "iface", +}; + +static const struct arm_smmu_match_data qcom_smmuv2 = { + .version = ARM_SMMU_V2, + .model = QCOM_SMMUV2, + .clks = qcom_smmuv2_clks, + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), +}; + static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 }, { .compatible = 
"arm,smmu-v2", .data = &smmu_generic_v2 }, @@ -1978,6 +1990,7 @@ static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,mmu-401",
Re: [PATCH mmc-next v2 3/3] mmc: sdhci-of-dwcmshc: solve 128MB DMA boundary limitation
Hi Jisheng, On 26/07/18 08:14, Jisheng Zhang wrote: When using DMA, if the DMA addr spans 128MB boundary, we have to split the DMA transfer into two so that each one doesn't exceed the boundary. Out of interest, is the driver already setting its segment boundary mask appropriately? This sounds like the exact kind of hardware restriction that dma_parms is intended to describe, which scatterlist-generating code is *supposed* to already respect. Robin. Signed-off-by: Jisheng Zhang --- drivers/mmc/host/sdhci-of-dwcmshc.c | 42 + 1 file changed, 42 insertions(+) diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c index 1b7cd144fb01..7e189514bc83 100644 --- a/drivers/mmc/host/sdhci-of-dwcmshc.c +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c @@ -8,21 +8,51 @@ */ #include +#include #include #include #include "sdhci-pltfm.h" +#define BOUNDARY_OK(addr, len) \ + ((addr | (SZ_128M - 1)) == ((addr + len - 1) | (SZ_128M - 1))) + struct dwcmshc_priv { struct clk *bus_clk; }; +/* + * if DMA addr spans 128MB boundary, we split the DMA transfer into two + * so that the DMA transfer doesn't exceed the boundary. 
+ */ +static unsigned int dwcmshc_adma_write_desc(struct sdhci_host *host, + void *desc, dma_addr_t addr, + int len, unsigned int cmd) +{ + int tmplen, offset; + + if (BOUNDARY_OK(addr, len) || !len) + return _sdhci_adma_write_desc(host, desc, addr, len, cmd); + + offset = addr & (SZ_128M - 1); + tmplen = SZ_128M - offset; + _sdhci_adma_write_desc(host, desc, addr, tmplen, cmd); + + addr += tmplen; + len -= tmplen; + desc += host->desc_sz; + _sdhci_adma_write_desc(host, desc, addr, len, cmd); + + return host->desc_sz * 2; +} + static const struct sdhci_ops sdhci_dwcmshc_ops = { .set_clock = sdhci_set_clock, .set_bus_width = sdhci_set_bus_width, .set_uhs_signaling = sdhci_set_uhs_signaling, .get_max_clock = sdhci_pltfm_clk_get_max_clock, .reset = sdhci_reset, + .adma_write_desc= dwcmshc_adma_write_desc, }; static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = { @@ -36,12 +66,24 @@ static int dwcmshc_probe(struct platform_device *pdev) struct sdhci_host *host; struct dwcmshc_priv *priv; int err; + u32 extra; host = sdhci_pltfm_init(pdev, &sdhci_dwcmshc_pdata, sizeof(struct dwcmshc_priv)); if (IS_ERR(host)) return PTR_ERR(host); + /* +* The DMA descriptor table number is calculated as the maximum +* number of segments times 2, to allow for an alignment +* descriptor for each segment, plus 1 for a nop end descriptor, +* plus extra number for cross 128M boundary handling. +*/ + extra = DIV_ROUND_UP(totalram_pages, SZ_128M / PAGE_SIZE); + if (extra > SDHCI_MAX_SEGS) + extra = SDHCI_MAX_SEGS; + host->adma_table_num = SDHCI_MAX_SEGS * 2 + 1 + extra; + pltfm_host = sdhci_priv(host); priv = sdhci_pltfm_priv(pltfm_host);
Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)
On 28/07/18 17:58, Guenter Roeck wrote: On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote: On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote: On 27 July 2018 at 15:11, Krzysztof Kozlowski wrote: Hi, On today's next, the bisect pointed commit ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures with NFSv4 root on Toradex Colibri VF50 (Iris carrier board). Author: Robin Murphy Date: Mon Jul 23 23:16:12 2018 +0100 OF: Don't set default coherent DMA mask Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured with DMA) on Iris Carrier. It looks like problem with Freescale Ethernet driver: [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA [ 15.472086] Root-NFS: no NFS server address [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy. [ 15.484228] VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6 [ 15.491664] Please append a correct "root=" boot option; here are the available partitions: [ 15.500188] 0100 16384 ram0 [ 15.500200] (driver?) [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0) [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0) ]--- Attached - defconfig and full boot log. Any hints? Let me know if you need any more information. My Exynos boards also fail to boot on missing network: https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0 As expected there are plenty of "DMA mask not set" warnings... and later dwc3 driver fails with: dwc3: probe of 1240.dwc3 failed with error -12 which is probably the answer why LAN attached to USB is not present. Looks like all the drivers failed to set a dma mask and were lucky. I would call it a serious regression. 
Also, no longer setting a default coherent DMA mask is a quite substantial behavioral change, especially if and since the code worked just fine up to now. To reiterate, that particular side-effect was an unintentional oversight, and I was simply (un)lucky enough that none of the drivers I did test depended on that default mask. Sorry for the blip; please check whether it's now fixed in next-20180730 as it should be. Crash when booting sam460ex attached below, as is a bisect log. Nevertheless, like most of the others that came out of the woodwork, that appears to be a crash due to a broken cleanup path down the line from dma_alloc_coherent() returning NULL - that warrants fixing (or just removing) in its own right, because cleanup code which has never been tested and doesn't actually work is little more than a pointless waste of space. Robin. Guenter --- irq: type mismatch, failed to map hwirq-0 for interrupt-controller3! WARNING: CPU: 0 PID: 1 at ppc4xx_msi_probe+0x2dc/0x3b8 Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.18.0-rc6-00010-gff33d1030a6c #1 NIP: c001c460 LR: c001c29c CTR: REGS: cf82db60 TRAP: 0700 Not tainted (4.18.0-rc6-00010-gff33d1030a6c) MSR: 00029000 CR: 24002028 XER: GPR00: c001c29c cf82dc10 cf828000 d1021000 d1021000 cf882108 cf82db78 GPR08: c0377ae4 151b 24002028 c00025e8 GPR16: c0492380 004a GPR24: 00029000 000c 1000 cf8de410 c0494d60 00029000 cf8bebc0 cf8de400 NIP [c001c460] ppc4xx_msi_probe+0x2dc/0x3b8 LR [c001c29c] ppc4xx_msi_probe+0x118/0x3b8 Call Trace: [cf82dc10] [c001c29c] ppc4xx_msi_probe+0x118/0x3b8 (unreliable) [cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c [cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350 [cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac [cf82dcf0] [c0207e88] __device_attach+0xe8/0x160 [cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc [cf82dd40] [c02050c8] device_add+0x404/0x5c4 [cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8 [cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220 
[cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220 [cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec [cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54 [cf82dea0] [c0002404] do_one_initcall+0x40/0x188 [cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0 [cf82df30] [c0002600] kernel_init+0x18/0x104 [cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c Instruction dump: 386e 4bffa2a5 386f 7f44d378 4bffa299 4bfffe30 386e 4bffa28d 386f 7f24cb78 4bffa281 4bfffde4 <0fe0> 8129 2f89 409efe6c ---[ end trace 8cf551077ecfc429 ]--- ppc4xx-msi c1000.ppc4xx-msi: coherent DMA mask is unset Unable to handle kernel paging request for data at address 0x Faulting instruction address: 0xc001bff0 Oops: Kernel access of bad area, sig: 11 [#1] BE Canyonlands Modules li
Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)
On 31/07/18 09:19, Stefan Agner wrote: On 30.07.2018 16:38, Robin Murphy wrote: On 28/07/18 17:58, Guenter Roeck wrote: On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote: On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote: On 27 July 2018 at 15:11, Krzysztof Kozlowski wrote: Hi, On today's next, the bisect pointed commit ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures with NFSv4 root on Toradex Colibri VF50 (Iris carrier board). Author: Robin Murphy Date: Mon Jul 23 23:16:12 2018 +0100 OF: Don't set default coherent DMA mask Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured with DMA) on Iris Carrier. It looks like problem with Freescale Ethernet driver: [ 15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset [ 15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA [ 15.472086] Root-NFS: no NFS server address [ 15.476359] VFS: Unable to mount root fs via NFS, trying floppy. [ 15.484228] VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6 [ 15.491664] Please append a correct "root=" boot option; here are the available partitions: [ 15.500188] 0100 16384 ram0 [ 15.500200] (driver?) [ 15.506406] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0) [ 15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0) ]--- Attached - defconfig and full boot log. Any hints? Let me know if you need any more information. My Exynos boards also fail to boot on missing network: https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0 As expected there are plenty of "DMA mask not set" warnings... and later dwc3 driver fails with: dwc3: probe of 1240.dwc3 failed with error -12 which is probably the answer why LAN attached to USB is not present. Looks like all the drivers failed to set a dma mask and were lucky. I would call it a serious regression. 
Also, no longer setting a default coherent DMA mask is a quite substantial behavioral change, especially if and since the code worked just fine up to now. To reiterate, that particular side-effect was an unintentional oversight, and I was simply (un)lucky enough that none of the drivers I did test depended on that default mask. Sorry for the blip; please check whether it's now fixed in next-20180730 as it should be. Just for my understanding: Your first patch ("OF: Don't set default coherent DMA mask") sounded like *not* setting a default coherent DMA mask was intentional. Since the commit message reads: "...the bus code has not initialised any default value" it was assuming that all bus code sets a default DMA mask, which wasn't the case for "simple-bus". Yes, reading the patches in the order they were written is perhaps a little unclear, but hopefully the order in which they are now applied makes more sense. So I guess that is what ("of/platform: Initialise default DMA masks") makes up for in the typical device tree case ("simple-bus")? Indeed, I'd missed the fact that the now-out-of-place-looking initialisation in of_dma_configure() still actually belonged to of_platform_device_create_pdata() - that patch should make the assumptions of "OF: Don't set default coherent DMA mask" true again, even for OF-platform devices. Now, since almost all drivers are inside a soc "simple-bus" and the DMA mask is set again, can/should we rely on the coherent DMA mask being set? Or is the expectation still that this is set at driver level too? Ideally, we'd like all drivers to explicitly request their masks as the documentation in DMA-API-HOWTO.txt recommends, if only to ensure DMA is actually possible - there can be systems where even the default 32-bit mask is no good - but clearly we're a little way off trying to enforce that just yet. Robin.
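[Editor's note: as a concrete illustration of that recommendation, here is a hedged, kernel-style sketch — foo_probe() is a hypothetical driver probe, while dma_set_mask_and_coherent() and DMA_BIT_MASK() are the real helpers DMA-API-HOWTO.txt describes for negotiating both streaming and coherent masks in one call.]

```c
/* Hypothetical probe function requesting its own DMA masks up front,
 * rather than relying on whatever default the bus code may have set. */
static int foo_probe(struct platform_device *pdev)
{
	int ret;

	/* Fails cleanly on systems where 32-bit DMA addressing is not
	 * actually possible, instead of silently corrupting memory later. */
	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
	if (ret) {
		dev_err(&pdev->dev, "no usable DMA addressing\n");
		return ret;
	}

	/* ... rest of probe, free to call dma_alloc_coherent() etc. ... */
	return 0;
}
```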
Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)
On 31/07/18 14:26, Guenter Roeck wrote: On 07/31/2018 05:32 AM, Robin Murphy wrote: [...] Please note that sparc images still generate the warning (next-20180731).
Ugh, OK, any ideas what sparc does to create these platform devices that isn't of_platform_device_create_pdata() and has somehow grown an implicit dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps out... Robin. sunlance ffd35110: DMA mask not set sunlance.c:v2.02 8/24/03 Miguel de Icaza (mig...@nuclecu.unam.mx) ioremap: done with statics, switching to malloc ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 sparc_lance_probe_one+0x428/0x4f4 esp ffd38e90: DMA mask not set ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 esp_sbus_probe+0x408/0x6e8 Guenter
Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)
On 31/07/18 16:43, Guenter Roeck wrote: On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote: Please note that sparc images still generate the warning (next-20180731). Ugh, OK, any ideas what sparc does to create these platform devices that isn't of_platform_device_create_pdata() and has somehow grown an implicit dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps out... I suspect it might be of_device_register(), called from arch/sparc/kernel/of_device_64.c:scan_one_device() arch/sparc/kernel/of_device_32.c:scan_one_device() Right, that's as far as I got as well, so I'm struggling to see how these things ever got DMA masks set before the of_dma_configure() call moved out of of_platform_device_create_pdata(), or why it wasn't a problem prior to the generic dma_ops rework if they didn't :/ Robin.
Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)
On 31/07/18 16:53, Stefan Agner wrote: On 31.07.2018 14:32, Robin Murphy wrote: [...] In the FEC driver case, there is an integrated DMA (uDMA).
It has alignment restrictions, but can otherwise address the full 32-bit range. So something like this should do it right? if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) { dev_warn(dev, "No suitable DMA available\n"); return -ENODEV; } Yup, precisely. However, that, as far as I understand, still requires that the bus set up dma_mask properly. Should I be using dma_coerce_mask_and_coherent? AFAICS for FEC, the ColdFire instances have statically-set masks, the i.MX boardfiles get them set via platform+device_register_full(), and now that the bug-which-never-should-have-been is fixed the DT-based instances should be fine too, so you should be good to go. In general I'd say that the dma_coerce_mask*() routines are only really fo
Re: [PATCH v2 0/4] arm64 SMMUv3 PMU driver with IORT support
Hi Shameer, On 01/08/18 09:52, Shameerali Kolothum Thodi wrote: Hi Lorenzo/Robin, Just a gentle ping on this series. This is a v2 for smmu pmcg support based on Neil Leeder's v1[1]. Thanks for picking this up - it's not gone unnoticed ;) It'll take me a while to page all this stuff back in, so given where we are in the cycle I was planning to review it once rc1 is out, hope that's OK. Cheers, Robin. Main changes include: - a helper function to IORT to retrieve the associated SMMU info. - MSI support to the PMU driver. Please take a look and let me know your thoughts. Thanks, Shameer [1]https://www.spinics.net/lists/arm-kernel/msg598591.html -Original Message- From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of Shameer Kolothum Sent: 24 July 2018 12:45 To: lorenzo.pieral...@arm.com; robin.mur...@arm.com Cc: mark.rutl...@arm.com; vkil...@codeaurora.org; neil.m.lee...@gmail.com; pa...@codeaurora.org; will.dea...@arm.com; rruig...@codeaurora.org; Linuxarm ; linux- ker...@vger.kernel.org; linux-a...@vger.kernel.org; linux-arm- ker...@lists.infradead.org Subject: [PATCH v2 0/4] arm64 SMMUv3 PMU driver with IORT support This adds a driver for the SMMUv3 PMU into the perf framework. It includes an IORT update to support PM Counter Groups. This is based on the initial work done by Neil Leeder[1] SMMUv3 PMCG devices are named as arm_smmu_v3_x_pmcg_y where x denotes the associated smmuv3 dev id (if any) and y denotes the pmu dev id. Usage example: For common arch supported events: perf stat -e arm_smmu_v3_0_pmcg_6/transaction,filter_enable=1, filter_span=1,filter_stream_id=0x42/ -a pwd For IMP DEF events: perf stat -e arm_smmu_v3.0_pmcg.6/event=id/ -a pwd Sanity tested on HiSilicon platform. Further testing on supported platforms is very much welcome. v1 --> v2 - Addressed comments from Robin. - Added a helper to retrieve the associated smmu dev and named PMUs to make the association visible to user.
- Added MSI support for overflow irq [1]https://www.spinics.net/lists/arm-kernel/msg598591.html Neil Leeder (2): acpi: arm64: add iort support for PMCG perf: add arm64 smmuv3 pmu driver Shameer Kolothum (2): acpi: arm64: iort helper to find the associated smmu of pmcg node perf/smmuv3: Add MSI irq support drivers/acpi/arm64/iort.c | 179 +++-- drivers/perf/Kconfig | 9 + drivers/perf/Makefile | 1 + drivers/perf/arm_smmuv3_pmu.c | 901 ++ include/linux/acpi_iort.h | 4 + 5 files changed, 1063 insertions(+), 31 deletions(-) create mode 100644 drivers/perf/arm_smmuv3_pmu.c -- 2.7.4
Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default coherent DMA mask)
On 31/07/18 18:38, Guenter Roeck wrote: On Tue, Jul 31, 2018 at 04:58:41PM +0100, Robin Murphy wrote: On 31/07/18 16:43, Guenter Roeck wrote: On Tue, Jul 31, 2018 at 03:09:34PM +0100, Robin Murphy wrote: Please note that sparc images still generate the warning (next-20180731). Ugh, OK, any ideas what sparc does to create these platform devices that isn't of_platform_device_create_pdata() and has somehow grown an implicit dependency on of_dma_configure() since 4.12? I'm looking, but nothing jumps out... I suspect it might be of_device_register(), called from arch/sparc/kernel/of_device_64.c:scan_one_device() arch/sparc/kernel/of_device_32.c:scan_one_device() Right, that's as far as I got as well, so I'm struggling to see how these things ever got DMA masks set before the of_dma_configure() call moved out of of_platform_device_create_pdata(), or why it wasn't a problem prior to the generic dma_ops rework if they didn't :/ Ah, ok. No idea, sorry. All I know is that the messages were first seen with next-20180727. OK, I spent this afternoon wrangling toolchains and QEMU to boot an instrumented 4.11 kernel, and the answer is that the warnings are arguably correct. These masks have indeed never been set where they should have been, but then the sbus_dma_ops don't reference them anyway. The coherent mask WARN_ON *should* have started appearing in 4.16 with 205e1b7f51e4("dma-mapping: warn when there is no coherent_dma_mask"), but happened to be hidden by the inadvertent side-effect of the prior dma_configure() change. Since there's seemingly no actual regression of functionality, I'm inclined to leave this in the hands of whoever cares about sparc32. Robin.
Re: [PATCH] arm_pmu: fix compiler warning in arm_pmu_device_probe
Hi Chris, On 01/08/18 22:45, Chris Packham wrote: GCC warns arm_pmu_platform.c:234:5: error: 'err' may be used uninitialized in this function [-Werror=maybe-uninitialized] This is because we rely on the for_each_cpu loop in armpmu_request_irqs to initialise err. The warning is a little bogus because we know if there were 0 CPUs this code would not be running. Initialise err to 0 to avoid the warning. Maybe initialising to something like -EINVAL would be more appropriate, just in case we did ever manage to get here with armpmu->supported_cpus unset? Robin. Signed-off-by: Chris Packham --- This has been reported before in https://lkml.org/lkml/2018/3/5/508 I'm not sure if it was dismissed as "meh, gcc is wrong" or if it was just waiting for someone with some round tuits. drivers/perf/arm_pmu_platform.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c index 971ff336494a..96075cecb0ae 100644 --- a/drivers/perf/arm_pmu_platform.c +++ b/drivers/perf/arm_pmu_platform.c @@ -160,7 +160,7 @@ static int pmu_parse_irqs(struct arm_pmu *pmu) static int armpmu_request_irqs(struct arm_pmu *armpmu) { struct pmu_hw_events __percpu *hw_events = armpmu->hw_events; - int cpu, err; + int cpu, err = 0; for_each_cpu(cpu, &armpmu->supported_cpus) { int irq = per_cpu(hw_events->irq, cpu);
Re: [PATCH 4/7 v6] iommu/arm-smmu: Add support for the fsl-mc bus
On 09/07/18 12:18, Nipun Gupta wrote: Implement bus specific support for the fsl-mc bus including registering arm_smmu_ops and bus specific device add operations. I guess this is about as neat as it can get; Reviewed-by: Robin Murphy Signed-off-by: Nipun Gupta --- drivers/iommu/arm-smmu.c | 7 +++ drivers/iommu/iommu.c| 13 + include/linux/fsl/mc.h | 8 include/linux/iommu.h| 2 ++ 4 files changed, 30 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index f7a96bc..a011bb6 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -52,6 +52,7 @@ #include #include +#include #include "io-pgtable.h" #include "arm-smmu-regs.h" @@ -1459,6 +1460,8 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) if (dev_is_pci(dev)) group = pci_device_group(dev); + else if (dev_is_fsl_mc(dev)) + group = fsl_mc_device_group(dev); else group = generic_device_group(dev); @@ -2037,6 +2040,10 @@ static void arm_smmu_bus_init(void) bus_set_iommu(&pci_bus_type, &arm_smmu_ops); } #endif +#ifdef CONFIG_FSL_MC_BUS + if (!iommu_present(&fsl_mc_bus_type)) + bus_set_iommu(&fsl_mc_bus_type, &arm_smmu_ops); +#endif } static int arm_smmu_device_probe(struct platform_device *pdev) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index d227b86..df2f49e 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -32,6 +32,7 @@ #include #include #include +#include #include static struct kset *iommu_group_kset; @@ -988,6 +989,18 @@ struct iommu_group *pci_device_group(struct device *dev) return iommu_group_alloc(); } +/* Get the IOMMU group for device on fsl-mc bus */ +struct iommu_group *fsl_mc_device_group(struct device *dev) +{ + struct device *cont_dev = fsl_mc_cont_dev(dev); + struct iommu_group *group; + + group = iommu_group_get(cont_dev); + if (!group) + group = iommu_group_alloc(); + return group; +} + /** * iommu_group_get_for_dev - Find or create the IOMMU group for a device * @dev: target device diff --git 
a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h index f27cb14..dddaca1 100644 --- a/include/linux/fsl/mc.h +++ b/include/linux/fsl/mc.h @@ -351,6 +351,14 @@ struct fsl_mc_io { #define dev_is_fsl_mc(_dev) (0) #endif +/* Macro to check if a device is a container device */ +#define fsl_mc_is_cont_dev(_dev) (to_fsl_mc_device(_dev)->flags & \ + FSL_MC_IS_DPRC) + +/* Macro to get the container device of a MC device */ +#define fsl_mc_cont_dev(_dev) (fsl_mc_is_cont_dev(_dev) ? \ + (_dev) : (_dev)->parent) + /* * module_fsl_mc_driver() - Helper macro for drivers that don't do * anything special in module init/exit. This eliminates a lot of diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 7447b0b..209891d 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -389,6 +389,8 @@ static inline size_t iommu_map_sg(struct iommu_domain *domain, extern struct iommu_group *pci_device_group(struct device *dev); /* Generic device grouping function */ extern struct iommu_group *generic_device_group(struct device *dev); +/* FSL-MC device grouping function */ +struct iommu_group *fsl_mc_device_group(struct device *dev); /** * struct iommu_fwspec - per-device IOMMU instance data
Re: [PATCH 5/7 v6] bus/fsl-mc: support dma configure for devices on fsl-mc bus
On 09/07/18 12:18, Nipun Gupta wrote: This patch adds support of dma configuration for devices on fsl-mc bus using 'dma_configure' callback for busses. Also, directly calling arch_setup_dma_ops is removed from the fsl-mc bus. Reviewed-by: Robin Murphy Signed-off-by: Nipun Gupta Reviewed-by: Laurentiu Tudor --- drivers/bus/fsl-mc/fsl-mc-bus.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c index 5d8266c..fa43c7d 100644 --- a/drivers/bus/fsl-mc/fsl-mc-bus.c +++ b/drivers/bus/fsl-mc/fsl-mc-bus.c @@ -127,6 +127,16 @@ static int fsl_mc_bus_uevent(struct device *dev, struct kobj_uevent_env *env) return 0; } +static int fsl_mc_dma_configure(struct device *dev) +{ + struct device *dma_dev = dev; + + while (dev_is_fsl_mc(dma_dev)) + dma_dev = dma_dev->parent; + + return of_dma_configure(dev, dma_dev->of_node, 0); +} + static ssize_t modalias_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -148,6 +158,7 @@ struct bus_type fsl_mc_bus_type = { .name = "fsl-mc", .match = fsl_mc_bus_match, .uevent = fsl_mc_bus_uevent, + .dma_configure = fsl_mc_dma_configure, .dev_groups = fsl_mc_dev_groups, }; EXPORT_SYMBOL_GPL(fsl_mc_bus_type); @@ -633,10 +644,6 @@ int fsl_mc_device_add(struct fsl_mc_obj_desc *obj_desc, goto error_cleanup_dev; } - /* Objects are coherent, unless 'no shareability' flag set. */ - if (!(obj_desc->flags & FSL_MC_OBJ_FLAG_NO_MEM_SHAREABILITY)) - arch_setup_dma_ops(&mc_dev->dev, 0, 0, NULL, true); - /* * The device-specific probe callback will get invoked by device_add() */
Re: [PATCH 1/7 v6] Documentation: fsl-mc: add iommu-map device-tree binding for fsl-mc bus
On 09/07/18 12:18, Nipun Gupta wrote: The existing IOMMU bindings cannot be used to specify the relationship between fsl-mc devices and IOMMUs. This patch adds a generic binding for mapping fsl-mc devices to IOMMUs, using iommu-map property. No more nits from me :) Acked-by: Robin Murphy Signed-off-by: Nipun Gupta Reviewed-by: Rob Herring --- .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 39 ++ 1 file changed, 39 insertions(+) diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt index 6611a7c..01fdc33 100644 --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt @@ -9,6 +9,25 @@ blocks that can be used to create functional hardware objects/devices such as network interfaces, crypto accelerator instances, L2 switches, etc. +For an overview of the DPAA2 architecture and fsl-mc bus see: +Documentation/networking/dpaa2/overview.rst + +As described in the above overview, all DPAA2 objects in a DPRC share the +same hardware "isolation context" and a 10-bit value called an ICID +(isolation context id) is expressed by the hardware to identify +the requester. + +The generic 'iommus' property is insufficient to describe the relationship +between ICIDs and IOMMUs, so an iommu-map property is used to define +the set of possible ICIDs under a root DPRC and how they map to +an IOMMU. + +For generic IOMMU bindings, see +Documentation/devicetree/bindings/iommu/iommu.txt. + +For arm-smmu binding, see: +Documentation/devicetree/bindings/iommu/arm,smmu.txt. + Required properties: - compatible @@ -88,14 +107,34 @@ Sub-nodes: Value type: Definition: Specifies the phandle to the PHY device node associated with the this dpmac. +Optional properties: + +- iommu-map: Maps an ICID to an IOMMU and associated iommu-specifier + data. + + The property is an arbitrary number of tuples of + (icid-base,iommu,iommu-base,length). 
+ + Any ICID i in the interval [icid-base, icid-base + length) is + associated with the listed IOMMU, with the iommu-specifier + (i - icid-base + iommu-base). Example: +smmu: iommu@500 { + compatible = "arm,mmu-500"; + #iommu-cells = <1>; + stream-match-mask = <0x7C00>; + ... +}; + fsl_mc: fsl-mc@80c00 { compatible = "fsl,qoriq-mc"; reg = <0x0008 0x0c00 0 0x40>,/* MC portal base */ <0x 0x0834 0 0x4>; /* MC control reg */ msi-parent = <&its>; +/* define map for ICIDs 23-64 */ +iommu-map = <23 &smmu 23 41>; #address-cells = <3>; #size-cells = <1>;
Re: [PATCH] iommu/iova: Update cached node pointer when current node fails to get any free IOVA
On 12/07/18 08:45, Ganapatrao Kulkarni wrote: Hi Robin, On Mon, Jun 4, 2018 at 9:36 AM, Ganapatrao Kulkarni wrote: ping?? On Mon, May 21, 2018 at 6:45 AM, Ganapatrao Kulkarni wrote: On Thu, Apr 26, 2018 at 3:15 PM, Ganapatrao Kulkarni wrote: Hi Robin, On Mon, Apr 23, 2018 at 11:11 PM, Ganapatrao Kulkarni wrote: On Mon, Apr 23, 2018 at 10:07 PM, Robin Murphy wrote: On 19/04/18 18:12, Ganapatrao Kulkarni wrote: The performance drop is observed with long hours iperf testing using 40G cards. This is mainly due to long iterations in finding the free iova range in 32bit address space. In current implementation for 64bit PCI devices, there is always first attempt to allocate iova from 32bit(SAC preferred over DAC) address range. Once we run out 32bit range, there is allocation from higher range, however due to cached32_node optimization it does not suppose to be painful. cached32_node always points to recently allocated 32-bit node. When address range is full, it will be pointing to last allocated node (leaf node), so walking rbtree to find the available range is not expensive affair. However this optimization does not behave well when one of the middle node is freed. In that case cached32_node is updated to point to next iova range. The next iova allocation will consume free range and again update cached32_node to itself. From now on, walking over 32-bit range is more expensive. This patch adds fix to update cached node to leaf node when there are no iova free range left, which avoids unnecessary long iterations. The only trouble with this is that "allocation failed" doesn't uniquely mean "space full". Say that after some time the 32-bit space ends up empty except for one page at 0x1000 and one at 0x8000, then somebody tries to allocate 2GB. If we move the cached node down to the leftmost entry when that fails, all subsequent allocation attempts are now going to fail despite the space being 99.% free! 
I can see a couple of ways to solve that general problem of free space above the cached node getting lost, but neither of them helps with the case where there is genuinely insufficient space (and if anything would make it even slower). In terms of the optimisation you want here, i.e. fail fast when an allocation cannot possibly succeed, the only reliable idea which comes to mind is free-PFN accounting. I might give that a go myself to see how ugly it looks. did you get any chance to look in to this issue? i am waiting for your suggestion/patch for this issue! I got as far as [1], but I wasn't sure how much I liked it, since it still seems a little invasive for such a specific case (plus I can't remember if it's actually been debugged or not). I think in the end I started wondering whether it's even worth bothering with the 32-bit optimisation for PCIe devices - 4 extra bytes worth of TLP is surely a lot less significant than every transaction taking up to 50% more bus cycles was for legacy PCI. Robin. [1] http://www.linux-arm.org/git?p=linux-rm.git;a=commitdiff;h=a8e0e4af10ebebb3669750e05bf0028e5bd6afe8
Re: [RFC PATCH 0/4] Stop losing firmware-set DMA masks
On 11/07/18 15:40, Rob Herring wrote: On Tue, Jul 10, 2018 at 12:43 PM Robin Murphy wrote: Whilst the common firmware code invoked by dma_configure() initialises devices' DMA masks according to limitations described by the respective properties ("dma-ranges" for OF and _DMA/IORT for ACPI), the nature of the dma_set_mask() API leads to that information getting lost when well-behaved drivers probe and set a 64-bit mask, since in general there's no way to tell the difference between a firmware-described mask (which should be respected) and whatever default may have come from the bus code (which should be replaced outright). This can break DMA on systems with certain IOMMU topologies (e.g. [1]) where the IOMMU driver only knows its maximum supported address size, not how many of those address bits might actually be wired up between any of its input interfaces and the associated DMA master devices. Similarly, some PCIe root complexes only have a 32-bit native interface on their host bridge, which leads to the same DMA-address-truncation problem in systems with a larger physical memory map and RAM above 4GB (e.g. [2]). These patches attempt to deal with this in the simplest way possible by generalising the specific quirk for 32-bit bridges into an arbitrary mask which can then also be plumbed into the firmware code. In the interest of being minimally invasive, I've only included a point fix for the IOMMU issue as seen on arm64 - there may be further tweaks needed in DMA ops to catch all possible incarnations of this problem, but this initial RFC is mostly about the impact beyond the dma-mapping subsystem itself. Couldn't you set and use the device's parent's dma_mask instead. At least for DT, we should always have a parent device representing the bus. That would avoid further bloating of struct device. 
But then if the parent device did have a non-trivial driver which calls dma_set_mask(), we'd be back at square 1 :/ More realistically, I don't think that's viable for ACPI, at least with IORT, since the memory address size limit belongs to the endpoint itself, thus two devices with the same nominal parent in the Linux device model could still have different limits (where in DT you'd have to have to insert intermediate simple-bus nodes to model the same topology with dma-ranges). Plus either way it seems somewhat fragile for PCI where the host bridge may be some distance up the hierarchy. Robin.
Re: [PATCH v5 08/13] iommu/rockchip: Control clocks needed to access the IOMMU
On 24/01/18 10:35, Jeffy Chen wrote: From: Tomasz Figa Current code relies on master driver enabling necessary clocks before IOMMU is accessed, however there are cases when the IOMMU should be accessed while the master is not running yet, for example allocating V4L2 videobuf2 buffers, which is done by the VB2 framework using DMA mapping API and doesn't engage the master driver at all. This patch fixes the problem by letting clocks needed for IOMMU operation to be listed in Device Tree and making the driver enable them for the time of accessing the hardware. Signed-off-by: Jeffy Chen Signed-off-by: Tomasz Figa --- Changes in v5: Use clk_bulk APIs. Changes in v4: None Changes in v3: None Changes in v2: None .../devicetree/bindings/iommu/rockchip,iommu.txt | 8 +++ drivers/iommu/rockchip-iommu.c | 74 -- 2 files changed, 76 insertions(+), 6 deletions(-) diff --git a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt index 2098f7732264..33dd853359fa 100644 --- a/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt +++ b/Documentation/devicetree/bindings/iommu/rockchip,iommu.txt @@ -14,6 +14,13 @@ Required properties: "single-master" device, and needs no additional information to associate with its master device. See: Documentation/devicetree/bindings/iommu/iommu.txt +Optional properties: +- clocks : A list of master clocks requires for the IOMMU to be accessible s/requires/required/ + by the host CPU. The number of clocks depends on the master + block and might as well be zero. See [1] for generic clock Oops, some subtleties of English here :) To say "the number of clocks ... might as well be zero" effectively implies "there's no point ever specifying any clocks". I guess what you really mean here is "...might well be...", i.e. it is both valid and reasonably likely to require zero clocks. + bindings description. 
+ +[1] Documentation/devicetree/bindings/clock/clock-bindings.txt Optional properties: - rockchip,disable-mmu-reset : Don't use the mmu reset operation. @@ -27,5 +34,6 @@ Example: reg = <0xff940300 0x100>; interrupts = ; interrupt-names = "vopl_mmu"; + clocks = <&cru ACLK_VOP1>, <&cru DCLK_VOP1>, <&cru HCLK_VOP1>; #iommu-cells = <0>; }; diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index c4131ca792e0..8a5e2a659b67 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -4,6 +4,7 @@ * published by the Free Software Foundation. */ +#include #include #include #include @@ -91,6 +92,8 @@ struct rk_iommu { struct device *dev; void __iomem **bases; int num_mmu; + struct clk_bulk_data *clocks; + int num_clocks; bool reset_disabled; struct iommu_device iommu; struct list_head node; /* entry in rk_iommu_domain.iommus */ @@ -450,6 +453,38 @@ static int rk_iommu_force_reset(struct rk_iommu *iommu) return 0; } +static int rk_iommu_of_get_clocks(struct rk_iommu *iommu) +{ + struct device_node *np = iommu->dev->of_node; + int ret; + int i; + + ret = of_count_phandle_with_args(np, "clocks", "#clock-cells"); + if (ret == -ENOENT) + return 0; + else if (ret < 0) + return ret; + + iommu->num_clocks = ret; + iommu->clocks = devm_kcalloc(iommu->dev, iommu->num_clocks, +sizeof(*iommu->clocks), GFP_KERNEL); + if (!iommu->clocks) + return -ENOMEM; + + for (i = 0; i < iommu->num_clocks; ++i) { + iommu->clocks[i].clk = of_clk_get(np, i); + if (IS_ERR(iommu->clocks[i].clk)) { + ret = PTR_ERR(iommu->clocks[i].clk); + goto err_clk_put; + } + } Just to confirm my understanding from a quick scan through the code, the reason we can't use clk_bulk_get() here is that currently, clocks[i].id being NULL means we'd end up just getting the first clock multiple times, right? 
I guess there could be other users who also want "just get whatever clocks I have" functionality, so it might be worth proposing that for the core API as a separate/follow-up patch, but it definitely doesn't need to be part of this series. I really don't know enough about correct clk API usage, but modulo the binding comments it certainly looks nice and tidy now; Acked-by: Robin Murphy Thanks, Robin. + + return 0; +err_clk_put: + clk
Re: [PATCH v2] iommu/mediatek: Move attach_device after iommu-group is ready for M4Uv1
On 23/01/18 08:39, Yong Wu wrote: In the commit 05f80300dc8b ("iommu: Finish making iommu_group support mandatory"), the iommu framework has supposed all the iommu drivers have their owner iommu-group, it get rid of the FIXME workarounds while the group is NULL. But the flow of Mediatek M4U gen1 looks a bit trick that it will hang at this case: == Unable to handle kernel NULL pointer dereference at virtual address 0030 PC is at mutex_lock+0x28/0x54 LR is at iommu_attach_device+0xa4/0xd4 pc : []lr : []psr: 6013 sp : df0edbb8 ip : df0edbc8 fp : df0edbc4 r10: c114da14 r9 : df2a3e40 r8 : 0003 r7 : df27a210 r6 : df2a90c4 r5 : 0030 r4 : r3 : df0f8000 r2 : f000 r1 : df29c610 r0 : 0030 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none xxx (mutex_lock) from [] (iommu_attach_device+0xa4/0xd4) (iommu_attach_device) from [] (__arm_iommu_attach_device+0x28/0x90) (__arm_iommu_attach_device) from [] (arm_iommu_attach_device+0x1c/0x30) (arm_iommu_attach_device) from [] (mtk_iommu_add_device+0xfc/0x214) (mtk_iommu_add_device) from [] (add_iommu_group+0x3c/0x68) (add_iommu_group) from [] (bus_for_each_dev+0x78/0xac) (bus_for_each_dev) from [] (bus_set_iommu+0xb0/0xec) (bus_set_iommu) from [] (mtk_iommu_probe+0x328/0x368) (mtk_iommu_probe) from [] (platform_drv_probe+0x5c/0xc0) (platform_drv_probe) from [] (driver_probe_device+0x2f4/0x4d8) (driver_probe_device) from [] (__driver_attach+0x10c/0x128) (__driver_attach) from [] (bus_for_each_dev+0x78/0xac) (bus_for_each_dev) from [] (driver_attach+0x2c/0x30) (driver_attach) from [] (bus_add_driver+0x1e0/0x278) (bus_add_driver) from [] (driver_register+0x88/0x108) (driver_register) from [] (__platform_driver_register+0x50/0x58) (__platform_driver_register) from [] (m4u_init+0x24/0x28) (m4u_init) from [] (do_one_initcall+0xf0/0x17c) = The root cause is that "arm_iommu_attach_device" is called before "iommu_group_get_for_dev" in the interface "mtk_iommu_add_device". Thus, We adjust the sequence of this two functions. 
Unfortunately, there is another issue after the solution above, From the function "iommu_attach_device", Only one device in each a iommu group is allowed. In Mediatek case, there is only one m4u group, all the devices are in one group. thus it get fail at this step. In order to satisfy this requirement, a new iommu group is allocated for each a iommu consumer device. But meanwhile, we still have to use the same domain for all the iommu group. Use a global variable "mtk_domain_v1" to save the global domain. Argh, sorry for breaking it! Seems I managed to forget just how horrible and fiddly all the arm_iommu_* stuff is :( CC: Robin Murphy CC: Honghui Zhang Fixes: 05f80300dc8b ("iommu: Finish making iommu_group support mandatory") Reported-by: Ryder Lee Tested-by: Bibby Hsieh Signed-off-by: Yong Wu --- changes since v1: Add mtk_domain_v1=NULL in domain_free for symmetry. v1: https://patchwork.kernel.org/patch/10176255/ --- drivers/iommu/mtk_iommu_v1.c | 42 +++--- 1 file changed, 19 insertions(+), 23 deletions(-) diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 542930c..86106bf 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -103,6 +103,9 @@ struct mtk_iommu_domain { struct mtk_iommu_data *data; }; +/* There is only a iommu domain in M4U gen1. */ +static struct mtk_iommu_domain *mtk_domain_v1; + static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom) { return container_of(dom, struct mtk_iommu_domain, domain); @@ -251,10 +254,15 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type) if (type != IOMMU_DOMAIN_UNMANAGED) return NULL; + /* Always return the same domain. 
*/ + if (mtk_domain_v1) + return &mtk_domain_v1->domain; This seems a bit too fragile (and I vaguely recall we may have discussed and rejected this approach for the original driver), since any code doing: unused = iommu_domain_alloc(bus); iommu_domain_free(unused); will pull the rug out from under everyone's feet in a very nasty and unexpected manner. Given that mtk_iommu_create_mapping() is already a giant workaround for the ARM DMA code not understanding groups and default domains, I'd prefer not to have to regress "correct" driver behaviour for the sake of that; how about something like the below diff, is that enough to make things work? Robin. ->8- diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 542930cd183d..8b90b7a72238 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -376,6 +376,7 @@ static int mtk_iommu_create_mapping(struct device *dev,
Re: [PATCH v3] iommu/mediatek: Move attach_device after iommu-group is ready for M4Uv1
On 25/01/18 11:14, Yong Wu wrote: In the commit 05f80300dc8b, the iommu framework has supposed all the iommu drivers have their owner iommu-group, it get rid of the FIXME workarounds while the group is NULL. But the flow of Mediatek M4U gen1 looks a bit trick that it will hang at this case: == Unable to handle kernel NULL pointer dereference at virtual address 0030 pgd = c0004000 [0030] *pgd= PC is at mutex_lock+0x28/0x54 LR is at iommu_attach_device+0xa4/0xd4 pc : []lr : []psr: 6013 sp : df0edbb8 ip : df0edbc8 fp : df0edbc4 r10: c114da14 r9 : df2a3e40 r8 : 0003 r7 : df27a210 r6 : df2a90c4 r5 : 0030 r4 : r3 : df0f8000 r2 : f000 r1 : df29c610 r0 : 0030 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none xxx (mutex_lock) from [] (iommu_attach_device+0xa4/0xd4) (iommu_attach_device) from [] (__arm_iommu_attach_device+0x28/0x90) (__arm_iommu_attach_device) from [] (arm_iommu_attach_device+0x1c/0x30) (arm_iommu_attach_device) from [] (mtk_iommu_add_device+0xfc/0x214) (mtk_iommu_add_device) from [] (add_iommu_group+0x3c/0x68) (add_iommu_group) from [] (bus_for_each_dev+0x78/0xac) (bus_for_each_dev) from [] (bus_set_iommu+0xb0/0xec) (bus_set_iommu) from [] (mtk_iommu_probe+0x328/0x368) (mtk_iommu_probe) from [] (platform_drv_probe+0x5c/0xc0) (platform_drv_probe) from [] (driver_probe_device+0x2f4/0x4d8) (driver_probe_device) from [] (__driver_attach+0x10c/0x128) (__driver_attach) from [] (bus_for_each_dev+0x78/0xac) (bus_for_each_dev) from [] (driver_attach+0x2c/0x30) (driver_attach) from [] (bus_add_driver+0x1e0/0x278) (bus_add_driver) from [] (driver_register+0x88/0x108) (driver_register) from [] (__platform_driver_register+0x50/0x58) (__platform_driver_register) from [] (m4u_init+0x24/0x28) (m4u_init) from [] (do_one_initcall+0xf0/0x17c) = The root cause is that the device's iommu-group is NULL while arm_iommu_attach_device is called. This patch prepare a new iommu-group for the iommu consumer devices to fix this issue. 
CC: Robin Murphy CC: Honghui Zhang Fixes: 05f80300dc8b ('iommu: Finish making iommu_group support mandatory') Reported-by: Ryder Lee Signed-off-by: Yong Wu --- changes notes: v3: don't use the global variable and allocate a new iommu group before arm_iommu_attach_device following Robin's suggestion. v2: http://lists.infradead.org/pipermail/linux-mediatek/2018-January/011810.html Add mtk_domain_v1=NULL in domain_free for symmetry. v1: https://patchwork.kernel.org/patch/10176255/ --- drivers/iommu/mtk_iommu_v1.c | 49 ++-- 1 file changed, 25 insertions(+), 24 deletions(-) diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 542930c..aca76d2 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -418,20 +418,12 @@ static int mtk_iommu_create_mapping(struct device *dev, m4udev->archdata.iommu = mtk_mapping; } - ret = arm_iommu_attach_device(dev, mtk_mapping); - if (ret) - goto err_release_mapping; - return 0; - -err_release_mapping: - arm_iommu_release_mapping(mtk_mapping); - m4udev->archdata.iommu = NULL; - return ret; } static int mtk_iommu_add_device(struct device *dev) { + struct dma_iommu_mapping *mtk_mapping; struct of_phandle_args iommu_spec; struct of_phandle_iterator it; struct mtk_iommu_data *data; @@ -452,9 +444,30 @@ static int mtk_iommu_add_device(struct device *dev) if (!dev->iommu_fwspec || dev->iommu_fwspec->ops != &mtk_iommu_ops) return -ENODEV; /* Not a iommu client device */ + /* +* This is a short-term bodge because the ARM DMA code doesn't +* understand multi-device groups, but we have to call into it +* successfully (and not just rely on a normal IOMMU API attach +* here) in order to set the correct DMA API ops on @dev. 
+*/ + group = iommu_group_alloc(); + if (IS_ERR(group)) + return PTR_ERR(group); + + err = iommu_group_add_device(group, dev); + iommu_group_put(group); + if (err) + return err; + data = dev->iommu_fwspec->iommu_priv; - iommu_device_link(&data->iommu, dev); + mtk_mapping = data->dev->archdata.iommu; + err = arm_iommu_attach_device(dev, mtk_mapping); + if (err) { + iommu_group_remove_device(dev); + return err; + } + iommu_device_link(&data->iommu, dev); group = iommu_group_get_for_dev(dev); This call now does nothing, so you may as well remove it (and the subsequent iommu_group_put)... if (IS_ERR(group)) return PTR_ERR(group); @@ -479,20 +492,8 @@ static void mtk_iommu_remove_device(struct
Re: [PATCH RFC 0/3] API for 128-bit IO access
On 25/01/18 11:38, Yury Norov wrote: On Wed, Jan 24, 2018 at 11:28:55AM +0100, Arnd Bergmann wrote: On Wed, Jan 24, 2018 at 10:05 AM, Yury Norov wrote: This series adds API for 128-bit memory IO access and enables it for ARM64. The original motivation for 128-bit API came from new Cavium network device driver. The hardware requires 128-bit access to make things work. See description in patch 3 for details. We might also want to do something similar to the include/linux/io-64-nonatomic-lo-hi.h and hi-lo.h files, to simulate 128-bit access using pairs of 64-bit access on other targets. It's apparently driver specific which half you need to do first to make it work, so we need both. OK, will do. Also, starting from ARMv8.4, stp and ldp instructions become atomic, and API for 128-bit access would be helpful in core arm64 code. This series is RFC. I'd like to collect opinions on idea and implementation details. * I didn't implement all 128-bit operations existing for 64-bit variables and other types (__swab128p etc). Do we need them all right now, or we can add them when actually needed? I think in this case it's better to do them all at once. Ack. * u128 name is already used in crypto code. So here I use __uint128_t that comes from GCC for 128-bit types. Should I rename existing type in crypto and make core code for 128-bit variables consistent with u64, u32 etc? (I think yes, but would like to ask crypto people for it.) Hmm, that file probably predates the __uint128_t support. My guess would be that the crypto code using it can actually benefit from the new types as well, so maybe move the existing file to include/linux/int128.h and add an #if/#else logic to it so we use 'typedef __uint128_t __u128' if that is available. It sounds reasonable. But I worry about testing that changes. Hope, crypto community will help with it. * Some compilers don't support __uint128_t, so I protected all generic code with config option HAVE_128BIT_ACCESS. I think it's OK, but... 
That would be nicely solved by using the #if/#else definition above. * For 128-bit read/write functions I take suffix 'o', which means read/write the octet of bytes. Is this name OK? Can't think of anything better. It's not an octet though, but 16 bytes ('q' is for quadword, meaning four 16-bit words in Intel terminology). Ah, sure. Octet of words. Will change it. * my mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I don't have other BE setup on hand, so BE case is formally not tested. BE code for arm64 is looking well though. I've run it through my collection of compilers, it seems that most but not all 64-bit targets support it (exceptions appear to be older versions of gcc for s390x and parisc), and none of the 32-bit targets do: Thanks for doing this test. Looking at this I realize that this is not the architecture feature but compiler feature. So if we add 128-bit interface, it would be reasonable to add it for all targets that compiled with toolchain supporting 128-bit accesses. There's already the option ARCH_SUPPORTS_INT128 that is enabled for x86_64 in arch/x86/Kconfig and conditionally enabled for arm64 in arch/arm64/Makefile: KBUILD_CFLAGS+= $(call cc-ifversion, -ge, 0500, -DCONFIG_ARCH_SUPPORTS_INT128) It is used in include/linux/math64.h and in lib/ubsan.c, not so wide. So I find things little messed. Crypto code ignores compilers' ability to operate with 128-bit numbers. Ubsan and math64 relies on compiler version (at least for arm64, and I doubt it would work correctly with clang). And now I introduce HAVE_128BIT_ACCESS with the same meaning for memory access. I think it's time to unify 128-bit code: - enable CONFIG_ARCH_SUPPORTS_INT128 if compiler supports it, ie check it like you do below; - declare u128 as structure or typedef depending on ARCH_SUPPORTS_INT128 in generic include/linux/int128.h, as you suggest here; - switch this series to ARCH_SUPPORTS_INT128. Does it sound reasonable? 
It probably is about time to formalise the current scattered fragments of uint_128_t support, but to reiterate Will's comment it is almost certainly not worth the effort to implement 'generic' I/O accessors which only work under implementation-defined and undiscoverable hardware conditions, and will be unusable on the overwhelming majority of systems. Just open-code LDP/STP accessors in the one driver which needs them and (by definition) only runs on SoCs where they *are* known to work correctly. Robin. Yury $ for i in /home/arnd/cross-gcc/bin/*gcc-[3-8]* ; do echo -n $i" " ; echo '__uint128_t v;' | $i -xc -S - -o /dev/null && echo ok ; done /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.8.5 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.3 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-4.9.4 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.2.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.4.1 ok /home/arnd/cross-gcc/bin/aarch64-linux-gcc-5.5.0 ok /home/arnd/cross-
[PATCH 1/2] acpica: iort: Update SMMU models for IORT rev. C
IORT revision C has been published with a number of new SMMU implementation identifiers; define them. CC: Rafael J. Wysocki CC: Robert Moore CC: Lv Zheng Signed-off-by: Robin Murphy --- include/acpi/actbl2.h | 8 1 file changed, 8 insertions(+) diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h index 7aee9fb3bd1f..0242be07f292 100644 --- a/include/acpi/actbl2.h +++ b/include/acpi/actbl2.h @@ -777,6 +777,8 @@ struct acpi_iort_smmu { #define ACPI_IORT_SMMU_V2 0x0001 /* Generic SMMUv2 */ #define ACPI_IORT_SMMU_CORELINK_MMU400 0x0002 /* ARM Corelink MMU-400 */ #define ACPI_IORT_SMMU_CORELINK_MMU500 0x0003 /* ARM Corelink MMU-500 */ +#define ACPI_IORT_SMMU_CORELINK_MMU401 0x0004 /* ARM Corelink MMU-401 */ +#define ACPI_IORT_SMMU_CAVIUM_SMMUV2 0x0005 /* Cavium ThunderX SMMUv2 */ /* Masks for Flags field above */ @@ -795,6 +797,12 @@ struct acpi_iort_smmu_v3 { u32 sync_gsiv; }; +/* Values for Model field above */ + +#define ACPI_IORT_SMMU_V3 0x0000 /* Generic SMMUv3 */ +#define ACPI_IORT_SMMU_HISILICON_HI161X 0x0001 /* HiSilicon Hi161x SMMUv3 */ +#define ACPI_IORT_SMMU_CAVIUM_CN99XX 0x0002 /* Cavium CN99xx SMMUv3 */ + /* Masks for Flags field above */ #define ACPI_IORT_SMMU_V3_COHACC_OVERRIDE (1) -- 2.12.2.dirty
[PATCH 2/2] iommu/arm-smmu: Plumb in new ACPI identifiers
Revision C of IORT now allows us to identify ARM MMU-401 and the Cavium ThunderX implementation; wire them up so that the appropriate quirks get enabled when booting with ACPI. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 6dadd51d486c..d9ec840defc9 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -2018,6 +2018,10 @@ static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu) smmu->version = ARM_SMMU_V1; smmu->model = GENERIC_SMMU; break; + case ACPI_IORT_SMMU_CORELINK_MMU401: + smmu->version = ARM_SMMU_V1_64K; + smmu->model = GENERIC_SMMU; + break; case ACPI_IORT_SMMU_V2: smmu->version = ARM_SMMU_V2; smmu->model = GENERIC_SMMU; @@ -2026,6 +2030,10 @@ static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu) smmu->version = ARM_SMMU_V2; smmu->model = ARM_MMU500; break; + case ACPI_IORT_SMMU_CAVIUM_SMMUV2: + smmu->version = ARM_SMMU_V2; + smmu->model = CAVIUM_SMMUV2; + break; default: ret = -ENODEV; } -- 2.12.2.dirty
Re: [patch 11/18] iommu/of: Adjust system_state check
On 14/05/17 19:27, Thomas Gleixner wrote: > To enable smp_processor_id() and might_sleep() debug checks earlier, it's > required to add system states between SYSTEM_BOOTING and SYSTEM_RUNNING. > > Adjust the system_state check in of_iommu_driver_present() to handle the > extra states. FWIW, Acked-by: Robin Murphy > Signed-off-by: Thomas Gleixner > Cc: Joerg Roedel > Cc: io...@lists.linux-foundation.org > --- > drivers/iommu/of_iommu.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/drivers/iommu/of_iommu.c > +++ b/drivers/iommu/of_iommu.c > @@ -103,7 +103,7 @@ static bool of_iommu_driver_present(stru >* it never will be. We don't want to defer indefinitely, nor attempt >* to dereference __iommu_of_table after it's been freed. >*/ > - if (system_state > SYSTEM_BOOTING) > + if (system_state >= SYSTEM_RUNNING) > return false; > > return of_match_node(&__iommu_of_table, np); >
Re: [PATCH 27/33] dma-direct: use node local allocations for coherent memory
On 10/01/18 15:30, Christoph Hellwig wrote: On Wed, Jan 10, 2018 at 12:06:22PM +, Robin Murphy wrote: On 10/01/18 08:00, Christoph Hellwig wrote: To preserve the x86 behavior. And combined with patch 10/22 of the SWIOTLB refactoring, this means SWIOTLB allocations will also end up NUMA-aware, right? Great, that's what we want on arm64 too :) Well, only for swiotlb allocations that can be satisfied by dma_direct_alloc. If we actually have to fall back to the swiotlb buffers there is not node affinity yet. Yeah, when I looked into it I reached the conclusion that per-node bounce buffers probably weren't worth it - if you have to bounce you've already pretty much lost the performance game, and if the CPU doing the bouncing happens to be on a different node from the device you've certainly lost either way. Per-node CMA zones we definitely *would* like, but that's a future problem (it looks technically feasible without huge infrastructure changes, but fiddly). Robin.
Re: [PATCH 31/33] dma-direct: reject too small dma masks
On 10/01/18 15:32, Christoph Hellwig wrote: On Wed, Jan 10, 2018 at 11:49:34AM +, Robin Murphy wrote: +#ifdef CONFIG_ZONE_DMA + if (mask < DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)) + return 0; +#else + /* +* Because 32-bit DMA masks are so common we expect every architecture +* to be able to satisfy them - either by not supporting more physical +* memory, or by providing a ZONE_DMA32. If neither is the case, the +* architecture needs to use an IOMMU instead of the direct mapping. +*/ + if (mask < DMA_BIT_MASK(32)) + return 0; Do you think it's worth the effort to be a little more accommodating here? i.e.: return dma_max_pfn(dev) >= max_pfn; We seem to have a fair few 28-31 bit masks for older hardware which probably associates with host systems packing equivalently small amounts of RAM. And those devices don't have a ZONE_DMA? I think we could do something like that, but I'd rather have it as a separate commit with a good explanation. Maybe you can just send on on top of the series? Good point - other than the IXP4xx platform and possibly the Broadcom network drivers, it's probably only x86-relevant stuff where the concern is moot. Let's just keep the simple assumption then, until actually proven otherwise. Robin.
Re: [PATCH 10/22] swiotlb: refactor coherent buffer allocation
On 10/01/18 15:46, Christoph Hellwig wrote: On Wed, Jan 10, 2018 at 12:22:18PM +, Robin Murphy wrote: + if (phys_addr == SWIOTLB_MAP_ERROR) + goto out_warn; -/* Confirm address can be DMA'd by device */ - if (dev_addr + size - 1 > dma_mask) { - printk("hwdev DMA mask = 0x%016Lx, dev_addr = 0x%016Lx\n", - (unsigned long long)dma_mask, - (unsigned long long)dev_addr); + *dma_handle = swiotlb_phys_to_dma(dev, phys_addr); nit: this should probably go after the dma_coherent_ok() check (as with the original logic). But the originall logic also needs the dma_addr_t for the dma_coherent_ok check: dev_addr = swiotlb_phys_to_dma(hwdev, paddr); /* Confirm address can be DMA'd by device */ if (dev_addr + size - 1 > dma_mask) { ... goto err_warn; } or do you mean assining to *dma_handle? The dma_handle is not valid for a failure return, so I don't think this should matter. Yeah, only the assignment - as I said, it's just a stylistic nit; no big deal either way. + if (ret) { + *dma_handle = swiotlb_virt_to_bus(hwdev, ret); + if (dma_coherent_ok(hwdev, *dma_handle, size)) { + memset(ret, 0, size); + return ret; + } Aren't we leaking the pages here? Yes, that free_pages got lost somewhere in the rebases, I've added it back. Cool. Robin.
Re: [PATCH 21/22] arm64: replace ZONE_DMA with ZONE_DMA32
On 10/01/18 15:55, Christoph Hellwig wrote: On Wed, Jan 10, 2018 at 04:55:17PM +0100, Christoph Hellwig wrote: On Wed, Jan 10, 2018 at 12:58:14PM +, Robin Murphy wrote: On 10/01/18 08:09, Christoph Hellwig wrote: arm64 uses ZONE_DMA for allocations below 32-bits. These days we name the zone for that ZONE_DMA32, which will allow to use the dma-direct and generic swiotlb code as-is, so rename it. I do wonder if we could also "upgrade" GFP_DMA to GFP_DMA32 somehow when !ZONE_DMA - there are almost certainly arm64 drivers out there using a combination of GFP_DMA and streaming mappings which will no longer get the guaranteed 32-bit addresses they expect after this. I'm not sure quite how feasible that is, though :/ I can't find anything obvious in the tree. The alternative would be to keep ZONE_DMA and set ARCH_ZONE_DMA_BITS. That said, I do agree that this is an appropriate change (the legacy of GFP_DMA is obviously horrible), so, provided we get plenty of time to find and fix the fallout when it lands: Reviewed-by: Robin Murphy I was hoping to get this into 4.15. What would be proper time to fix the fallout? Err, 4.16 of course. Hee hee - cramming it into 4.15 is exactly what I wouldn't want to do, even if Linus would accept it :) Landing it this merge window for 4.16-rc1 sounds good if we can manage that. Robin.
Re: [PATCH 08/22] swiotlb: wire up ->dma_supported in swiotlb_dma_ops
On 10/01/18 15:35, Christoph Hellwig wrote: On Wed, Jan 10, 2018 at 12:16:15PM +, Robin Murphy wrote: On 10/01/18 08:09, Christoph Hellwig wrote: To properly reject too small DMA masks based on the addressability of the bounce buffer. I reckon this is self-evident enough that it should simply be squashed into the previous patch. x86 didn't wire it up before, so I want a clear blaimpoint for this change instead of mixing it up. That almost makes sense, if x86 were using this generic swiotlb_dma_ops already. AFAICS it's only ia64, unicore and tile who end up using it, and they all had swiotlb_dma_supported hooked up to begin with. Am I missing something? If regressions are going to happen, they'll surely point at whichever commit pulls the ops into the relevant arch code - there doesn't seem to be a great deal of value in having a piecemeal history of said ops *before* that point. Robin.
Re: [PATCH 1/9] iommu/of: Drop early initialisation hooks
Hi Jeffy, On 11/01/18 11:14, JeffyChen wrote: Hi Marek, Thanks for your reply. On 01/11/2018 05:40 PM, Marek Szyprowski wrote: Hi Jeffy, On 2018-01-11 09:22, Jeffy Chen wrote: With the probe-deferral mechanism, early initialisation hooks are no longer needed. Suggested-by: Robin Murphy In fact, shortly after I said that I had a "how hard can it be?" moment and took a crack at it myself - sorry, I should probably have cc'd you on that series[1]. Signed-off-by: Jeffy Chen --- drivers/iommu/arm-smmu-v3.c | 2 +- drivers/iommu/arm-smmu.c | 12 ++-- drivers/iommu/exynos-iommu.c | 2 +- For Exynos IOMMU: Acked-by: Marek Szyprowski IPMMU and MSM IOMMU are no longer multi-platform safe after this patch. It breaks them in the same way as my commit 928055a01b3f ("iommu/exynos: Remove custom platform device registration code") broke Exynos IOMMU. You need a similar fix for them: https://www.spinics.net/lists/arm-kernel/msg627648.html hmmm, right, i did saw this fix in the rockchip iommu driver too. and there're also some other iommu drivers put bus_set_iommu in their probe() to avoid that. maybe we can do it in the iommu framework? for example: 1/ add a bus type member to struct iommu_device 2/ and a iommu_device_set_bus() 3/ do the bus_set_iommu stuff in iommu_device_register() 4/ undo bus_set_iommu in iommu_device_unregister() Ultimately we'd like to get rid of the bus relationship altogether, so I don't think it's really worth adding more infrastructure around it. Having of-iommu-based drivers set bus ops at probe time, and others conditionally from an initcall, is pretty clean and simple, so I'd rather stick with that approach for now. Robin. [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-January/025395.html
Re: [PATCH] IIO: ADC: stm32-dfsdm: avoid unused-variable warning
On 11/01/18 10:34, Arnd Bergmann wrote: Building with CONFIG_OF disabled produces a compiler warning: drivers/iio/adc/stm32-dfsdm-core.c: In function 'stm32_dfsdm_probe': drivers/iio/adc/stm32-dfsdm-core.c:245:22: error: unused variable 'pnode' [-Werror=unused-variable] This removes the variable and open-codes it in the only place it gets used to avoid that warning. Fixes: bed73904e76f ("IIO: ADC: add stm32 DFSDM core support") Signed-off-by: Arnd Bergmann --- drivers/iio/adc/stm32-dfsdm-core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/iio/adc/stm32-dfsdm-core.c b/drivers/iio/adc/stm32-dfsdm-core.c index 72427414db7f..6cd655f8239b 100644 --- a/drivers/iio/adc/stm32-dfsdm-core.c +++ b/drivers/iio/adc/stm32-dfsdm-core.c @@ -242,7 +242,6 @@ MODULE_DEVICE_TABLE(of, stm32_dfsdm_of_match); static int stm32_dfsdm_probe(struct platform_device *pdev) { struct dfsdm_priv *priv; - struct device_node *pnode = pdev->dev.of_node; const struct of_device_id *of_id; const struct stm32_dfsdm_dev_data *dev_data; struct stm32_dfsdm *dfsdm; @@ -254,7 +253,7 @@ static int stm32_dfsdm_probe(struct platform_device *pdev) priv->pdev = pdev; - of_id = of_match_node(stm32_dfsdm_of_match, pnode); + of_id = of_match_node(stm32_dfsdm_of_match, pdev->dev.of_node); if (!of_id->data) { dev_err(&pdev->dev, "Data associated to device is missing\n"); return -EINVAL; FWIW, it looks like this whole lot could be cleaned up by using of_device_get_match_data(). Robin.
Re: [PATCH 2/9] iommu/rockchip: Fix error handling in attach
On 11/01/18 08:22, Jeffy Chen wrote: From: Tomasz Figa Currently if the driver encounters an error while attaching device, it will leave the IOMMU in an inconsistent state. Even though it shouldn't really happen in reality, let's just add proper error path to keep things consistent. Signed-off-by: Tomasz Figa Signed-off-by: Jeffy Chen --- drivers/iommu/rockchip-iommu.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 9d991c2d8767..ee805e1dfba7 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -826,17 +826,10 @@ static int rk_iommu_attach_device(struct iommu_domain *domain, ret = rk_iommu_force_reset(iommu); if (ret) - return ret; + goto err_disable_stall; iommu->domain = domain; - for (i = 0; i < iommu->num_irq; i++) { - ret = devm_request_irq(iommu->dev, iommu->irq[i], rk_iommu_irq, - IRQF_SHARED, dev_name(dev), iommu); - if (ret) - return ret; - } - for (i = 0; i < iommu->num_mmu; i++) { rk_iommu_write(iommu->bases[i], RK_MMU_DTE_ADDR, rk_domain->dt_dma); @@ -844,9 +837,16 @@ static int rk_iommu_attach_device(struct iommu_domain *domain, rk_iommu_write(iommu->bases[i], RK_MMU_INT_MASK, RK_MMU_IRQ_MASK); } + for (i = 0; i < iommu->num_irq; i++) { + ret = devm_request_irq(iommu->dev, iommu->irq[i], rk_iommu_irq, + IRQF_SHARED, dev_name(dev), iommu); Why aren't we simply requesting the IRQ once in rk_iommu_probe()? Given that the hardware doesn't handle multiple translation contexts, there doesn't seem to be much point in being this dynamic about it. Robin. 
+ if (ret) + goto err_free_irq; + } + ret = rk_iommu_enable_paging(iommu); if (ret) - return ret; + goto err_free_irq; spin_lock_irqsave(&rk_domain->iommus_lock, flags); list_add_tail(&iommu->node, &rk_domain->iommus); @@ -857,6 +857,14 @@ static int rk_iommu_attach_device(struct iommu_domain *domain, rk_iommu_disable_stall(iommu); return 0; + +err_free_irq: + while (i--) + devm_free_irq(iommu->dev, iommu->irq[i], iommu); +err_disable_stall: + rk_iommu_disable_stall(iommu); + + return ret; } static void rk_iommu_detach_device(struct iommu_domain *domain,
Re: [PATCH v2 10/16] arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
On 29/01/18 17:45, Marc Zyngier wrote: A new feature of SMCCC 1.1 is that it offers firmware-based CPU workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides BP hardening for CVE-2017-5715. If the host has some mitigation for this issue, report that we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the host workaround on every guest exit. Signed-off-by: Marc Zyngier --- include/linux/arm-smccc.h | 5 + virt/kvm/arm/psci.c | 17 +++-- 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index dc68aa5a7261..e1ef944ef1da 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -73,6 +73,11 @@ ARM_SMCCC_SMC_32,\ 0, 1) +#define ARM_SMCCC_ARCH_WORKAROUND_1 \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32,\ + 0, 0x8000) + #ifndef __ASSEMBLY__ #include diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c index a021b62ed762..5677d16abc71 100644 --- a/virt/kvm/arm/psci.c +++ b/virt/kvm/arm/psci.c @@ -407,14 +407,27 @@ static int kvm_psci_call(struct kvm_vcpu *vcpu) int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - u32 val; + u32 val, feature; switch (func_id) { case ARM_SMCCC_VERSION_FUNC_ID: val = ARM_SMCCC_VERSION_1_1; break; case ARM_SMCCC_ARCH_FEATURES_FUNC_ID: - val = -1; /* Nothing supported yet */ Conceptually, might it still make sense to initialise val to NOT_SUPPORTED here, then overwrite it if and when a feature actually is present? It would in this case save a few lines as well, but I know multiple assignment can be one of those religious issues, so I'm not too fussed either way. Robin. + feature = smccc_get_arg1(vcpu); + switch(feature) { +#ifdef CONFIG_ARM64 + case ARM_SMCCC_ARCH_WORKAROUND_1: + if (cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR)) + val = 0; + else + val = -1; + break; +#endif + default: + val = -1; + break; + } break; default: return kvm_psci_call(vcpu);
Re: [PATCH v2 13/16] firmware/psci: Expose SMCCC version through psci_ops
On 29/01/18 17:45, Marc Zyngier wrote: Since PSCI 1.0 allows the SMCCC version to be (indirectly) probed, let's do that at boot time, and expose the version of the calling convention as part of the psci_ops structure. Signed-off-by: Marc Zyngier --- drivers/firmware/psci.c | 21 + include/linux/psci.h| 6 ++ 2 files changed, 27 insertions(+) diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c index e9493da2b111..dd035aaa1c33 100644 --- a/drivers/firmware/psci.c +++ b/drivers/firmware/psci.c @@ -511,6 +511,26 @@ static void __init psci_init_migrate(void) pr_info("Trusted OS resident on physical CPU 0x%lx\n", cpuid); } +static void __init psci_init_smccc(u32 ver) +{ + int feature = PSCI_RET_NOT_SUPPORTED; + + if (PSCI_VERSION_MAJOR(ver) >= 1) + feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID); + + if (feature == PSCI_RET_NOT_SUPPORTED) { + psci_ops.variant = SMCCC_VARIANT_1_0; Presumably at some point in the future we might want to update PSCI itself to use 'fast' calls if available, at which point this initialisation is a bit backwards given that you've already made a call using *some* convention. Even I think relying on the enum value working out OK is a bit subtle, so perhaps it's best to default-initialise psci_ops.variant to 1.0 (either statically, or dynamically before the first call), then only update it to 1.1 if and when we discover that. Robin. 
+ } else { + ver = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0); + if (ver != ARM_SMCCC_VERSION_1_1) + psci_ops.variant = SMCCC_VARIANT_1_0; + else + psci_ops.variant = SMCCC_VARIANT_1_1; + } + + pr_info("SMC Calling Convention v1.%d\n", psci_ops.variant); +} + static void __init psci_0_2_set_functions(void) { pr_info("Using standard PSCI v0.2 function IDs\n"); @@ -557,6 +577,7 @@ static int __init psci_probe(void) psci_0_2_set_functions(); psci_init_migrate(); + psci_init_smccc(ver); if (PSCI_VERSION_MAJOR(ver) >= 1) { psci_init_cpu_suspend(); diff --git a/include/linux/psci.h b/include/linux/psci.h index f2679e5faa4f..83fd16a37be3 100644 --- a/include/linux/psci.h +++ b/include/linux/psci.h @@ -31,6 +31,11 @@ enum psci_conduit { PSCI_CONDUIT_HVC, }; +enum smccc_variant { + SMCCC_VARIANT_1_0, + SMCCC_VARIANT_1_1, +}; + struct psci_operations { u32 (*get_version)(void); int (*cpu_suspend)(u32 state, unsigned long entry_point); @@ -41,6 +46,7 @@ struct psci_operations { unsigned long lowest_affinity_level); int (*migrate_info_type)(void); enum psci_conduit conduit; + enum smccc_variant variant; }; extern struct psci_operations psci_ops;
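A sketch of Robin's suggestion to default-initialise psci_ops.variant statically, so that the conservative v1.0 value is already in place before any probing call is made, and the probe path only ever upgrades it. The struct and function here are cut down to the bare minimum for illustration and are not the real driver code:

```c
#include <assert.h>

enum smccc_variant {
    SMCCC_VARIANT_1_0,  /* first enumerator, value 0 */
    SMCCC_VARIANT_1_1,
};

struct psci_operations {
    enum smccc_variant variant;
};

/* Static default: conservative v1.0 until probing proves otherwise,
 * rather than relying on the enum's zero value working out */
static struct psci_operations psci_ops = {
    .variant = SMCCC_VARIANT_1_0,
};

/* The probe then has a single upgrade path and no else branches */
static void psci_init_smccc_sketch(int feature_supported, int fw_is_1_1)
{
    if (feature_supported && fw_is_1_1)
        psci_ops.variant = SMCCC_VARIANT_1_1;
}
```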
Re: [PATCH v2 04/16] arm/arm64: KVM: Add PSCI_VERSION helper
On 29/01/18 17:45, Marc Zyngier wrote: As we're about to trigger a PSCI version explosion, it doesn't hurt to introduce a PSCI_VERSION helper that is going to be used everywhere. Signed-off-by: Marc Zyngier --- include/kvm/arm_psci.h | 5 +++-- virt/kvm/arm/psci.c| 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h index 2042bb909474..3a408c846c09 100644 --- a/include/kvm/arm_psci.h +++ b/include/kvm/arm_psci.h @@ -18,8 +18,9 @@ #ifndef __KVM_ARM_PSCI_H__ #define __KVM_ARM_PSCI_H__ -#define KVM_ARM_PSCI_0_1 1 -#define KVM_ARM_PSCI_0_2 2 +#define PSCI_VERSION(x,y) ((((x) & 0x7fff) << 16) | ((y) & 0xffff)) I see virt/kvm/arm/psci.c already pulls in uapi/linux/psci.h, so maybe this guy could go in there alongside the other PSCI_VERSION_* gubbins? Robin. +#define KVM_ARM_PSCI_0_1 PSCI_VERSION(0, 1) +#define KVM_ARM_PSCI_0_2 PSCI_VERSION(0, 2) int kvm_psci_version(struct kvm_vcpu *vcpu); int kvm_psci_call(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c index b322e46fd142..c00bb324e14e 100644 --- a/virt/kvm/arm/psci.c +++ b/virt/kvm/arm/psci.c @@ -222,7 +222,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu) * Bits[31:16] = Major Version = 0 * Bits[15:0] = Minor Version = 2 */ - val = 2; + val = KVM_ARM_PSCI_0_2; break; case PSCI_0_2_FN_CPU_SUSPEND: case PSCI_0_2_FN64_CPU_SUSPEND:
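To make the bit layout concrete: the helper packs the major version into bits [31:16] (with the top bit reserved as zero) and the minor version into bits [15:0], matching the comment in the quoted kvm_psci_0_2_call() hunk. A standalone sketch of the macro as reconstructed:

```c
#include <assert.h>

/* Major version in bits [31:16] (bit 31 reserved zero), minor in [15:0] */
#define PSCI_VERSION(x, y) ((((x) & 0x7fff) << 16) | ((y) & 0xffff))

#define KVM_ARM_PSCI_0_1 PSCI_VERSION(0, 1)
#define KVM_ARM_PSCI_0_2 PSCI_VERSION(0, 2)
#define KVM_ARM_PSCI_1_0 PSCI_VERSION(1, 0)
```

So the old hard-coded `val = 2` and `KVM_ARM_PSCI_0_2` encode the same value, which is exactly why the substitution in the patch is safe.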
Re: [PATCH v6 2/6] iommu/arm-smmu: Add pm_runtime/sleep ops
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R The smmu needs to be functional only when the respective masters using it are active. The device_link feature helps to track such functional dependencies, so that the iommu gets powered when the master device enables itself using pm_runtime. So by adapting the smmu driver for runtime pm, above said dependency can be addressed. This patch adds the pm runtime/sleep callbacks to the driver and also the functions to parse the smmu clocks from DT and enable them in resume/suspend. Signed-off-by: Sricharan R Signed-off-by: Archit Taneja [vivek: Clock rework to request bulk of clocks] Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 55 ++-- 1 file changed, 53 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 78d4c6b8f1ba..21acffe91a1c 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -48,6 +48,7 @@ #include #include #include +#include #include #include @@ -205,6 +206,9 @@ struct arm_smmu_device { u32 num_global_irqs; u32 num_context_irqs; unsigned int*irqs; + struct clk_bulk_data*clocks; + int num_clks; + const char * const *clk_names; This seems unnecessary, as we use it a grand total of once, during initialisation when we have the source data directly to hand. Just pass data->clks into arm_smmu_init_clks() as an additional argument. Otherwise, I think this looks reasonable; it's about as unobtrusive as it's going to get. Robin. 
u32 cavium_id_base; /* Specific to Cavium */ @@ -1685,6 +1689,25 @@ static int arm_smmu_id_size_to_bits(int size) } } +static int arm_smmu_init_clocks(struct arm_smmu_device *smmu) +{ + int i; + int num = smmu->num_clks; + + if (num < 1) + return 0; + + smmu->clocks = devm_kcalloc(smmu->dev, num, + sizeof(*smmu->clocks), GFP_KERNEL); + if (!smmu->clocks) + return -ENOMEM; + + for (i = 0; i < num; i++) + smmu->clocks[i].id = smmu->clk_names[i]; + + return devm_clk_bulk_get(smmu->dev, num, smmu->clocks); +} + static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) { unsigned long size; @@ -1897,10 +1920,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu) struct arm_smmu_match_data { enum arm_smmu_arch_version version; enum arm_smmu_implementation model; + const char * const *clks; + int num_clks; }; #define ARM_SMMU_MATCH_DATA(name, ver, imp) \ -static struct arm_smmu_match_data name = { .version = ver, .model = imp } +static const struct arm_smmu_match_data name = { .version = ver, .model = imp } ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU); @@ -2001,6 +2026,8 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev, data = of_device_get_match_data(dev); smmu->version = data->version; smmu->model = data->model; + smmu->clk_names = data->clks; + smmu->num_clks = data->num_clks; parse_driver_options(smmu); @@ -2099,6 +2126,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev) smmu->irqs[i] = irq; } + err = arm_smmu_init_clocks(smmu); + if (err) + return err; + err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2197,7 +2228,27 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev) return 0; } -static SIMPLE_DEV_PM_OPS(arm_smmu_pm_ops, NULL, arm_smmu_pm_resume); +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + return 
clk_bulk_prepare_enable(smmu->num_clks, smmu->clocks); +} + +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev) +{ + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + + clk_bulk_disable_unprepare(smmu->num_clks, smmu->clocks); + + return 0; +} + +static const struct dev_pm_ops arm_smmu_pm_ops = { + SET_SYSTEM_SLEEP_PM_OPS(NULL, arm_smmu_pm_resume) + SET_RUNTIME_PM_OPS(arm_smmu_runtime_suspend, + arm_smmu_runtime_resume, NULL) +}; static struct platform_driver arm_smmu_driver = { .driver = {
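A sketch of the restructuring Robin suggests: drop the clk_names member from the device struct and pass the match-data name array straight into the init function. The types here are cut-down stand-ins (a static array stands in for devm_kcalloc(), and the devm_clk_bulk_get() call is elided), so this illustrates the shape only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for the kernel's struct clk_bulk_data */
struct clk_bulk_data { const char *id; };

struct smmu {
    struct clk_bulk_data *clocks;
    int num_clks;
    /* no clk_names member: the names arrive as an argument instead */
};

static struct clk_bulk_data storage[8]; /* stands in for devm_kcalloc() */

/* Take the name array from the match data directly, since it is only
 * ever needed here at initialisation time */
static int smmu_init_clocks(struct smmu *smmu, const char * const *clk_names)
{
    int i;

    if (smmu->num_clks < 1)
        return 0;

    smmu->clocks = storage;
    for (i = 0; i < smmu->num_clks; i++)
        smmu->clocks[i].id = clk_names[i];

    return 0; /* real code would now call devm_clk_bulk_get() */
}
```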
Re: [PATCH v6 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R The smmu device probe/remove and add/remove master device callbacks gets called when the smmu is not linked to its master, that is without the context of the master device. So calling runtime apis in those places separately. Signed-off-by: Sricharan R [vivek: Cleanup pm runtime calls] Signed-off-by: Vivek Gautam --- drivers/iommu/arm-smmu.c | 45 + 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 21acffe91a1c..95478bfb182c 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -914,11 +914,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; struct arm_smmu_cfg *cfg = &smmu_domain->cfg; - int irq; + int ret, irq; if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY) return; + ret = pm_runtime_get_sync(smmu->dev); + if (ret) + return; + /* * Disable the context bank and free the page tables before freeing * it. @@ -933,6 +937,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain) free_io_pgtable_ops(smmu_domain->pgtbl_ops); __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx); + + pm_runtime_put_sync(smmu->dev); } static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) @@ -1408,12 +1414,20 @@ static int arm_smmu_add_device(struct device *dev) while (i--) cfg->smendx[i] = INVALID_SMENDX; - ret = arm_smmu_master_alloc_smes(dev); + ret = pm_runtime_get_sync(smmu->dev); if (ret) goto out_cfg_free; + ret = arm_smmu_master_alloc_smes(dev); + if (ret) { + pm_runtime_put_sync(smmu->dev); + goto out_cfg_free; Please keep to the existing pattern and put this on the cleanup path with a new label, rather than inline. 
+ } + iommu_device_link(&smmu->iommu, dev); + pm_runtime_put_sync(smmu->dev); + return 0; out_cfg_free: @@ -1428,7 +1442,7 @@ static void arm_smmu_remove_device(struct device *dev) struct iommu_fwspec *fwspec = dev->iommu_fwspec; struct arm_smmu_master_cfg *cfg; struct arm_smmu_device *smmu; - + int ret; if (!fwspec || fwspec->ops != &arm_smmu_ops) return; @@ -1436,8 +1450,21 @@ static void arm_smmu_remove_device(struct device *dev) cfg = fwspec->iommu_priv; smmu = cfg->smmu; + /* +* The device link between the master device and +* smmu is already purged at this point. +* So enable the power to smmu explicitly. +*/ I don't understand this comment, especially since we don't even introduce device links until the following patch... :/ + + ret = pm_runtime_get_sync(smmu->dev); + if (ret) + return; + iommu_device_unlink(&smmu->iommu, dev); arm_smmu_master_free_smes(fwspec); + + pm_runtime_put_sync(smmu->dev); + iommu_group_remove_device(dev); kfree(fwspec->iommu_priv); iommu_fwspec_free(dev); @@ -2130,6 +2157,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev) if (err) return err; + platform_set_drvdata(pdev, smmu); + + pm_runtime_enable(dev); + + err = pm_runtime_get_sync(dev); + if (err) + return err; + err = arm_smmu_device_cfg_probe(smmu); if (err) return err; @@ -2171,9 +2206,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev) return err; } - platform_set_drvdata(pdev, smmu); arm_smmu_device_reset(smmu); arm_smmu_test_smr_masks(smmu); + pm_runtime_put_sync(dev); /* * For ACPI and generic DT bindings, an SMMU will be probed before @@ -2212,6 +2247,8 @@ static int arm_smmu_device_remove(struct platform_device *pdev) /* Turn the thing off */ writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0); + pm_runtime_force_suspend(smmu->dev); Why do we need this? I guess it might be a Qualcomm-ism as I don't see anyone else calling it from .remove other than a couple of other qcom_* drivers. 
Given that we only get here during system shutdown (or the root user intentionally pissing about with driver unbinding), it doesn't seem like a point where power saving really matters all that much. I'd also naively expect that anything this device was the last consumer of would get turned off by core code anyway once it's removed, but maybe things aren't that slick; I dunno :/ Robin. + return 0; }
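The cleanup-path-with-a-new-label pattern Robin asks for, as a standalone sketch. The pm_runtime calls are replaced with counting stubs here so the get/put balance can be checked, and the failing alloc stands in for arm_smmu_master_alloc_smes(); none of these names are the real kernel API:

```c
#include <assert.h>

static int get_cnt, put_cnt;
static int pm_runtime_get_sync_stub(void) { get_cnt++; return 0; }
static void pm_runtime_put_sync_stub(void) { put_cnt++; }

/* Force a failure so the error path gets exercised */
static int alloc_smes_stub(void) { return -1; }

static int add_device_sketch(void)
{
    int ret;

    ret = pm_runtime_get_sync_stub();
    if (ret)
        goto out_cfg_free;

    ret = alloc_smes_stub();
    if (ret)
        goto out_rpm_put;       /* new label, rather than an inline put */

    pm_runtime_put_sync_stub(); /* success path */
    return 0;

out_rpm_put:
    pm_runtime_put_sync_stub(); /* every successful get is matched by a put */
out_cfg_free:
    return ret;
}
```

Keeping the unwinding at labelled exit points means each new failure case only needs to pick the right label, instead of repeating the put at every early return.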
Re: [PATCH v6 4/6] iommu/arm-smmu: Add the device_link between masters and smmu
On 19/01/18 11:43, Vivek Gautam wrote: From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu. Don't we need to balance this with a device_link_del() in .remove_device (like exynos-iommu does)? Robin. Signed-off-by: Sricharan R --- drivers/iommu/arm-smmu.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 95478bfb182c..33bbcfedb896 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1367,6 +1367,7 @@ static int arm_smmu_add_device(struct device *dev) struct arm_smmu_device *smmu; struct arm_smmu_master_cfg *cfg; struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct device_link *link; int i, ret; if (using_legacy_binding) { @@ -1428,6 +1429,16 @@ static int arm_smmu_add_device(struct device *dev) pm_runtime_put_sync(smmu->dev); + /* +* Establish the link between smmu and master, so that the +* smmu gets runtime enabled/disabled as per the master's +* needs. +*/ + link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME); + if (!link) + dev_warn(smmu->dev, "Unable to create device link between %s and %s\n", +dev_name(smmu->dev), dev_name(dev)); + return 0; out_cfg_free:
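The balance Robin is asking about, sketched with counting stand-ins for device_link_add()/device_link_del(): whatever .add_device creates, .remove_device should tear down again (as exynos-iommu does). The stub names and the saved-link variable are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

static int live_links;
struct device_link { int dummy; };
static struct device_link the_link;

/* Stand-ins for device_link_add() / device_link_del() */
static struct device_link *device_link_add_stub(void)
{
    live_links++;
    return &the_link;
}
static void device_link_del_stub(struct device_link *l)
{
    (void)l;
    live_links--;
}

static struct device_link *master_link;

static void add_device_sketch(void)
{
    master_link = device_link_add_stub();
}

static void remove_device_sketch(void)
{
    if (master_link) {              /* balance the add from .add_device */
        device_link_del_stub(master_link);
        master_link = NULL;
    }
}
```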
Re: [PATCH 1/2] iommu: Fix iommu_unmap and iommu_unmap_fast return type
Hi Suravee, On 31/01/18 01:48, Suravee Suthikulpanit wrote: Currently, iommu_unmap and iommu_unmap_fast return unmapped pages with size_t. However, the actual value returned could be error codes (< 0), which can be misinterpreted as large number of unmapped pages. Therefore, change the return type to ssize_t. Cc: Joerg Roedel Cc: Alex Williamson Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu.c | 6 +++--- drivers/iommu/intel-iommu.c | 4 ++-- Er, there are a few more drivers than that implementing iommu_ops ;) It seems like it might be more sensible to fix the single instance of a driver returning -EINVAL (which appears to be a "should never happen if used correctly" kinda thing anyway) and leave the API-internal callback prototype as-is. I do agree the inconsistency of iommu_unmap() itself wants sorting, though (particularly the !IOMMU_API stubs which are wrong either way). Robin. drivers/iommu/iommu.c | 16 include/linux/iommu.h | 20 ++-- 4 files changed, 23 insertions(+), 23 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 7d5eb00..3609f51 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3030,11 +3030,11 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, return ret; } -static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, - size_t page_size) +static ssize_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, + size_t page_size) { struct protection_domain *domain = to_pdomain(dom); - size_t unmap_size; + ssize_t unmap_size; if (domain->mode == PAGE_MODE_NONE) return -EINVAL; diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 4a2de34..15ba866 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5068,8 +5068,8 @@ static int intel_iommu_map(struct iommu_domain *domain, return ret; } -static size_t intel_iommu_unmap(struct iommu_domain *domain, - unsigned long iova, size_t size) 
+static ssize_t intel_iommu_unmap(struct iommu_domain *domain, +unsigned long iova, size_t size) { struct dmar_domain *dmar_domain = to_dmar_domain(domain); struct page *freelist = NULL; diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 3de5c0b..8f7da8a 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1557,12 +1557,12 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova, } EXPORT_SYMBOL_GPL(iommu_map); -static size_t __iommu_unmap(struct iommu_domain *domain, - unsigned long iova, size_t size, - bool sync) +static ssize_t __iommu_unmap(struct iommu_domain *domain, +unsigned long iova, size_t size, +bool sync) { const struct iommu_ops *ops = domain->ops; - size_t unmapped_page, unmapped = 0; + ssize_t unmapped_page, unmapped = 0; unsigned long orig_iova = iova; unsigned int min_pagesz; @@ -1617,15 +1617,15 @@ static size_t __iommu_unmap(struct iommu_domain *domain, return unmapped; } -size_t iommu_unmap(struct iommu_domain *domain, - unsigned long iova, size_t size) +ssize_t iommu_unmap(struct iommu_domain *domain, + unsigned long iova, size_t size) { return __iommu_unmap(domain, iova, size, true); } EXPORT_SYMBOL_GPL(iommu_unmap); -size_t iommu_unmap_fast(struct iommu_domain *domain, - unsigned long iova, size_t size) +ssize_t iommu_unmap_fast(struct iommu_domain *domain, +unsigned long iova, size_t size) { return __iommu_unmap(domain, iova, size, false); } diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 41b8c57..78df048 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -199,8 +199,8 @@ struct iommu_ops { void (*detach_dev)(struct iommu_domain *domain, struct device *dev); int (*map)(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot); - size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, -size_t size); + ssize_t (*unmap)(struct iommu_domain *domain, unsigned long iova, +size_t size); size_t (*map_sg)(struct iommu_domain *domain, 
unsigned long iova, struct scatterlist *sg, unsigned int nents, int prot); void (*flush_iotlb_all)(struct iommu_domain *domain); @@ -299,10 +299,10 @@ extern void iommu_detach_device(struct iommu_domain *domain, extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern int iommu_ma
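The hazard motivating the patch can be shown in a few lines: when a driver returns a negative error through an unsigned size_t, the caller sees an enormous "number of unmapped bytes" rather than a failure. A small demonstration (EINVAL defined locally for portability; the unmap stub is purely illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#ifndef EINVAL
#define EINVAL 22
#endif

/* A ->unmap implementation that signals an error with a negative value */
static intmax_t unmap_impl(void)
{
    return -EINVAL;
}

/* A caller reading the result through size_t sees a huge "success" */
static size_t unmap_as_size_t(void)
{
    return (size_t)unmap_impl();
}
```

On a 64-bit machine this reads back as 0xffffffffffffffea, which is why fixing the one driver that can return -EINVAL (and the !IOMMU_API stubs) is arguably cleaner than threading ssize_t through every implementation.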
Re: [PATCH 1/2] iommu: Fix iommu_unmap and iommu_unmap_fast return type
On 01/02/18 05:03, Suravee Suthikulpanit wrote: Hi Robin, On 2/1/18 1:02 AM, Robin Murphy wrote: Hi Suravee, On 31/01/18 01:48, Suravee Suthikulpanit wrote: Currently, iommu_unmap and iommu_unmap_fast return unmapped pages with size_t. However, the actual value returned could be error codes (< 0), which can be misinterpreted as large number of unmapped pages. Therefore, change the return type to ssize_t. Cc: Joerg Roedel Cc: Alex Williamson Signed-off-by: Suravee Suthikulpanit --- drivers/iommu/amd_iommu.c | 6 +++--- drivers/iommu/intel-iommu.c | 4 ++-- Er, there are a few more drivers than that implementing iommu_ops ;) Ahh right. It seems like it might be more sensible to fix the single instance of a driver returning -EINVAL (which appears to be a "should never happen if used correctly" kinda thing anyway) and leave the API-internal callback prototype as-is. I do agree the inconsistency of iommu_unmap() itself wants sorting, though (particularly the !IOMMU_API stubs which are wrong either way). Robin. Make sense. I'll leave the API alone, and change the code to not returning error then. Actually, on a second look I think that check in amd_iommu is 99% redundant anyway, since PAGE_MODE_NONE is only normally set for IOMMU_DOMAIN_IDENTITY domains, thus iommu_unmap() would have bailed out from the __IOMMU_DOMAIN_PAGING check before ops->unmap could be called. AFAICS the only way to hit it at all is if a caller somehow managed to get hold of the dev_state->domain set up in amd_iommu_init_device(), then tried to unmap something from that, which seems such a very wrong thing to do it should probably just crash and burn with extreme prejudice anyway. Cheers, Robin.
Re: [PATCH v3 13/18] firmware/psci: Expose PSCI conduit
On 01/02/18 11:46, Marc Zyngier wrote: In order to call into the firmware to apply workarounds, it is useful to find out whether we're using HVC or SMC. Let's expose this through the psci_ops. Reviewed-by: Robin Murphy Acked-by: Lorenzo Pieralisi Signed-off-by: Marc Zyngier --- drivers/firmware/psci.c | 28 +++- include/linux/psci.h| 7 +++ 2 files changed, 30 insertions(+), 5 deletions(-) diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c index 8b25d31e8401..e9493da2b111 100644 --- a/drivers/firmware/psci.c +++ b/drivers/firmware/psci.c @@ -59,7 +59,9 @@ bool psci_tos_resident_on(int cpu) return cpu == resident_cpu; } -struct psci_operations psci_ops; +struct psci_operations psci_ops = { + .conduit = PSCI_CONDUIT_NONE, +}; typedef unsigned long (psci_fn)(unsigned long, unsigned long, unsigned long, unsigned long); @@ -210,6 +212,22 @@ static unsigned long psci_migrate_info_up_cpu(void) 0, 0, 0); } +static void set_conduit(enum psci_conduit conduit) +{ + switch (conduit) { + case PSCI_CONDUIT_HVC: + invoke_psci_fn = __invoke_psci_fn_hvc; + break; + case PSCI_CONDUIT_SMC: + invoke_psci_fn = __invoke_psci_fn_smc; + break; + default: + WARN(1, "Unexpected PSCI conduit %d\n", conduit); + } + + psci_ops.conduit = conduit; +} + static int get_set_conduit_method(struct device_node *np) { const char *method; @@ -222,9 +240,9 @@ static int get_set_conduit_method(struct device_node *np) } if (!strcmp("hvc", method)) { - invoke_psci_fn = __invoke_psci_fn_hvc; + set_conduit(PSCI_CONDUIT_HVC); } else if (!strcmp("smc", method)) { - invoke_psci_fn = __invoke_psci_fn_smc; + set_conduit(PSCI_CONDUIT_SMC); } else { pr_warn("invalid \"method\" property: %s\n", method); return -EINVAL; @@ -654,9 +672,9 @@ int __init psci_acpi_init(void) pr_info("probing for conduit method from ACPI.\n"); if (acpi_psci_use_hvc()) - invoke_psci_fn = __invoke_psci_fn_hvc; + set_conduit(PSCI_CONDUIT_HVC); else - invoke_psci_fn = __invoke_psci_fn_smc; + set_conduit(PSCI_CONDUIT_SMC); return 
psci_probe(); } diff --git a/include/linux/psci.h b/include/linux/psci.h index f724fd8c78e8..f2679e5faa4f 100644 --- a/include/linux/psci.h +++ b/include/linux/psci.h @@ -25,6 +25,12 @@ bool psci_tos_resident_on(int cpu); int psci_cpu_init_idle(unsigned int cpu); int psci_cpu_suspend_enter(unsigned long index); +enum psci_conduit { + PSCI_CONDUIT_NONE, + PSCI_CONDUIT_SMC, + PSCI_CONDUIT_HVC, +}; + struct psci_operations { u32 (*get_version)(void); int (*cpu_suspend)(u32 state, unsigned long entry_point); @@ -34,6 +40,7 @@ struct psci_operations { int (*affinity_info)(unsigned long target_affinity, unsigned long lowest_affinity_level); int (*migrate_info_type)(void); + enum psci_conduit conduit; }; extern struct psci_operations psci_ops;
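The set_conduit() change in the quoted patch boils down to: swap the invocation function pointer exactly as before, but additionally record which conduit was chosen so later callers (such as the workaround code) can query it. A user-space sketch with stand-in invoke functions (real code would issue SMC/HVC instructions and WARN() on an unexpected conduit):

```c
#include <assert.h>

enum psci_conduit {
    PSCI_CONDUIT_NONE,
    PSCI_CONDUIT_SMC,
    PSCI_CONDUIT_HVC,
};

static enum psci_conduit recorded_conduit = PSCI_CONDUIT_NONE;

/* Stand-ins for __invoke_psci_fn_smc / __invoke_psci_fn_hvc */
static unsigned long invoke_smc(unsigned long fn) { (void)fn; return 1; }
static unsigned long invoke_hvc(unsigned long fn) { (void)fn; return 2; }

static unsigned long (*invoke_psci_fn)(unsigned long);

static void set_conduit(enum psci_conduit conduit)
{
    switch (conduit) {
    case PSCI_CONDUIT_HVC:
        invoke_psci_fn = invoke_hvc;
        break;
    case PSCI_CONDUIT_SMC:
        invoke_psci_fn = invoke_smc;
        break;
    default:
        return; /* real code: WARN(1, "Unexpected PSCI conduit") */
    }
    recorded_conduit = conduit; /* exposed via psci_ops.conduit */
}
```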
Re: [PATCH v3 14/18] firmware/psci: Expose SMCCC version through psci_ops
On 01/02/18 11:46, Marc Zyngier wrote: Since PSCI 1.0 allows the SMCCC version to be (indirectly) probed, let's do that at boot time, and expose the version of the calling convention as part of the psci_ops structure. Acked-by: Lorenzo Pieralisi Signed-off-by: Marc Zyngier --- drivers/firmware/psci.c | 19 +++ include/linux/psci.h| 6 ++ 2 files changed, 25 insertions(+) diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c index e9493da2b111..8631906c414c 100644 --- a/drivers/firmware/psci.c +++ b/drivers/firmware/psci.c @@ -61,6 +61,7 @@ bool psci_tos_resident_on(int cpu) struct psci_operations psci_ops = { .conduit = PSCI_CONDUIT_NONE, + .smccc_version = SMCCC_VERSION_1_0, }; typedef unsigned long (psci_fn)(unsigned long, unsigned long, @@ -511,6 +512,23 @@ static void __init psci_init_migrate(void) pr_info("Trusted OS resident on physical CPU 0x%lx\n", cpuid); } +static void __init psci_init_smccc(u32 ver) +{ + int feature; + + feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID); + + if (feature != PSCI_RET_NOT_SUPPORTED) { + ver = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0); + if (ver != ARM_SMCCC_VERSION_1_1) + psci_ops.smccc_version = SMCCC_VERSION_1_0; AFAICS, unless you somehow run psci_probe() twice *and* have schizophrenic firmware, this assignment now does precisely nothing. With the condition flipped and the redundant else case removed (or an explanation of why I'm wrong...) 
Reviewed-by: Robin Murphy + else + psci_ops.smccc_version = SMCCC_VERSION_1_1; + } + + pr_info("SMC Calling Convention v1.%d\n", psci_ops.smccc_version); +} + static void __init psci_0_2_set_functions(void) { pr_info("Using standard PSCI v0.2 function IDs\n"); @@ -559,6 +577,7 @@ static int __init psci_probe(void) psci_init_migrate(); if (PSCI_VERSION_MAJOR(ver) >= 1) { + psci_init_smccc(ver); psci_init_cpu_suspend(); psci_init_system_suspend(); } diff --git a/include/linux/psci.h b/include/linux/psci.h index f2679e5faa4f..8b1b3b5935ab 100644 --- a/include/linux/psci.h +++ b/include/linux/psci.h @@ -31,6 +31,11 @@ enum psci_conduit { PSCI_CONDUIT_HVC, }; +enum smccc_version { + SMCCC_VERSION_1_0, + SMCCC_VERSION_1_1, +}; + struct psci_operations { u32 (*get_version)(void); int (*cpu_suspend)(u32 state, unsigned long entry_point); @@ -41,6 +46,7 @@ struct psci_operations { unsigned long lowest_affinity_level); int (*migrate_info_type)(void); enum psci_conduit conduit; + enum smccc_version smccc_version; }; extern struct psci_operations psci_ops;
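The shape Robin's review asks for, sketched standalone: with the condition flipped to bail out early when SMCCC_VERSION probing is unsupported, the 1.0 assignment disappears entirely (it is the static default) and only the upgrade to 1.1 remains. The feature/version stubs below stand in for psci_features() and invoke_psci_fn(); this is an illustration, not the merged code:

```c
#include <assert.h>

#define PSCI_RET_NOT_SUPPORTED (-1)
#define SMCCC_VERSION_1_0 0
#define SMCCC_VERSION_1_1 1
#define ARM_SMCCC_VERSION_1_1 0x10001

static int smccc_version = SMCCC_VERSION_1_0; /* static default */

/* Stubs standing in for psci_features() / invoke_psci_fn() */
static int feature_ret = PSCI_RET_NOT_SUPPORTED;
static int fw_version;
static int psci_features_stub(void) { return feature_ret; }
static int invoke_version_stub(void) { return fw_version; }

/* Flipped condition, redundant else removed: one upgrade path only */
static void psci_init_smccc_sketch(void)
{
    if (psci_features_stub() == PSCI_RET_NOT_SUPPORTED)
        return;

    if (invoke_version_stub() == ARM_SMCCC_VERSION_1_1)
        smccc_version = SMCCC_VERSION_1_1;
}
```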
Re: [PATCH v3 15/18] arm/arm64: smccc: Make function identifiers an unsigned quantity
On 01/02/18 11:46, Marc Zyngier wrote: Function identifiers are a 32bit, unsigned quantity. But we never tell so to the compiler, resulting in the following: 4ac: b26187e0 mov x0, #0xffffffff80000001 We thus rely on the firmware narrowing it for us, which is not always a reasonable expectation. I think technically it might be OK, since SMCCC states "A Function Identifier is passed in register W0.", which implies that a conforming implementation should also read w0, not x0, but it's certainly far easier to be completely right than to justify being possibly wrong. Reviewed-by: Robin Murphy Cc: sta...@vger.kernel.org Reported-by: Ard Biesheuvel Acked-by: Ard Biesheuvel Tested-by: Ard Biesheuvel Signed-off-by: Marc Zyngier --- include/linux/arm-smccc.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index e1ef944ef1da..dd44d8458c04 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -14,14 +14,16 @@ #ifndef __LINUX_ARM_SMCCC_H #define __LINUX_ARM_SMCCC_H +#include + /* * This file provides common defines for ARM SMC Calling Convention as * specified in * http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html */ -#define ARM_SMCCC_STD_CALL 0 -#define ARM_SMCCC_FAST_CALL 1 +#define ARM_SMCCC_STD_CALL _AC(0,U) +#define ARM_SMCCC_FAST_CALL _AC(1,U) #define ARM_SMCCC_TYPE_SHIFT 31 #define ARM_SMCCC_SMC_32 0
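The underlying C behaviour is easy to demonstrate: a function identifier built from signed arithmetic ends up negative as a 32-bit int, so widening it to a 64-bit register sign-extends it, whereas an unsigned constant (what the _AC(1,U) suffix guarantees) zero-extends. A small sketch (the int32_t cast is an illustration of the effect, not the kernel macro itself):

```c
#include <assert.h>
#include <stdint.h>

/* Without a U suffix, 0x80000001 built from signed shifts ends up as a
 * negative int, and widening it to 64 bits sign-extends */
static uint64_t widen_signed(void)
{
    int32_t id = (int32_t)0x80000001;   /* negative as a signed int */
    return (uint64_t)id;                /* sign-extended on widening */
}

static uint64_t widen_unsigned(void)
{
    uint32_t id = UINT32_C(0x80000001); /* what the U suffix guarantees */
    return (uint64_t)id;                /* zero-extended on widening */
}
```

The sign-extended value is exactly the #0xffffffff80000001 immediate seen in the quoted disassembly, which only works out if the firmware happens to read w0 rather than x0.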
Re: [PATCH v3 16/18] arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
)r1 asm("r1") = a1; \ + register typeof(a2)r2 asm("r2") = a2; \ + register typeof(a3)r3 asm("r3") = a3 + +#define __declare_arg_4(a0, a1, a2, a3, a4, res) \ + __declare_arg_3(a0, a1, a2, a3, res); \ + register typeof(a4) r4 asm("r4") = a4 + +#define __declare_arg_5(a0, a1, a2, a3, a4, a5, res) \ + __declare_arg_4(a0, a1, a2, a3, a4, res); \ + register typeof(a5) r5 asm("r5") = a5 + +#define __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res) \ + __declare_arg_5(a0, a1, a2, a3, a4, a5, res); \ + register typeof(a6) r6 asm("r6") = a6 + +#define __declare_arg_7(a0, a1, a2, a3, a4, a5, a6, a7, res) \ + __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res); \ + register typeof(a7) r7 asm("r7") = a7 + +#define ___declare_args(count, ...) __declare_arg_ ## count(__VA_ARGS__) +#define __declare_args(count, ...) ___declare_args(count, __VA_ARGS__) + +#define ___constraints(count) \ + : __constraint_write_ ## count \ + : __constraint_read_ ## count \ + : "memory" +#define __constraints(count) ___constraints(count) + +/* + * We have an output list that is not necessarily used, and GCC feels + * entitled to optimise the whole sequence away. "volatile" is what + * makes it stick. + */ +#define __arm_smccc_1_1(inst, ...) \ + do {\ + __declare_args(__count_args(__VA_ARGS__), __VA_ARGS__); \ + asm volatile(inst "\n"\ +__constraints(__count_args(__VA_ARGS__))); \ + if (___res) \ + *___res = (typeof(*___res)){r0, r1, r2, r3};\ ...especially since there's no obvious indication of where it comes from when you're looking here. Otherwise, though, this has already turned out pretty sleek; Reviewed-by: Robin Murphy + } while (0) + +/* + * arm_smccc_1_1_smc() - make an SMCCC v1.1 compliant SMC call + * + * This is a variadic macro taking one to eight source arguments, and + * an optional return structure. + * + * @a0-a7: arguments passed in registers 0 to 7 + * @res: result values from registers 0 to 3 + * + * This macro is used to make SMC calls following SMC Calling Convention v1.1. 
+ * The content of the supplied param are copied to registers 0 to 7 prior + * to the SMC instruction. The return values are updated with the content + * from register 0 to 3 on return from the SMC instruction if not NULL. + */ +#define arm_smccc_1_1_smc(...) __arm_smccc_1_1(SMCCC_SMC_INST, __VA_ARGS__) + +/* + * arm_smccc_1_1_hvc() - make an SMCCC v1.1 compliant HVC call + * + * This is a variadic macro taking one to eight source arguments, and + * an optional return structure. + * + * @a0-a7: arguments passed in registers 0 to 7 + * @res: result values from registers 0 to 3 + * + * This macro is used to make HVC calls following SMC Calling Convention v1.1. + * The content of the supplied param are copied to registers 0 to 7 prior + * to the HVC instruction. The return values are updated with the content + * from register 0 to 3 on return from the HVC instruction if not NULL. + */ +#define arm_smccc_1_1_hvc(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__) + #endif /*__ASSEMBLY__*/ #endif /*__LINUX_ARM_SMCCC_H*/
Re: [PATCH v3 16/18] arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
On 01/02/18 13:54, Marc Zyngier wrote: On 01/02/18 13:34, Robin Murphy wrote: On 01/02/18 11:46, Marc Zyngier wrote: One of the major improvement of SMCCC v1.1 is that it only clobbers the first 4 registers, both on 32 and 64bit. This means that it becomes very easy to provide an inline version of the SMC call primitive, and avoid performing a function call to stash the registers that would otherwise be clobbered by SMCCC v1.0. Signed-off-by: Marc Zyngier --- include/linux/arm-smccc.h | 143 ++ 1 file changed, 143 insertions(+) diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index dd44d8458c04..575aabe85905 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -150,5 +150,148 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1, #define arm_smccc_hvc_quirk(...) __arm_smccc_hvc(__VA_ARGS__) +/* SMCCC v1.1 implementation madness follows */ +#ifdef CONFIG_ARM64 + +#define SMCCC_SMC_INST "smc #0" +#define SMCCC_HVC_INST "hvc #0" Nit: Maybe the argument can go in the template and we just define the instruction mnemonics here? + +#endif + +#ifdef CONFIG_ARM #elif ? Sure, why not. +#include +#include + +#define SMCCC_SMC_INST __SMC(0) +#define SMCCC_HVC_INST __HVC(0) Oh, I see, it was to line up with this :( I do wonder if we could just embed an asm(".arch armv7-a+virt\n") (if even necessary) for ARM, then take advantage of the common mnemonics for all 3 instruction sets instead of needing manual encoding tricks? I don't think we should ever be pulling this file in for non-v7 builds. I suppose that strictly that appears to need binutils 2.21 rather than the offical supported minimum of 2.20, but are people going to be throwing SMCCC configs at antique toolchains in practice? It has been an issue in the past, back when we merged KVM. We settled on a hybrid solution where code outside of KVM would not rely on a newer toolchain, hence the macros that Dave introduced. Maybe we've moved on and we can take that bold step? 
Either way I think we can happily throw that on the "future cleanup" pile right now as it's not directly relevant to the purpose of the patch; I'm sure we don't want to make potential backporting even more difficult. + +#endif + +#define ___count_args(_0, _1, _2, _3, _4, _5, _6, _7, _8, x, ...) x + +#define __count_args(...) \ + ___count_args(__VA_ARGS__, 7, 6, 5, 4, 3, 2, 1, 0) + +#define __constraint_write_0 \ + "+r" (r0), "=&r" (r1), "=&r" (r2), "=&r" (r3) +#define __constraint_write_1 \ + "+r" (r0), "+r" (r1), "=&r" (r2), "=&r" (r3) +#define __constraint_write_2 \ + "+r" (r0), "+r" (r1), "+r" (r2), "=&r" (r3) +#define __constraint_write_3 \ + "+r" (r0), "+r" (r1), "+r" (r2), "+r" (r3) +#define __constraint_write_4 __constraint_write_3 +#define __constraint_write_5 __constraint_write_4 +#define __constraint_write_6 __constraint_write_5 +#define __constraint_write_7 __constraint_write_6 + +#define __constraint_read_0 +#define __constraint_read_1 +#define __constraint_read_2 +#define __constraint_read_3 +#define __constraint_read_4"r" (r4) +#define __constraint_read_5__constraint_read_4, "r" (r5) +#define __constraint_read_6__constraint_read_5, "r" (r6) +#define __constraint_read_7__constraint_read_6, "r" (r7) + +#define __declare_arg_0(a0, res) \ + struct arm_smccc_res *___res = res; \ Looks like the declaration of ___res could simply be factored out to the template... Tried that. But... + register u32 r0 asm("r0") = a0; \ + register unsigned long r1 asm("r1"); \ + register unsigned long r2 asm("r2"); \ + register unsigned long r3 asm("r3") + +#define __declare_arg_1(a0, a1, res) \ + struct arm_smccc_res *___res = res; \ + register u32 r0 asm("r0") = a0; \ + register typeof(a1)r1 asm("r1") = a1; \ + register unsigned long r2 asm("r2"); \ + register unsigned long r3 asm("r3") + +#define __declare_arg_2(a0, a1, a2, res) \ + str
Re: [PATCH] soc: mediatek: Handle return of of_match_device function
On 01/02/18 11:00, Himanshu Jha wrote:
> In scpsys_probe function, return value of of_match_device function
> which returns null is dereferenced without checking. Therefore, add a
> check for potential null dereference.
>
> Detected by CoverityScan, CID#1424087 "Dereference null return value"
>
> Fixes: commit 53fddb1a66dd ("soc: mediatek: reduce code duplication of
> scpsys_probe across all SoCs")
> Signed-off-by: Himanshu Jha
> ---
>  drivers/soc/mediatek/mtk-scpsys.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/soc/mediatek/mtk-scpsys.c b/drivers/soc/mediatek/mtk-scpsys.c
> index 435ce5e..6e7f196 100644
> --- a/drivers/soc/mediatek/mtk-scpsys.c
> +++ b/drivers/soc/mediatek/mtk-scpsys.c
> @@ -981,6 +981,9 @@ static int scpsys_probe(struct platform_device *pdev)
>  	int i, ret;
>
>  	match = of_match_device(of_scpsys_match_tbl, &pdev->dev);
> +	if (!match)
> +		return -EINVAL;
> +
>  	soc = (const struct scp_soc_data *)match->data;

You could of course replace the whole sequence with an
of_device_get_match_data() call, which happens to be inherently safe
against the no-match case even when that *is* impossible by design.

Robin.

>  	scp = init_scp(pdev, soc->domains, soc->num_domains, &soc->regs,
Re: [PATCH] soc: mediatek: Handle return of of_match_device function
On 01/02/18 15:09, Geert Uytterhoeven wrote:
> On Thu, Feb 1, 2018 at 4:02 PM, Robin Murphy wrote:
>> On 01/02/18 11:00, Himanshu Jha wrote:
>>> In scpsys_probe function, return value of of_match_device function
>>> which returns null is dereferenced without checking. Therefore, add
>>> a check for potential null dereference.
>>>
>>> Detected by CoverityScan, CID#1424087 "Dereference null return value"
>>>
>>> Fixes: commit 53fddb1a66dd ("soc: mediatek: reduce code duplication
>>> of scpsys_probe across all SoCs")
>>> Signed-off-by: Himanshu Jha
>>> ---
>>>  drivers/soc/mediatek/mtk-scpsys.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/soc/mediatek/mtk-scpsys.c b/drivers/soc/mediatek/mtk-scpsys.c
>>> index 435ce5e..6e7f196 100644
>>> --- a/drivers/soc/mediatek/mtk-scpsys.c
>>> +++ b/drivers/soc/mediatek/mtk-scpsys.c
>>> @@ -981,6 +981,9 @@ static int scpsys_probe(struct platform_device *pdev)
>>>  	int i, ret;
>>>
>>>  	match = of_match_device(of_scpsys_match_tbl, &pdev->dev);
>>> +	if (!match)
>>> +		return -EINVAL;
>>> +
>>>  	soc = (const struct scp_soc_data *)match->data;
>>
>> You could of course replace the whole sequence with an
>> of_device_get_match_data() call, which happens to be inherently safe
>> against the no-match case even when that *is* impossible by design.
>
> +1
>
>>>  	scp = init_scp(pdev, soc->domains, soc->num_domains, &soc->regs,
>
> ... followed by the static analyser gang complaining we may dereference
> NULL pointer soc...

Well, if the static analysers can't track the provenance of
dev->driver->of_match_table, let's keep ignoring them until they get
cleverer :P

Robin.

> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a
> hacker. But when I'm talking to journalists I just say "programmer"
> or something like that.
> 				-- Linus Torvalds
Re: [PATCH v6 2/2] media: V3s: Add support for Allwinner CSI.
On 31/01/18 09:37, Arnd Bergmann wrote:

On Wed, Jan 31, 2018 at 8:29 AM, Maxime Ripard wrote:

Hi Thierry,

On Tue, Jan 30, 2018 at 11:01:50AM +0100, Thierry Reding wrote:

On Tue, Jan 30, 2018 at 10:59:16AM +0100, Thierry Reding wrote:

On Tue, Jan 30, 2018 at 10:24:48AM +0100, Arnd Bergmann wrote:

On Tue, Jan 30, 2018 at 8:54 AM, Maxime Ripard wrote:

On Mon, Jan 29, 2018 at 03:34:02PM +0100, Arnd Bergmann wrote:

On Mon, Jan 29, 2018 at 10:25 AM, Linus Walleij wrote:

On Mon, Jan 29, 2018 at 9:25 AM, Maxime Ripard wrote:

On Sat, Jan 27, 2018 at 05:14:26PM +0100, Linus Walleij wrote:

At one point we had discussed adding a 'dma-masters' property that
lists all the buses on which a device can be a DMA master, and the
respective properties of those masters (iommu, coherency, offset, ...).
IIRC at the time we decided that we could live without that complexity,
but perhaps we cannot.

Are you talking about this?
https://elixir.free-electrons.com/linux/latest/source/Documentation/devicetree/bindings/dma/dma.txt#L41

It doesn't seem to be related to that issue to me. And in our
particular case, all the devices are DMA masters, the RAM is just
mapped to another address.

No, that's not the one I was thinking of. The idea at the time was much
more generic, and not limited to DMA engines. I don't recall the
details, but I think that Thierry was either involved or made the
proposal at the time.

Yeah, I vaguely remember discussing something like this before. A quick
search through my inbox yielded these two threads, mostly related to
IOMMU, but I think there were some mentions of dma-ranges and so on as
well. I'll have to dig deeper into those threads to refresh my memory,
but I won't get around to it until later today.
If someone wants to read up on this in the meantime, here are the links:

https://lkml.org/lkml/2014/4/27/346
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/257200.html

From a quick glance the issue of dma-ranges was something that we
hand-waved at the time.

Also found this, which seems to be relevant as well:

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/252715.html

Adding Dave.

Thanks for the pointers, I started to read through it. I guess we have
to come up with two solutions here: a short term one to address the
users we already have in the kernel properly, and a long term one where
we could use that discussion as a starting point.

For the short term one, could we just set the device dma_pfn_offset to
PHYS_OFFSET at probe time, and use our dma_addr_t directly later on, or
would this also cause some issues?

That would certainly be an improvement over the current version, it
keeps the hack more localized. That's fine with me. Note that both
PHYS_OFFSET and dma_pfn_offset have architecture-specific meanings and
they could in theory change, so ideally we'd do that fixup somewhere in
arch/arm/mach-sunxi/ at boot time before the driver gets probed, but
this wouldn't work on arm64 if we need it there too.

dma_pfn_offset certainly *shouldn't* be architecture-specific; it's
just historically not been used consistently everywhere. That should
now be addressed by Christoph's big dma_direct_ops cleanup (now merged)
which fills in the places it was being missed in generic code.

From quickly skimming this thread, it sounds like that really should be
sufficient for this particular hardware - if all its DMA goes through
an interconnect which makes RAM appear at 0, then you give it
"dma-ranges = <0 PHYS_OFFSET size>;" and things should 'just work'
provided the DMA API is otherwise being used correctly.
If different devices have differing DMA master address maps (such that
you can't just place a single dma-ranges on your "soc" node) then the
trick is to wrap them in intermediate "simple-bus" nodes with a
straight-through "ranges;" and the appropriate "dma-ranges = ...". Yes,
it's a bit of a bodge, but then pretending AXI/APB interconnects are
buses in the sense that DT assumes is already a bit of a bodge.

For the long term plan, from what I read from the discussion, it's
mostly centered around IOMMU indeed, and we don't have that. What we
would actually need is to break the assumption that the DMA "parent"
bus is the DT node's parent bus. And I guess this could be done quite
easily by adding a dma-parent property with a phandle to the bus
controller, that would have a dma-ranges property of its own with the
proper mapping described there. It should be simple enough to support,
but is there anything that could prevent something like that from
working properly (such as preventing further IOMMU-related developments
that were described in those mail threads)?

I've thought about it a little bit now. A dma-parent property would
nicely solve two problems:

- a device on a memory-mapped control bus that is a bus master on a
  different bus. This is the case we are talking about here AFAICT
- a device that is on a
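To make the "intermediate simple-bus" trick above concrete, here is a hedged DT sketch (all addresses, sizes, and node names are invented for illustration): RAM sits at CPU physical 0x40000000, but the wrapped masters see it at 0, so dma-ranges translates child bus address 0x0 onto parent address 0x40000000 for 1 GiB:

```dts
soc {
	compatible = "simple-bus";
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	/* Intermediate node exists only to scope dma-ranges to the
	 * masters that actually see RAM at offset 0 */
	dma-bus {
		compatible = "simple-bus";
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;		/* straight-through slave (CPU) view */
		/* <child-bus-address parent-bus-address length> */
		dma-ranges = <0x0 0x40000000 0x40000000>;

		csi@1cb0000 {
			/* master device placed here inherits the offset */
			reg = <0x01cb0000 0x1000>;
		};
	};
};
```

Devices left directly under "soc" keep the identity DMA mapping, so only the masters that genuinely address RAM from 0 pick up the translation.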
Re: [PATCH] ARM-SMMU: Delete error messages for a failed memory allocation in three functions
On 20/01/18 14:36, SF Markus Elfring wrote:
> From: Markus Elfring
> Date: Sat, 20 Jan 2018 15:30:17 +0100
>
> Omit extra messages for a memory allocation failure in these functions.

Why? It's your job as patch author to convince reviewers and
maintainers why your change is a good thing and they should spend their
time looking at it, much less consider merging it. This may as well be
"delete some stuff because I feel like it".

Do bear in mind the nature of these drivers; Arm SMMUs are not
something you find in microcontrollers. On systems using these drivers,
it will make no difference whatsoever to anyone if the many-megabyte
kernel image is 47 bytes (or whatever) smaller.

> This issue was detected by using the Coccinelle software.

I think I'm going to have to start treating mention of Coccinelle as a
potential disclaimer saying "I haven't attempted to understand the code
I'm changing" :(

> Signed-off-by: Markus Elfring
> ---
>  drivers/iommu/arm-smmu-v3.c | 9 +++------
>  drivers/iommu/arm-smmu.c    | 9 +++------
>  2 files changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index f122071688fd..5c2a7103d494 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2134,10 +2134,8 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
>  	void *strtab = smmu->strtab_cfg.strtab;
>
>  	cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);
> -	if (!cfg->l1_desc) {
> -		dev_err(smmu->dev, "failed to allocate l1 stream table desc\n");

OK, I'll stop playing *completely* dumb; I do know you would get a
splat if kmalloc() ever did fail. But what you're removing isn't
printk("failed to allocate memory\n"), it's a message which says
exactly what allocation failed *for which device*. Can you clarify how
I'm going to diagnose this particular problem from the generic splat
when all I have is an email from a customer with a dmesg dump?
> +	if (!cfg->l1_desc)
>  		return -ENOMEM;
> -	}
>
>  	for (i = 0; i < cfg->num_l1_ents; ++i) {
>  		arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
> @@ -2828,10 +2826,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	bool bypass;
>
>  	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
> -	if (!smmu) {
> -		dev_err(dev, "failed to allocate arm_smmu_device\n");
> +	if (!smmu)
>  		return -ENOMEM;
> -	}
> +
>  	smmu->dev = dev;
>
>  	if (dev->of_node) {
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 78d4c6b8f1ba..a4da4a870a2e 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -2048,10 +2048,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	int num_irqs, i, err;
>
>  	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
> -	if (!smmu) {
> -		dev_err(dev, "failed to allocate arm_smmu_device\n");
> +	if (!smmu)
>  		return -ENOMEM;
> -	}
> +
>  	smmu->dev = dev;
>
>  	if (dev->of_node)
> @@ -2084,10 +2083,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>
>  	smmu->irqs = devm_kzalloc(dev, sizeof(*smmu->irqs) * num_irqs,
>  				  GFP_KERNEL);
> -	if (!smmu->irqs) {
> -		dev_err(dev, "failed to allocate %d irqs\n", num_irqs);

This more than any other is removing potentially useful information:
"failed to allocate 37890756 irqs", for instance, would indicate a bug
which is very much *not* an out-of-memory condition.

Robin.

> +	if (!smmu->irqs)
>  		return -ENOMEM;
> -	}
>
>  	for (i = 0; i < num_irqs; ++i) {
>  		int irq = platform_get_irq(pdev, i);
Re: [PATCH] iommu/of: Only do IOMMU lookup for available ones
Hi Jeffy,

On 03/01/18 06:09, Jeffy Chen wrote:
> The for_each_matching_node_and_match() would return every matching
> node, including unavailable ones. It's pointless to init unavailable
> IOMMUs, so add a sanity check to avoid that.

Even better would be to clean up the last remaining init_fn user and
get rid of the whole business. With the probe-deferral mechanism, early
initialisation hooks are no longer needed, and the IOMMU_OF_DECLARE
section really only remains as a way of detecting builtin drivers long
before their registration initcalls run.

Robin.

> Signed-off-by: Jeffy Chen
> ---
>  drivers/iommu/of_iommu.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 50947ebb6d17..6f7456caa30d 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -240,6 +240,9 @@ static int __init of_iommu_init(void)
>  	for_each_matching_node_and_match(np, matches, &match) {
>  		const of_iommu_init_fn init_fn = match->data;
>
> +		if (!of_device_is_available(np))
> +			continue;
> +
>  		if (init_fn && init_fn(np))
>  			pr_err("Failed to initialise IOMMU %pOF\n", np);
>  	}
Re: [PATCH] ARM: realview: remove eb-mp clcd IRQ
Hi Linus,

On 21/12/17 22:08, Linus Walleij wrote:
> On Thu, Dec 21, 2017 at 10:31 PM, Arnd Bergmann wrote:
>> We get a dtc warning about the CLCD interrupt being invalid:
>>
>> arch/arm/boot/dts/arm-realview-eb-11mp-ctrevb.dtb: Warning
>> (interrupts_property): interrupts size is (8), expected multiple of
>> 12 in /fpga/charlcd@10008000
>>
>> According to the datasheet I found and the old board file, this line
>> is not connected, so I'm removing the respective properties here.
>>
>> Link: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0411d/index.html
>> Signed-off-by: Arnd Bergmann
>
> There is some confusion here. There is CLCD "Color LCD", which is just
> a code name for PrimeCell PL111, and there is the actual character
> LCD, which is a hardware thing to talk to a character LCD with some
> characters on.
>
>> diff --git a/arch/arm/boot/dts/arm-realview-eb-mp.dtsi b/arch/arm/boot/dts/arm-realview-eb-mp.dtsi
>
> So this DTS is for the ARM 11 MP core tile which is described in
> DUI0318F. It doesn't even list an IRQ for the character LCD.
>
>> &charlcd {
>> -	interrupt-parent = <&intc>;
>> -	interrupts = <0 IRQ_TYPE_LEVEL_HIGH>;
>
> This was probably me thinking to go back and fill in the right IRQ and
> forgetting to actually do it. Sorry :(
>
>> +	/* CLCD is not connected here */
>
> Call it character LCD instead to avoid confusion please.
>
>> +	/delete-property/interrupt-parent;
>> +	/delete-property/interrupts;
>
> I don't understand this delete-property business (first time I see
> it), but the top level DTSI (arm-realview-eb.dtsi) does not define any
> interrupt, so can't you just delete this whole &charlcd?
>
> I do think the reference design has a character LCD, and I do think it
> has an interrupt, it's just undocumented, so someone with this board
> would have to test it manually to figure out which line it is. Whoever
> uses this design will get to it if ever.

FWIW the EB baseboard is *physically* the same regardless of the CPU,
it's just flashed with a Core-Tile-specific FPGA bitstream.
I've just tried firing up an 11MPCore one, and indeed the character LCD
does light up with the kernel version. I can't convince the
recalcitrant beast to actually get to userspace, though, so I can't
confirm what the interrupt's deal is.

The baseboard manual (DUI0303E) says it's interrupt 22 on the
board-level secondary GICs, and since neither the CT11MP nor its
corresponding FPGA (AN152) mention any alternate routing direct to the
Core Tile GIC, I'd guess it probably still is. On the other hand,
though, it also says this:

"... However this interrupt signal is reserved for future use and you
must use a polling routine instead of an interrupt service routine."

So maybe it's appropriate to just remove the interrupt everywhere :/

Robin.