[PATCH v6 0/6] arm64: dts: NXP: add basic dts file for LX2160A SoC
Changes for v6:
- Added comment for clock unit-sysclk node name in SoC device tree

Changes for v5:
- Updated temperature sensor regulator name in board device tree
- Sorted nodes alphabetically and by unit-address in SoC/board device tree
- Indentation and new-line updates in SoC/board device tree
- Updated node names as per DT spec generic-name recommendations in SoC DT
- Updated macro defines for interrupt/gpio properties
- Updated i2c node property name scl-gpio
- Removed device_type property except for cpu/memory nodes
- Added esdhc controller nodes in SoC/RDB board device tree
- Added aliases for uart/crypto nodes
- Added SoC die attribute definition for LX2160A

Changes for v4:
- Updated bindings for lx2160a clockgen and dcfg
- Modified commit message for lx2160a clockgen changes
- Updated interrupt property with macro definition
- Added required enable-method property to each core node with psci value
- Removed unused syscon node in device tree
- Removed blank lines in device tree fsl-lx2160a.dtsi
- Updated uart node compatible with sbsa-uart first
- Added and defined vcc-supply property for temperature sensor node in
  device tree fsl-lx2160a-rdb.dts

Changes for v3:
- Split clockgen support patch into the below two patches:
  a) Updated array size of cmux_to_group[] with NUM_CMUX+1 to include -1
     terminator and p4080 cmux_to_group[] array with -1 terminator
  b) Add clockgen support for lx2160a

Changes for v2:
- Modified cmux_to_group array to include -1 terminator
- Reverted NUM_CMUX to original value 8 from 16
- Removed "LX2160A is 16 core, so modified value for NUM_CMUX" from the
  description of patch "[PATCH 3/5] drivers: clk-qoriq: Add clockgen
  support for lx2160a"
- Populated cache properties for L1 and L2 cache in lx2160a device tree
- Removed reboot node from lx2160a device tree as PSCI is implemented
- Removed incorrect comment for timer node interrupt property in lx2160a
  device tree
- Modified pmu node compatible property from "arm,armv8-pmuv3" to
  "arm,cortex-a72-pmu" in lx2160a device tree
- Removed non-standard aliases in lx2160a rdb board device tree
- Updated i2c child nodes to generic names in lx2160a rdb device tree

Changes for v1:
- Add compatible string for LX2160A clockgen support
- Add compatible string to initialize LX2160A guts driver
- Add compatible string for LX2160A support in dt-bindings
- Add dts files to enable support for the LX2160A SoC and LX2160A RDB
  (reference design board)

Vabhav Sharma (4):
  dt-bindings: arm64: add compatible for LX2160A
  soc/fsl/guts: Add definition for LX2160A
  arm64: dts: add QorIQ LX2160A SoC support
  arm64: dts: add LX2160ARDB board support

Yogesh Gaur (2):
  clk: qoriq: increase array size of cmux_to_group
  clk: qoriq: Add clockgen support for lx2160a

 Documentation/devicetree/bindings/arm/fsl.txt      |  14 +-
 .../devicetree/bindings/clock/qoriq-clock.txt      |   1 +
 arch/arm64/boot/dts/freescale/Makefile             |   1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts  | 119 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi     | 766 +
 drivers/clk/clk-qoriq.c                            |  16 +-
 drivers/cpufreq/qoriq-cpufreq.c                    |   1 +
 drivers/soc/fsl/guts.c                             |   6 +
 8 files changed, 921 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

--
2.7.4
[PATCH v6 1/6] dt-bindings: arm64: add compatible for LX2160A
Add compatibles for the LX2160A SoC, QDS and RDB boards.
Add lx2160a compatibles for clockgen and dcfg.

Signed-off-by: Vabhav Sharma
Reviewed-by: Rob Herring
---
 Documentation/devicetree/bindings/arm/fsl.txt           | 14 +-
 Documentation/devicetree/bindings/clock/qoriq-clock.txt |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/arm/fsl.txt b/Documentation/devicetree/bindings/arm/fsl.txt
index 8a1baa2..71adce2 100644
--- a/Documentation/devicetree/bindings/arm/fsl.txt
+++ b/Documentation/devicetree/bindings/arm/fsl.txt
@@ -130,7 +130,7 @@ core start address and release the secondary core from holdoff and startup.
   - compatible: Should contain a chip-specific compatible string,
     Chip-specific strings are of the form "fsl,<chip>-dcfg",
     The following <chip>s are known to be supported:
-    ls1012a, ls1021a, ls1043a, ls1046a, ls2080a.
+    ls1012a, ls1021a, ls1043a, ls1046a, ls2080a, lx2160a.
   - reg : should contain base address and length of DCFG memory-mapped registers
@@ -222,3 +222,15 @@ Required root node properties:
 LS2088A ARMv8 based RDB Board
 Required root node properties:
     - compatible = "fsl,ls2088a-rdb", "fsl,ls2088a";
+
+LX2160A SoC
+Required root node properties:
+- compatible = "fsl,lx2160a";
+
+LX2160A ARMv8 based QDS Board
+Required root node properties:
+- compatible = "fsl,lx2160a-qds", "fsl,lx2160a";
+
+LX2160A ARMv8 based RDB Board
+Required root node properties:
+- compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";

diff --git a/Documentation/devicetree/bindings/clock/qoriq-clock.txt b/Documentation/devicetree/bindings/clock/qoriq-clock.txt
index 97f46ad..3fb9995 100644
--- a/Documentation/devicetree/bindings/clock/qoriq-clock.txt
+++ b/Documentation/devicetree/bindings/clock/qoriq-clock.txt
@@ -37,6 +37,7 @@ Required properties:
 	* "fsl,ls1046a-clockgen"
 	* "fsl,ls1088a-clockgen"
 	* "fsl,ls2080a-clockgen"
+	* "fsl,lx2160a-clockgen"
 	Chassis-version clock strings include:
 	* "fsl,qoriq-clockgen-1.0": for chassis 1.0 clocks
 	* "fsl,qoriq-clockgen-2.0": for chassis 2.0 clocks
--
2.7.4
[PATCH v6 2/6] soc/fsl/guts: Add definition for LX2160A
Add compatible string "fsl,lx2160a-dcfg" to initialize the guts driver
for lx2160a, and add the SoC die attribute definition for LX2160A.

Signed-off-by: Vabhav Sharma
Signed-off-by: Yinbo Zhu
Acked-by: Li Yang
---
 drivers/soc/fsl/guts.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
index 302e0c8..bcab1ee 100644
--- a/drivers/soc/fsl/guts.c
+++ b/drivers/soc/fsl/guts.c
@@ -100,6 +100,11 @@ static const struct fsl_soc_die_attr fsl_soc_die[] = {
 	  .svr		= 0x87000000,
 	  .mask		= 0xfff70000,
 	},
+	/* Die: LX2160A, SoC: LX2160A/LX2120A/LX2080A */
+	{ .die		= "LX2160A",
+	  .svr		= 0x87360000,
+	  .mask		= 0xff3f0000,
+	},
 	{ },
 };
@@ -222,6 +227,7 @@ static const struct of_device_id fsl_guts_of_match[] = {
 	{ .compatible = "fsl,ls1088a-dcfg", },
 	{ .compatible = "fsl,ls1012a-dcfg", },
 	{ .compatible = "fsl,ls1046a-dcfg", },
+	{ .compatible = "fsl,lx2160a-dcfg", },
 	{}
 };
 MODULE_DEVICE_TABLE(of, fsl_guts_of_match);
--
2.7.4
[PATCH v6 4/6] clk: qoriq: Add clockgen support for lx2160a
From: Yogesh Gaur

Add clockgen support for lx2160a.
Added entry for compatible 'fsl,lx2160a-clockgen'.

Signed-off-by: Tang Yuantian
Signed-off-by: Yogesh Gaur
Signed-off-by: Vabhav Sharma
Acked-by: Stephen Boyd
Acked-by: Viresh Kumar
---
 drivers/clk/clk-qoriq.c         | 12 ++++++++++++
 drivers/cpufreq/qoriq-cpufreq.c |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index e152bfb..99675de 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = {
 		.flags = CG_VER3 | CG_LITTLE_ENDIAN,
 	},
 	{
+		.compat = "fsl,lx2160a-clockgen",
+		.cmux_groups = {
+			&clockgen2_cmux_cga12, &clockgen2_cmux_cgb
+		},
+		.cmux_to_group = {
+			0, 0, 0, 0, 1, 1, 1, 1, -1
+		},
+		.pll_mask = 0x37,
+		.flags = CG_VER3 | CG_LITTLE_ENDIAN,
+	},
+	{
 		.compat = "fsl,p2041-clockgen",
 		.guts_compat = "fsl,qoriq-device-config-1.0",
 		.init_periph = p2041_init_periph,
@@ -1424,6 +1435,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, "fsl,ls1043a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls2080a, "fsl,ls2080a-clockgen", clockgen_init);
+CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", clockgen_init);

 /* Legacy nodes */
 CLK_OF_DECLARE(qoriq_sysclk_1, "fsl,qoriq-sysclk-1.0", sysclk_init);

diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index 3d773f6..83921b7 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -295,6 +295,7 @@ static const struct of_device_id node_matches[] __initconst = {
 	{ .compatible = "fsl,ls1046a-clockgen", },
 	{ .compatible = "fsl,ls1088a-clockgen", },
 	{ .compatible = "fsl,ls2080a-clockgen", },
+	{ .compatible = "fsl,lx2160a-clockgen", },
 	{ .compatible = "fsl,p4080-clockgen", },
 	{ .compatible = "fsl,qoriq-clockgen-1.0", },
 	{ .compatible = "fsl,qoriq-clockgen-2.0", },
--
2.7.4
[PATCH v6 5/6] arm64: dts: add QorIQ LX2160A SoC support
LX2160A SoC is based on the Layerscape Chassis Generation 3.2
architecture. LX2160A features 16 64-bit ARM v8 Cortex-A72 processor
cores in 8 clusters, CCN508, GICv3, two 64-bit DDR4 memory controllers,
8 I2C controllers, 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA, 4 PL011
SBSA UARTs, etc.

Signed-off-by: Ramneek Mehresh
Signed-off-by: Zhang Ying-22455
Signed-off-by: Nipun Gupta
Signed-off-by: Priyanka Jain
Signed-off-by: Yogesh Gaur
Signed-off-by: Sriram Dash
Signed-off-by: Vabhav Sharma
Signed-off-by: Horia Geanta
Signed-off-by: Ran Wang
Signed-off-by: Yinbo Zhu
---
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 766 +
 1 file changed, 766 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
new file mode 100644
index 000..9fcfd48
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -0,0 +1,766 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree Include file for Layerscape-LX2160A family SoC.
+//
+// Copyright 2018 NXP
+
+#include
+#include
+
+/memreserve/ 0x8000 0x0001;
+
+/ {
+	compatible = "fsl,lx2160a";
+	interrupt-parent = <&gic>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		// 8 clusters having 2 Cortex-A72 cores each
+		cpu@0 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			enable-method = "psci";
+			reg = <0x0>;
+			clocks = <&clockgen 1 0>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <&cluster0_l2>;
+		};
+
+		cpu@1 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			enable-method = "psci";
+			reg = <0x1>;
+			clocks = <&clockgen 1 0>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <&cluster0_l2>;
+		};
+
+		cpu@100 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			enable-method = "psci";
+			reg = <0x100>;
+			clocks = <&clockgen 1 1>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <&cluster1_l2>;
+		};
+
+		cpu@101 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			enable-method = "psci";
+			reg = <0x101>;
+			clocks = <&clockgen 1 1>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <&cluster1_l2>;
+		};
+
+		cpu@200 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			enable-method = "psci";
+			reg = <0x200>;
+			clocks = <&clockgen 1 2>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <&cluster2_l2>;
+		};
+
+		cpu@201 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			enable-method = "psci";
+			reg = <0x201>;
+			clocks = <&clockgen 1 2>;
+			d-cache-size = <0x8000>;
+			d-c
[PATCH v6 3/6] clk: qoriq: increase array size of cmux_to_group
From: Yogesh Gaur

Increase the size of the cmux_to_group array, to accommodate the -1
termination entry. Added a -1 terminator entry for p4080_cmux_grpX.

Signed-off-by: Yogesh Gaur
Signed-off-by: Vabhav Sharma
Acked-by: Stephen Boyd
---
 drivers/clk/clk-qoriq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index 3a1812f..e152bfb 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -79,7 +79,7 @@ struct clockgen_chipinfo {
 	const struct clockgen_muxinfo *cmux_groups[2];
 	const struct clockgen_muxinfo *hwaccel[NUM_HWACCEL];
 	void (*init_periph)(struct clockgen *cg);
-	int cmux_to_group[NUM_CMUX]; /* -1 terminates if fewer than NUM_CMUX */
+	int cmux_to_group[NUM_CMUX+1]; /* array should be -1 terminated */
 	u32 pll_mask;	/* 1 << n bit set if PLL n is valid */
 	u32 flags;	/* CG_xxx */
 };
@@ -601,7 +601,7 @@ static const struct clockgen_chipinfo chipinfo[] = {
 			&p4080_cmux_grp1, &p4080_cmux_grp2
 		},
 		.cmux_to_group = {
-			0, 0, 0, 0, 1, 1, 1, 1
+			0, 0, 0, 0, 1, 1, 1, 1, -1
 		},
 		.pll_mask = 0x1f,
 	},
--
2.7.4
[PATCH v6 6/6] arm64: dts: add LX2160ARDB board support
LX2160A reference design board (RDB) is a high-performance computing,
evaluation, and development platform with the LX2160A SoC.

Signed-off-by: Priyanka Jain
Signed-off-by: Sriram Dash
Signed-off-by: Vabhav Sharma
Signed-off-by: Horia Geanta
Signed-off-by: Ran Wang
Signed-off-by: Zhang Ying-22455
Signed-off-by: Yinbo Zhu
Acked-by: Li Yang
---
 arch/arm64/boot/dts/freescale/Makefile            |   1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 119 ++
 2 files changed, 120 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts

diff --git a/arch/arm64/boot/dts/freescale/Makefile b/arch/arm64/boot/dts/freescale/Makefile
index 86e18ad..445b72b 100644
--- a/arch/arm64/boot/dts/freescale/Makefile
+++ b/arch/arm64/boot/dts/freescale/Makefile
@@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
+dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb

diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
new file mode 100644
index 000..6481e5f
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree file for LX2160ARDB
+//
+// Copyright 2018 NXP
+
+/dts-v1/;
+
+#include "fsl-lx2160a.dtsi"
+
+/ {
+	model = "NXP Layerscape LX2160ARDB";
+	compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
+
+	aliases {
+		crypto = &crypto;
+		serial0 = &uart0;
+	};
+
+	chosen {
+		stdout-path = "serial0:115200n8";
+	};
+
+	sb_3v3: regulator-sb3v3 {
+		compatible = "regulator-fixed";
+		regulator-name = "MC34717-3.3VSB";
+		regulator-min-microvolt = <330>;
+		regulator-max-microvolt = <330>;
+		regulator-boot-on;
+		regulator-always-on;
+	};
+};
+
+&crypto {
+	status = "okay";
+};
+
+&esdhc0 {
+	sd-uhs-sdr104;
+	sd-uhs-sdr50;
+	sd-uhs-sdr25;
+	sd-uhs-sdr12;
+	status = "okay";
+};
+
+&esdhc1 {
+	mmc-hs200-1_8v;
+	mmc-hs400-1_8v;
+	bus-width = <8>;
+	status = "okay";
+};
+
+&i2c0 {
+	status = "okay";
+
+	i2c-mux@77 {
+		compatible = "nxp,pca9547";
+		reg = <0x77>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		i2c@2 {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			reg = <0x2>;
+
+			power-monitor@40 {
+				compatible = "ti,ina220";
+				reg = <0x40>;
+				shunt-resistor = <1000>;
+			};
+		};
+
+		i2c@3 {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			reg = <0x3>;
+
+			temperature-sensor@4c {
+				compatible = "nxp,sa56004";
+				reg = <0x4c>;
+				vcc-supply = <&sb_3v3>;
+			};
+
+			temperature-sensor@4d {
+				compatible = "nxp,sa56004";
+				reg = <0x4d>;
+				vcc-supply = <&sb_3v3>;
+			};
+		};
+	};
+};
+
+&i2c4 {
+	status = "okay";
+
+	rtc@51 {
+		compatible = "nxp,pcf2129";
+		reg = <0x51>;
+		// IRQ10_B
+		interrupts = <0 150 0x4>;
+	};
+};
+
+&uart0 {
+	status = "okay";
+};
+
+&uart1 {
+	status = "okay";
+};
+
+&usb0 {
+	status = "okay";
+};
+
+&usb1 {
+	status = "okay";
+};
--
2.7.4
Re: [PATCH] seccomp: Add pkru into seccomp_data
* Michael Sammler:

> Thank you for the pointer about the POWER implementation. I am not
> familiar with POWER in general and its protection key feature at
> all. Would the AMR register be the correct register to expose here?

Yes, according to my notes, the register is called AMR (special purpose
register 13).

> I understand your concern about exposing the number of protection keys
> in the ABI. One idea would be to state, that the pkru field (which
> should probably be renamed) contains an architecture specific value,
> which could then be the PKRU on x86 and AMR (or another register) on
> POWER. This new field should probably be extended to __u64 and the
> reserved field removed.

POWER also has proper read/write bit separation, not PKEY_DISABLE_ACCESS
(disable read and write) and PKEY_DISABLE_WRITE like Intel. It's
currently translated by the kernel, but I really need a
PKEY_DISABLE_READ bit in glibc to implement pkey_get in case the memory
is write-only.

> Another idea would be to not add a field in the seccomp_data
> structure, but instead provide a new BPF instruction, which reads the
> value of a specified protection key.

I would prefer that if it's possible. We should make sure that the bits
are the same as those returned from pkey_get. I have an implementation
on POWER, but have yet to figure out the implications for 32-bit because
I do not know the AMR register size there.

Thanks,
Florian
Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD
On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli wrote:
> >
> > Hi all,
> >
> > While investigating why ARM64 required a ton of objects to be rebuilt
> > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was
> > because we define __early_init_dt_declare_initrd() differently and we
> > do that in arch/arm64/include/asm/memory.h which gets included by a
> > fair amount of other header files, and translation units as well.
>
> I scratch my head sometimes as to why some config options rebuild so
> much stuff. One down, ? to go. :)
>
> > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with
> > build systems that generate two kernels: one with the initramfs and
> > one without. buildroot is one of these build systems, OpenWrt is also
> > another one that does this.
> >
> > This patch series proposes adding an empty initrd.h to satisfy the
> > need for drivers/of/fdt.c to unconditionally include that file, and
> > moves the custom __early_init_dt_declare_initrd() definition away
> > from asm/memory.h
> >
> > This cuts the number of objects rebuilds from 1920 down to 26, so a
> > factor 73 approximately.
> >
> > Apologies for the long CC list, please let me know how you would go
> > about merging that and if another approach would be preferable, e.g:
> > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
> > something like that.
>
> There may be a better way as of 4.20 because bootmem is now gone and
> only memblock is used. This should unify what each arch needs to do
> with initrd early. We need the physical address early for memblock
> reserving. Then later on we need the virtual address to access the
> initrd. Perhaps we should just change initrd_start and initrd_end to
> physical addresses (or add 2 new variables would be less invasive and
> allow for different translation than __va()).
> The sanity checks and memblock reserve could also perhaps be moved to
> a common location.
>
> Alternatively, given arm64 is the only oddball, I'd be fine with an
> "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> __early_init_dt_declare_initrd as long as we have a path to removing
> it like the above option.

I think arm64 does not have to redefine __early_init_dt_declare_initrd().
Something like this might be just all we need (completely untested,
probably it won't even compile):

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 9d9582c..e9ca238 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1;
 phys_addr_t arm64_dma_phys_limit __ro_after_init;

 #ifdef CONFIG_BLK_DEV_INITRD
+
+static phys_addr_t initrd_start_phys, initrd_end_phys;
+
 static int __init early_initrd(char *p)
 {
 	unsigned long start, size;
@@ -71,8 +74,8 @@ static int __init early_initrd(char *p)
 	if (*endp == ',') {
 		size = memparse(endp + 1, NULL);
-		initrd_start = start;
-		initrd_end = start + size;
+		initrd_start_phys = start;
+		initrd_end_phys = start + size;
 	}
 	return 0;
 }
@@ -407,14 +410,27 @@ void __init arm64_memblock_init(void)
 		memblock_add(__pa_symbol(_text), (u64)(_end - _text));
 	}

-	if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) {
+	if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) &&
+	    (initrd_start || initrd_start_phys)) {
+		/*
+		 * FIXME: ensure proper precedence between
+		 * early_initrd and DT when both are present
+		 */
+		if (initrd_start) {
+			initrd_start_phys = __virt_to_phys(initrd_start);
+			initrd_end_phys = __virt_to_phys(initrd_end);
+		} else if (initrd_start_phys) {
+			initrd_start = __va(initrd_start_phys);
+			initrd_end = __va(initrd_end_phys);
+		}
+
 		/*
 		 * Add back the memory we just removed if it results in the
 		 * initrd to become inaccessible via the linear mapping.
 		 * Otherwise, this is a no-op
 		 */
-		u64 base = initrd_start & PAGE_MASK;
-		u64 size = PAGE_ALIGN(initrd_end) - base;
+		u64 base = initrd_start_phys & PAGE_MASK;
+		u64 size = PAGE_ALIGN(initrd_end_phys) - base;

 		/*
 		 * We can only add back the initrd memory if we don't end up
@@ -458,7 +474,7 @@ void __init arm64_memblock_init(void)
 	 * pagetables with memblock.
 	 */
 	memblock_reserve(__pa_symbol(_text), _end - _text);
-#ifdef CONFIG_BLK_DEV_INITRD
+#if 0
 	if (initrd_start) {
 		memblock_reserve(initrd_start, initrd_end - initrd_start);

> Rob
>

--
Sincerely yours,
Mike.
Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD
On Thu, Oct 25, 2018 at 10:38:34AM +0100, Mike Rapoport wrote:
> On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli
> > wrote:
> > >
> > > Hi all,
> > >
> > > While investigating why ARM64 required a ton of objects to be
> > > rebuilt when toggling CONFIG_DEV_BLK_INITRD, it became clear that
> > > this was because we define __early_init_dt_declare_initrd()
> > > differently and we do that in arch/arm64/include/asm/memory.h which
> > > gets included by a fair amount of other header files, and
> > > translation units as well.
> >
> > I scratch my head sometimes as to why some config options rebuild so
> > much stuff. One down, ? to go. :)
> >
> > > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with
> > > build systems that generate two kernels: one with the initramfs and
> > > one without. buildroot is one of these build systems, OpenWrt is
> > > also another one that does this.
> > >
> > > This patch series proposes adding an empty initrd.h to satisfy the
> > > need for drivers/of/fdt.c to unconditionally include that file, and
> > > moves the custom __early_init_dt_declare_initrd() definition away
> > > from asm/memory.h
> > >
> > > This cuts the number of objects rebuilds from 1920 down to 26, so a
> > > factor 73 approximately.
> > >
> > > Apologies for the long CC list, please let me know how you would go
> > > about merging that and if another approach would be preferable, e.g:
> > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
> > > something like that.
> >
> > There may be a better way as of 4.20 because bootmem is now gone and
> > only memblock is used. This should unify what each arch needs to do
> > with initrd early. We need the physical address early for memblock
> > reserving. Then later on we need the virtual address to access the
> > initrd.
> > Perhaps we should just change initrd_start and initrd_end to
> > physical addresses (or add 2 new variables would be less invasive and
> > allow for different translation than __va()). The sanity checks and
> > memblock reserve could also perhaps be moved to a common location.
> >
> > Alternatively, given arm64 is the only oddball, I'd be fine with an
> > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> > __early_init_dt_declare_initrd as long as we have a path to removing
> > it like the above option.
>
> I think arm64 does not have to redefine __early_init_dt_declare_initrd().
> Something like this might be just all we need (completely untested,
> probably it won't even compile):

The alternative solution would be to replace initrd_start/initrd_end
with physical address versions of these everywhere - that's what we're
passed from DT, it's what 32-bit ARM would prefer, and seemingly what
64-bit ARM would also like as well.

Grepping for initrd_start in arch/*/mm shows that there's lots of
architectures that have virtual/physical conversions on these, and a
number that have obviously been derived from 32-bit ARM's approach
(with maintaining a phys_initrd_start variable to simplify things).

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Re: [PATCH 1/4] treewide: remove unused address argument from pte_alloc functions (v2)
On Wed, Oct 24, 2018 at 10:37:16AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 12, 2018 at 06:31:57PM -0700, Joel Fernandes (Google) wrote:
> > This series speeds up mremap(2) syscall by copying page tables at the
> > PMD level even for non-THP systems. There is concern that the extra
> > 'address' argument that mremap passes to pte_alloc may do something
> > subtle architecture related in the future that may make the scheme not
> > work. Also we find that there is no point in passing the 'address' to
> > pte_alloc since its unused. So this patch therefore removes this
> > argument tree-wide resulting in a nice negative diff as well. Also
> > ensuring along the way that the enabled architectures do not do
> > anything funky with 'address' argument that goes unnoticed by the
> > optimization.
>
> Did you happen to look at the history of where that address argument
> came from? -- just being curious here. ISTR something vague about
> architectures having different paging structure for different memory
> ranges.

I see some architectures (i.e. sparc and, I believe, power) used the
address for coloring. It's not needed anymore. Page allocator and SL?B
are good enough now.

See 3c936465249f ("[SPARC64]: Kill pgtable quicklists and use SLAB.")

--
 Kirill A. Shutemov
[PATCH 3/6] PCI: layerscape: Add the EP mode support
Add the EP mode support.

Signed-off-by: Xiaowei Bao
---
 .../devicetree/bindings/pci/layerscape-pci.txt | 3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
index 66df1e8..d3d7be1 100644
--- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
+++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
@@ -13,12 +13,15 @@ information.

 Required properties:
 - compatible: should contain the platform identifier such as:
+  RC mode:
         "fsl,ls1021a-pcie", "snps,dw-pcie"
         "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie"
         "fsl,ls2088a-pcie"
         "fsl,ls1088a-pcie"
         "fsl,ls1046a-pcie"
         "fsl,ls1012a-pcie"
+  EP mode:
+        "fsl,ls-pcie-ep"
 - reg: base addresses and lengths of the PCIe controller register blocks.
 - interrupts: A list of interrupt outputs of the controller. Must contain an
   entry for each entry in the interrupt-names property.
--
1.7.1
[PATCH 2/6] ARM: dts: ls1021a: Add the status property disable PCIe
Add the status property to disable the PCIe nodes by default; they will
be enabled by the bootloader.

Signed-off-by: Xiaowei Bao
---
 arch/arm/boot/dts/ls1021a.dtsi | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index bdd6e66..b769e0e 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -736,6 +736,7 @@
 				< 0 0 2 &gic GIC_SPI 188 IRQ_TYPE_LEVEL_HIGH>,
 				< 0 0 3 &gic GIC_SPI 190 IRQ_TYPE_LEVEL_HIGH>,
 				< 0 0 4 &gic GIC_SPI 192 IRQ_TYPE_LEVEL_HIGH>;
+			status = "disabled";
 		};

 		pcie@350 {
@@ -759,6 +760,7 @@
 				< 0 0 2 &gic GIC_SPI 189 IRQ_TYPE_LEVEL_HIGH>,
 				< 0 0 3 &gic GIC_SPI 191 IRQ_TYPE_LEVEL_HIGH>,
 				< 0 0 4 &gic GIC_SPI 193 IRQ_TYPE_LEVEL_HIGH>;
+			status = "disabled";
 		};

 		can0: can@2a7 {
--
1.7.1
[PATCH 6/6] misc: pci_endpoint_test: Add the layerscape EP device support
Add the layerscape EP device support in the pci_endpoint_test driver.

Signed-off-by: Xiaowei Bao
---
 drivers/misc/pci_endpoint_test.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 896e2df..744d10c 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -788,6 +788,8 @@ static void pci_endpoint_test_remove(struct pci_dev *pdev)
 static const struct pci_device_id pci_endpoint_test_tbl[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA74x) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA72x) },
+	/* 0x81c0: The device id of ls1046a in NXP. */
+	{ PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, 0x81c0) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS, 0xedda) },
 	{ }
 };
--
1.7.1
[PATCH 4/6] arm64: dts: Add the PCIE EP node in dts
Add the PCIE EP node in dts for ls1046a.

Signed-off-by: Xiaowei Bao
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 32 ++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 64d334c..08b4f08 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -655,6 +655,17 @@
 			status = "disabled";
 		};

+		pcie_ep@340 {
+			compatible = "fsl,ls-pcie-ep";
+			reg = <0x00 0x0340 0x0 0x0010
+			       0x40 0x 0x8 0x>;
+			reg-names = "regs", "addr_space";
+			num-ib-windows = <6>;
+			num-ob-windows = <6>;
+			num-lanes = <2>;
+			status = "disabled";
+		};
+
 		pcie@350 {
 			compatible = "fsl,ls1046a-pcie", "snps,dw-pcie";
 			reg = <0x00 0x0350 0x0 0x0010   /* controller registers */
@@ -681,6 +692,17 @@
 			status = "disabled";
 		};

+		pcie_ep@350 {
+			compatible = "fsl,ls-pcie-ep";
+			reg = <0x00 0x0350 0x0 0x0010
+			       0x48 0x 0x8 0x>;
+			reg-names = "regs", "addr_space";
+			num-ib-windows = <6>;
+			num-ob-windows = <6>;
+			num-lanes = <2>;
+			status = "disabled";
+		};
+
 		pcie@360 {
 			compatible = "fsl,ls1046a-pcie", "snps,dw-pcie";
 			reg = <0x00 0x0360 0x0 0x0010   /* controller registers */
@@ -707,6 +729,16 @@
 			status = "disabled";
 		};

+		pcie_ep@360 {
+			compatible = "fsl,ls-pcie-ep";
+			reg = <0x00 0x0360 0x0 0x0010
+			       0x50 0x 0x8 0x>;
+			reg-names = "regs", "addr_space";
+			num-ib-windows = <6>;
+			num-ob-windows = <6>;
+			num-lanes = <2>;
+			status = "disabled";
+		};
 	};

 	reserved-memory {
--
1.7.1
[PATCH 5/6] pci: layerscape: Add the EP mode support.
Add the PCIe EP mode support for the layerscape platform.

Signed-off-by: Xiaowei Bao
---
 drivers/pci/controller/dwc/Makefile            |   2 +-
 drivers/pci/controller/dwc/pci-layerscape-ep.c | 161 ++++++++++
 2 files changed, 162 insertions(+), 1 deletions(-)
 create mode 100644 drivers/pci/controller/dwc/pci-layerscape-ep.c

diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
index 5d2ce72..b26d617 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
 obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
 obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
 obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone-dw.o pci-keystone.o
-obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
+obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
 obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
 obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
 obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c
new file mode 100644
index 000..3b33bbc
--- /dev/null
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCIe controller EP driver for Freescale Layerscape SoCs
+ *
+ * Copyright (C) 2018 NXP Semiconductor.
+ *
+ * Author: Xiaowei Bao
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "pcie-designware.h"
+
+#define PCIE_DBI2_OFFSET		0x1000	/* DBI2 base address */
+
+struct ls_pcie_ep {
+	struct dw_pcie		*pci;
+};
+
+#define to_ls_pcie_ep(x)	dev_get_drvdata((x)->dev)
+
+static bool ls_pcie_is_bridge(struct ls_pcie_ep *pcie)
+{
+	struct dw_pcie *pci = pcie->pci;
+	u32 header_type;
+
+	header_type = ioread8(pci->dbi_base + PCI_HEADER_TYPE);
+	header_type &= 0x7f;
+
+	return header_type == PCI_HEADER_TYPE_BRIDGE;
+}
+
+static int ls_pcie_establish_link(struct dw_pcie *pci)
+{
+	return 0;
+}
+
+static const struct dw_pcie_ops ls_pcie_ep_ops = {
+	.start_link = ls_pcie_establish_link,
+};
+
+static const struct of_device_id ls_pcie_ep_of_match[] = {
+	{ .compatible = "fsl,ls-pcie-ep",},
+	{ },
+};
+
+static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+	struct pci_epc *epc = ep->epc;
+	enum pci_barno bar;
+
+	for (bar = BAR_0; bar <= BAR_5; bar++)
+		dw_pcie_ep_reset_bar(pci, bar);
+
+	epc->features |= EPC_FEATURE_NO_LINKUP_NOTIFIER;
+}
+
+static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
+				enum pci_epc_irq_type type, u16 interrupt_num)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+
+	switch (type) {
+	case PCI_EPC_IRQ_LEGACY:
+		return dw_pcie_ep_raise_legacy_irq(ep, func_no);
+	case PCI_EPC_IRQ_MSI:
+		return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
+	case PCI_EPC_IRQ_MSIX:
+		return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
+	default:
+		dev_err(pci->dev, "UNKNOWN IRQ type\n");
+	}
+
+	return 0;
+}
+
+static struct dw_pcie_ep_ops pcie_ep_ops = {
+	.ep_init = ls_pcie_ep_init,
+	.raise_irq = ls_pcie_ep_raise_irq,
+};
+
+static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
+				 struct platform_device *pdev)
+{
+	struct dw_pcie *pci = pcie->pci;
+	struct device *dev = pci->dev;
+	struct dw_pcie_ep *ep;
+	struct resource *res;
+	int ret;
+ + ep = &pci->ep; + ep->ops = &pcie_ep_ops; + + res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space"); + if (!res) + return -EINVAL; + + ep->phys_base = res->start; + ep->addr_size = resource_size(res); + + ret = dw_pcie_ep_init(ep); + if (ret) { + dev_err(dev, "failed to initialize endpoint\n"); + return ret; + } + + return 0; +} + +static int __init ls_pcie_ep_probe(struct platform_device *pdev) +{ + struct device *dev = &pdev->dev; + struct dw_pcie *pci; + struct ls_pcie_ep *pcie; + struct resource *dbi_base; + int ret; + + pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL); + if (!pcie) + return -ENOMEM; + + pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL); + if (!pci) + return -ENOMEM; + + dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs"); + pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base); + if (IS_ERR(pci->dbi_base)) + return PTR_ERR(pci->dbi_base); + + pci->dbi_base2 = pci->dbi_base + PCIE_DBI2_OFFSET; + pci->dev = dev; + pci->ops = &ls_
[PATCH 1/6] arm64: dts: Add the status property to disable PCIe
From: Bao Xiaowei Add the status property disable the PCIe, the property will be enable by bootloader. Signed-off-by: Bao Xiaowei --- arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |1 + arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi |3 +++ arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |3 +++ arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi |3 +++ arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi |4 5 files changed, 14 insertions(+), 0 deletions(-) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi index 5da732f..21f2b3b 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi @@ -496,6 +496,7 @@ < 0 0 2 &gic 0 111 IRQ_TYPE_LEVEL_HIGH>, < 0 0 3 &gic 0 112 IRQ_TYPE_LEVEL_HIGH>, < 0 0 4 &gic 0 113 IRQ_TYPE_LEVEL_HIGH>; + status = "disabled"; }; }; diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi index 3fed504..760d510 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi @@ -683,6 +683,7 @@ < 0 0 2 &gic 0 111 0x4>, < 0 0 3 &gic 0 112 0x4>, < 0 0 4 &gic 0 113 0x4>; + status = "disabled"; }; pcie@350 { @@ -708,6 +709,7 @@ < 0 0 2 &gic 0 121 0x4>, < 0 0 3 &gic 0 122 0x4>, < 0 0 4 &gic 0 123 0x4>; + status = "disabled"; }; pcie@360 { @@ -733,6 +735,7 @@ < 0 0 2 &gic 0 155 0x4>, < 0 0 3 &gic 0 156 0x4>, < 0 0 4 &gic 0 157 0x4>; + status = "disabled"; }; }; diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi index 51cbd50..64d334c 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi @@ -652,6 +652,7 @@ < 0 0 2 &gic GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>, < 0 0 3 &gic GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>, < 0 0 4 &gic GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>; + status = "disabled"; }; pcie@350 { @@ -677,6 +678,7 @@ < 0 0 2 &gic GIC_SPI 120 
IRQ_TYPE_LEVEL_HIGH>, < 0 0 3 &gic GIC_SPI 120 IRQ_TYPE_LEVEL_HIGH>, < 0 0 4 &gic GIC_SPI 120 IRQ_TYPE_LEVEL_HIGH>; + status = "disabled"; }; pcie@360 { @@ -702,6 +704,7 @@ < 0 0 2 &gic GIC_SPI 154 IRQ_TYPE_LEVEL_HIGH>, < 0 0 3 &gic GIC_SPI 154 IRQ_TYPE_LEVEL_HIGH>, < 0 0 4 &gic GIC_SPI 154 IRQ_TYPE_LEVEL_HIGH>; + status = "disabled"; }; }; diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi index a07f612..9deb9cb 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi @@ -533,6 +533,7 @@ < 0 0 2 &gic 0 0 0 110 IRQ_TYPE_LEVEL_HIGH>, < 0 0 3 &gic 0 0 0 111 IRQ_TYPE_LEVEL_HIGH>, < 0 0 4 &gic 0 0 0 112 IRQ_TYPE_LEVEL_HIGH>; + status = "disabled"; }; pcie@350 { @@ -557,6 +558,7 @@ < 0 0 2 &gic 0 0 0 115 IRQ_TYPE_LEVEL_HIGH>, < 0 0 3 &gic 0 0 0 116 IRQ_TYPE_LEVEL_HIGH>, < 0 0 4 &gic 0 0 0 117 IRQ_TYPE_LEVEL_HIGH>; + status = "disabled"; }; pcie@360 { @@ -581,6 +583,7 @@ < 0 0 2 &gic 0 0 0 120 IRQ_TYPE_LEVEL_HIGH>, < 0 0 3 &gic 0 0 0 121 IRQ_TYPE_LEVEL_HIGH>, < 0 0 4 &gic 0 0 0 122 IRQ_TYPE_LEVEL_HIGH>; +
Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD
+Ard On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport wrote: > > On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote: > > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli > > wrote: > > > > > > Hi all, > > > > > > While investigating why ARM64 required a ton of objects to be rebuilt > > > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was > > > because we define __early_init_dt_declare_initrd() differently and we do > > > that in arch/arm64/include/asm/memory.h which gets included by a fair > > > amount of other header files, and translation units as well. > > > > I scratch my head sometimes as to why some config options rebuild so > > much stuff. One down, ? to go. :) > > > > > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with build > > > systems that generate two kernels: one with the initramfs and one > > > without. buildroot is one of these build systems, OpenWrt is also > > > another one that does this. > > > > > > This patch series proposes adding an empty initrd.h to satisfy the need > > > for drivers/of/fdt.c to unconditionally include that file, and moves the > > > custom __early_init_dt_declare_initrd() definition away from > > > asm/memory.h > > > > > > This cuts the number of objects rebuilds from 1920 down to 26, so a > > > factor 73 approximately. > > > > > > Apologies for the long CC list, please let me know how you would go > > > about merging that and if another approach would be preferable, e.g: > > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or > > > something like that. > > > > There may be a better way as of 4.20 because bootmem is now gone and > > only memblock is used. This should unify what each arch needs to do > > with initrd early. We need the physical address early for memblock > > reserving. Then later on we need the virtual address to access the > > initrd. 
Perhaps we should just change initrd_start and initrd_end to > > physical addresses (or add 2 new variables would be less invasive and > > allow for different translation than __va()). The sanity checks and > > memblock reserve could also perhaps be moved to a common location. > > > > Alternatively, given arm64 is the only oddball, I'd be fine with an > > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default > > __early_init_dt_declare_initrd as long as we have a path to removing > > it like the above option. > > I think arm64 does not have to redefine __early_init_dt_declare_initrd(). > Something like this might be just all we need (completely untested, > probably it won't even compile): > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 9d9582c..e9ca238 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1; > phys_addr_t arm64_dma_phys_limit __ro_after_init; > > #ifdef CONFIG_BLK_DEV_INITRD > + > +static phys_addr_t initrd_start_phys, initrd_end_phys; > + > static int __init early_initrd(char *p) > { > unsigned long start, size; > @@ -71,8 +74,8 @@ static int __init early_initrd(char *p) > if (*endp == ',') { > size = memparse(endp + 1, NULL); > > - initrd_start = start; > - initrd_end = start + size; > + initrd_start_phys = start; > + initrd_end_phys = end; > } > return 0; > } > @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void) > memblock_add(__pa_symbol(_text), (u64)(_end - _text)); > } > > - if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) { > + if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && > + (initrd_start || initrd_start_phys)) { > + /* > +* FIXME: ensure proper precendence between > +* early_initrd and DT when both are present Command line takes precedence, so just reverse the order. 
> +*/ > + if (initrd_start) { > + initrd_start_phys = __phys_to_virt(initrd_start); > + initrd_end_phys = __phys_to_virt(initrd_end); AIUI, the original issue was doing the P2V translation was happening too early and the VA could be wrong if the linear range is adjusted. So I don't think this would work. I suppose you could convert the VA back to a PA before any adjustments and then back to a VA again after. But that's kind of hacky. 2 wrongs making a right. > + } else if (initrd_start_phys) { > + initrd_start = __va(initrd_start_phys); > + initrd_end = __va(initrd_start_phys); > + } > + > /* > * Add back the memory we just removed if it results in the > * initrd to become inaccessible via the linear mapping. > * Otherwise, this is a no-op > */ > - u64 base = initrd_start & PAGE_MASK; > - u64 size = PAGE_ALIGN(initrd_e
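Rob's objection, that doing the physical-to-virtual translation too early leaves a stale VA once the linear range is adjusted, can be modelled outside the kernel. The standalone C sketch below is a toy, not the arm64 code: `va_offset`, `phys_to_virt()` and the adjustment step stand in for `memstart_addr`-based `__va()` and the later linear-map adjustment.

```c
#include <stdint.h>

/* Toy linear-map offset; stands in for the kernel's memstart_addr-based
 * mapping, which can still change during early arm64 boot. */
static uint64_t va_offset = 0x1000000ULL;

static uint64_t phys_to_virt(uint64_t pa)
{
    return pa + va_offset;
}

/* Translate the initrd start too early: the VA is computed against the
 * current offset and goes stale when the offset is adjusted afterwards. */
static uint64_t early_translation(uint64_t initrd_pa, uint64_t adjust)
{
    uint64_t va = phys_to_virt(initrd_pa);  /* premature translation */

    va_offset += adjust;                    /* linear range adjusted later */
    return va;
}

/* Keep the physical address around and translate only once the final
 * offset is known -- the approach the thread converges on. */
static uint64_t late_translation(uint64_t initrd_pa)
{
    return phys_to_virt(initrd_pa);
}
```

The early VA ends up differing from the post-adjustment translation of the same physical address, which is why converting the VA back to a PA before adjustments (or simply deferring the translation) is needed.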
Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)
On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote: > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote: > > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote: > > > > diff --git a/mm/mremap.c b/mm/mremap.c > > > > index 9e68a02a52b1..2fd163cff406 100644 > > > > --- a/mm/mremap.c > > > > +++ b/mm/mremap.c > > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, > > > > pmd_t *old_pmd, > > > > drop_rmap_locks(vma); > > > > } > > > > > > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long > > > > old_addr, > > > > + unsigned long new_addr, unsigned long old_end, > > > > + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush) > > > > +{ > > > > + spinlock_t *old_ptl, *new_ptl; > > > > + struct mm_struct *mm = vma->vm_mm; > > > > + > > > > + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK) > > > > + || old_end - old_addr < PMD_SIZE) > > > > + return false; > > > > + > > > > + /* > > > > +* The destination pmd shouldn't be established, free_pgtables() > > > > +* should have release it. > > > > +*/ > > > > + if (WARN_ON(!pmd_none(*new_pmd))) > > > > + return false; > > > > + > > > > + /* > > > > +* We don't have to worry about the ordering of src and dst > > > > +* ptlocks because exclusive mmap_sem prevents deadlock. > > > > +*/ > > > > + old_ptl = pmd_lock(vma->vm_mm, old_pmd); > > > > + if (old_ptl) { > > > > > > How can it ever be false? Kirill, It cannot, you are right. I'll remove the test. By the way, there are new changes upstream by Linus which flush the TLB before releasing the ptlock instead of after. 
I'm guessing that patch came about because of reviews of this patch and someone spotted an issue in the existing code :) Anyway the patch in concern is: eb66ae030829 ("mremap: properly flush TLB before releasing the page") I need to rebase on top of that with appropriate modifications, but I worry that this patch will slow down performance since we have to flush at every PMD/PTE move before releasing the ptlock. Where as with my patch, the intention is to flush only at once in the end of move_page_tables. When I tried to flush TLB on every PMD move, it was quite slow on my arm64 device [2]. Further observation [1] is, it seems like the move_huge_pmds and move_ptes code is a bit sub optimal in the sense, we are acquiring and releasing the same ptlock for a bunch of PMDs if the said PMDs are on the same page-table page right? Instead we can do better by acquiring and release the ptlock less often. I think this observation [1] and the frequent TLB flush issue [2] can be solved by acquiring the ptlock once for a bunch of PMDs, move them all, then flush the tlb and then release the ptlock, and then proceed to doing the same thing for the PMDs in the next page-table page. What do you think? - Joel
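The batching observation [1] can be illustrated with a toy count of ptlock acquisitions. This is not the mremap code, just a model of the locking pattern; it assumes PTRS_PER_PMD = 512 (the value with 4K pages on x86-64 and arm64), so all PMD entries in one page-table page share one ptlock.

```c
#define PTRS_PER_PMD 512UL

/* Locks taken when each PMD move takes and drops the ptlock itself,
 * flushing the TLB before every release. */
static unsigned long locks_per_entry(unsigned long nr_pmds)
{
    unsigned long locks = 0, i;

    for (i = 0; i < nr_pmds; i++)
        locks++;        /* lock, move one PMD, flush, unlock */
    return locks;
}

/* Locks taken when the ptlock is held across all PMD entries that live
 * in the same page-table page, with a single TLB flush before dropping
 * it -- the batching proposed in the mail above. */
static unsigned long locks_per_page(unsigned long nr_pmds)
{
    unsigned long locks = 0, i;

    for (i = 0; i < nr_pmds; i += PTRS_PER_PMD)
        locks++;        /* lock, move up to 512 PMDs, flush once, unlock */
    return locks;
}
```

Moving 4096 PMD entries drops from 4096 lock round-trips (and flushes) to 8, one per page-table page.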
Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)
On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: [...] > > > + pmd_t pmd; > > > + > > > + new_ptl = pmd_lockptr(mm, new_pmd); > > > Looks like this is largely inspired by move_huge_pmd(), I guess a lot of > the code applies, why not just reuse as much as possible? The same comments > w.r.t mmap_sem helping protect against lock order issues applies as well. I thought about this and when I looked into it, it seemed there are subtle differences that make such sharing not worth it (or not possible). - Joel
Re: [PATCH] seccomp: Add pkru into seccomp_data
On 10/24/2018 08:06 PM, Florian Weimer wrote: * Michael Sammler: Add the current value of the PKRU register to data available for seccomp-bpf programs to work on. This allows filters based on the currently enabled protection keys. diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 9efc0e73..e8b9ecfc 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -52,12 +52,16 @@ * @instruction_pointer: at the time of the system call. * @args: up to 6 system call arguments always stored as 64-bit values *regardless of the architecture. + * @pkru: value of the pkru register + * @reserved: pad the structure to a multiple of eight bytes */ struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; + __u32 pkru; + __u32 reserved; }; This doesn't cover the POWER implementation. Adding Cc:s. And I think the kernel shouldn't expose the number of protection keys in the ABI. Thanks, Florian Thank you for the pointer about the POWER implementation. I am not familiar with POWER in general and its protection key feature at all. Would the AMR register be the correct register to expose here? I understand your concern about exposing the number of protection keys in the ABI. One idea would be to state, that the pkru field (which should probably be renamed) contains an architecture specific value, which could then be the PKRU on x86 and AMR (or another register) on POWER. This new field should probably be extended to __u64 and the reserved field removed. Another idea would be to not add a field in the seccomp_data structure, but instead provide a new BPF instruction, which reads the value of a specified protection key. - Michael
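The padding arithmetic in the proposed layout can be checked mechanically. The struct below is a local mirror of the patch's proposal, not the mainline uapi definition: classic BPF loads 32-bit words at fixed offsets into seccomp_data, so keeping the structure a multiple of eight bytes avoids trailing-padding surprises across architectures.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the layout proposed in the patch above; the pkru and
 * reserved fields are the proposal under discussion, not mainline uapi. */
struct seccomp_data_proposed {
    int32_t  nr;                  /* system call number */
    uint32_t arch;                /* AUDIT_ARCH_* value */
    uint64_t instruction_pointer; /* at the time of the system call */
    uint64_t args[6];             /* always stored as 64-bit values */
    uint32_t pkru;                /* proposed: raw PKRU register value */
    uint32_t reserved;            /* proposed: pad to a multiple of 8 */
};
```

With natural alignment the fields pack with no hidden padding: pkru lands at offset 64 right after args, and the reserved word brings the size to 72 bytes, a multiple of eight.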
Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)
On Wed, Oct 24, 2018 at 07:09:07PM -0700, Joel Fernandes wrote: > On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote: > > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote: > > > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote: > > > > > diff --git a/mm/mremap.c b/mm/mremap.c > > > > > index 9e68a02a52b1..2fd163cff406 100644 > > > > > --- a/mm/mremap.c > > > > > +++ b/mm/mremap.c > > > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct > > > > > *vma, pmd_t *old_pmd, > > > > > drop_rmap_locks(vma); > > > > > } > > > > > > > > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned > > > > > long old_addr, > > > > > + unsigned long new_addr, unsigned long old_end, > > > > > + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush) > > > > > +{ > > > > > + spinlock_t *old_ptl, *new_ptl; > > > > > + struct mm_struct *mm = vma->vm_mm; > > > > > + > > > > > + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK) > > > > > + || old_end - old_addr < PMD_SIZE) > > > > > + return false; > > > > > + > > > > > + /* > > > > > + * The destination pmd shouldn't be established, free_pgtables() > > > > > + * should have release it. > > > > > + */ > > > > > + if (WARN_ON(!pmd_none(*new_pmd))) > > > > > + return false; > > > > > + > > > > > + /* > > > > > + * We don't have to worry about the ordering of src and dst > > > > > + * ptlocks because exclusive mmap_sem prevents deadlock. > > > > > + */ > > > > > + old_ptl = pmd_lock(vma->vm_mm, old_pmd); > > > > > + if (old_ptl) { > > > > > > > > How can it ever be false? > > Kirill, > It cannot, you are right. I'll remove the test. > > By the way, there are new changes upstream by Linus which flush the TLB > before releasing the ptlock instead of after. 
I'm guessing that patch came > about because of reviews of this patch and someone spotted an issue in the > existing code :) > > Anyway the patch in concern is: > eb66ae030829 ("mremap: properly flush TLB before releasing the page") > > I need to rebase on top of that with appropriate modifications, but I worry > that this patch will slow down performance since we have to flush at every > PMD/PTE move before releasing the ptlock. Where as with my patch, the > intention is to flush only at once in the end of move_page_tables. When I > tried to flush TLB on every PMD move, it was quite slow on my arm64 device > [2]. > > Further observation [1] is, it seems like the move_huge_pmds and move_ptes > code > is a bit sub optimal in the sense, we are acquiring and releasing the same > ptlock for a bunch of PMDs if the said PMDs are on the same page-table page > right? Instead we can do better by acquiring and release the ptlock less > often. > > I think this observation [1] and the frequent TLB flush issue [2] can be > solved > by acquiring the ptlock once for a bunch of PMDs, move them all, then flush > the tlb and then release the ptlock, and then proceed to doing the same thing > for the PMDs in the next page-table page. What do you think? Yeah, that's viable optimization. The tricky part is that one PMD page table can have PMD entires of different types: THP, page table that you can move as whole and the one that you cannot (for any reason). If we cannot move the PMD entry as a whole and must go to PTE page table we would need to drop PMD ptl and take PTE ptl (it might be the same lock in some configuations). Also we don't want to take PMD lock unless it's required. I expect it to be not very trivial to get everything right. But take a shot :) -- Kirill A. Shutemov
[PATCH] powerpc/process: Fix flush_all_to_thread for SPE
From: "Felipe Rechia" Date: Wed, 24 Oct 2018 10:57:22 -0300 Subject: [PATCH] powerpc/process: Fix flush_all_to_thread for SPE Fix a bug introduced by the creation of flush_all_to_thread() for processors that have SPE (Signal Processing Engine) and use it to compute floating-point operations. >From userspace perspective, the problem was seen in attempts of computing floating-point operations which should generate exceptions. For example: fork(); float x = 0.0 / 0.0; isnan(x); // forked process returns False (should be True) The operation above also should always cause the SPEFSCR FINV bit to be set. However, the SPE floating-point exceptions were turned off after a fork(). Kernel versions prior to the bug used flush_spe_to_thread(), which first saves SPEFSCR register values in tsk->thread and then calls giveup_spe(tsk). After commit 579e633e764e, the save_all() function was called first to giveup_spe(), and then the SPEFSCR register values were saved in tsk->thread. This would save the SPEFSCR register values after disabling SPE for that thread, causing the bug described above. Fixes 579e633e764e ("powerpc: create flush_all_to_thread()") Signed-off-by: felipe.rechia --- arch/powerpc/kernel/process.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index a0c74bb..16eb428 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -566,12 +566,11 @@ void flush_all_to_thread(struct task_struct *tsk) if (tsk->thread.regs) { preempt_disable(); BUG_ON(tsk != current); - save_all(tsk); - #ifdef CONFIG_SPE if (tsk->thread.regs->msr & MSR_SPE) tsk->thread.spefscr = mfspr(SPRN_SPEFSCR); #endif + save_all(tsk); preempt_enable(); } -- 2.7.4
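The ordering bug reduces to a two-step model: save_all() (via giveup_spe()) gives up the thread's live SPE state, so a SPEFSCR read that happens afterwards no longer sees the exception bits. The sketch below is a toy, not powerpc code; `mfspr_spefscr()` and `do_save_all()` are stand-ins for the real mfspr(SPRN_SPEFSCR) and save_all(), and 0x20 is just a placeholder for an exception bit such as FINV.

```c
#include <stdint.h>

/* Live SPEFSCR value while the thread still owns the SPE unit. */
static uint32_t spefscr_live;

static uint32_t mfspr_spefscr(void)   /* models mfspr(SPRN_SPEFSCR) */
{
    return spefscr_live;
}

static void do_save_all(void)         /* models save_all() -> giveup_spe() */
{
    spefscr_live = 0;                 /* SPE state has been given up */
}

/* Buggy order after commit 579e633e764e: save_all() runs first, so the
 * register is read only after SPE was already given up. */
static uint32_t flush_buggy(uint32_t live)
{
    spefscr_live = live;
    do_save_all();
    return mfspr_spefscr();           /* exception bits already lost */
}

/* Fixed order from the patch above: read SPEFSCR, then save_all(). */
static uint32_t flush_fixed(uint32_t live)
{
    uint32_t saved;

    spefscr_live = live;
    saved = mfspr_spefscr();          /* capture before giving up SPE */
    do_save_all();
    return saved;
}
```

The buggy ordering returns a cleared value while the fixed ordering preserves the bits, mirroring the one-line move of the mfspr() call in the diff.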
Re: [PATCH v5 00/18] of: overlay: validation checks, subsequent fixes
On Wed, Oct 24, 2018 at 2:57 PM Rob Herring wrote: > > On Mon, Oct 22, 2018 at 4:25 PM Alan Tull wrote: > > > > On Thu, Oct 18, 2018 at 5:48 PM wrote: > > > > > > From: Frank Rowand > > > > > > Add checks to (1) overlay apply process and (2) memory freeing > > > triggered by overlay release. The checks are intended to detect > > > possible memory leaks and invalid overlays. > > > > I've tested v5, nothing new to report. > > Does that mean everything broken or everything works great? In the > latter case, care to give a Tested-by. > > Rob Tested-by: Alan Tull Alan
Re: [PATCH] seccomp: Add pkru into seccomp_data
On 10/25/2018 11:12 AM, Florian Weimer wrote: I understand your concern about exposing the number of protection keys in the ABI. One idea would be to state, that the pkru field (which should probably be renamed) contains an architecture specific value, which could then be the PKRU on x86 and AMR (or another register) on POWER. This new field should probably be extended to __u64 and the reserved field removed. POWER also has proper read/write bit separation, not PKEY_DISABLE_ACCESS (disable read and write) and PKEY_DISABLE_WRITE like Intel. It's currently translated by the kernel, but I really need a PKEY_DISABLE_READ bit in glibc to implement pkey_get in case the memory is write-only. The idea here would be to simply provide the raw value of the register (PKRU on x86, AMR on POWER) to the BPF program and let the BPF program (or maybe a higher level library like libseccomp) deal with the complications of interpreting this architecture specific value (similar how the BPF program currently already has to deal with architecture specific system call numbers). If an architecture were to support more protection keys than fit into the field, the architecture specific value stored in the field might simply be the first protection keys. If there was interest, it would be possible to add more architecture specific fields to seccomp_data. Another idea would be to not add a field in the seccomp_data structure, but instead provide a new BPF instruction, which reads the value of a specified protection key. I would prefer that if it's possible. We should make sure that the bits are the same as those returned from pkey_get. I have an implementation on POWER, but have yet to figure out the implications for 32-bit because I do not know the AMR register size there. 
Thanks, Florian I have had a look at how BPF is implemented and it does not seem to be easy to just add an BPF instruction for seccomp since (as far as I understand) the code of the classical BPF (as used by seccomp) is shared with the code of eBPF, which is used in many parts of the kernel and there is at least one interpreter and one JIT compiler for BPF. But maybe someone with more experience than me can comment on how hard it would be to add an instruction to BPF. - Michael
Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD
On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote: > +Ard > > On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport wrote: > > > > On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote: > > > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli > > > wrote: > > > > > > > > Hi all, > > > > > > > > While investigating why ARM64 required a ton of objects to be rebuilt > > > > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was > > > > because we define __early_init_dt_declare_initrd() differently and we do > > > > that in arch/arm64/include/asm/memory.h which gets included by a fair > > > > amount of other header files, and translation units as well. > > > > > > I scratch my head sometimes as to why some config options rebuild so > > > much stuff. One down, ? to go. :) > > > > > > > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with build > > > > systems that generate two kernels: one with the initramfs and one > > > > without. buildroot is one of these build systems, OpenWrt is also > > > > another one that does this. > > > > > > > > This patch series proposes adding an empty initrd.h to satisfy the need > > > > for drivers/of/fdt.c to unconditionally include that file, and moves the > > > > custom __early_init_dt_declare_initrd() definition away from > > > > asm/memory.h > > > > > > > > This cuts the number of objects rebuilds from 1920 down to 26, so a > > > > factor 73 approximately. > > > > > > > > Apologies for the long CC list, please let me know how you would go > > > > about merging that and if another approach would be preferable, e.g: > > > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or > > > > something like that. > > > > > > There may be a better way as of 4.20 because bootmem is now gone and > > > only memblock is used. This should unify what each arch needs to do > > > with initrd early. We need the physical address early for memblock > > > reserving. 
Then later on we need the virtual address to access the > > > initrd. Perhaps we should just change initrd_start and initrd_end to > > > physical addresses (or add 2 new variables would be less invasive and > > > allow for different translation than __va()). The sanity checks and > > > memblock reserve could also perhaps be moved to a common location. > > > > > > Alternatively, given arm64 is the only oddball, I'd be fine with an > > > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default > > > __early_init_dt_declare_initrd as long as we have a path to removing > > > it like the above option. > > > > I think arm64 does not have to redefine __early_init_dt_declare_initrd(). > > Something like this might be just all we need (completely untested, > > probably it won't even compile): > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > > index 9d9582c..e9ca238 100644 > > --- a/arch/arm64/mm/init.c > > +++ b/arch/arm64/mm/init.c > > @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1; > > phys_addr_t arm64_dma_phys_limit __ro_after_init; > > > > #ifdef CONFIG_BLK_DEV_INITRD > > + > > +static phys_addr_t initrd_start_phys, initrd_end_phys; > > + > > static int __init early_initrd(char *p) > > { > > unsigned long start, size; > > @@ -71,8 +74,8 @@ static int __init early_initrd(char *p) > > if (*endp == ',') { > > size = memparse(endp + 1, NULL); > > > > - initrd_start = start; > > - initrd_end = start + size; > > + initrd_start_phys = start; > > + initrd_end_phys = end; > > } > > return 0; > > } > > @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void) > > memblock_add(__pa_symbol(_text), (u64)(_end - _text)); > > } > > > > - if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) { > > + if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && > > + (initrd_start || initrd_start_phys)) { > > + /* > > +* FIXME: ensure proper precendence between > > +* early_initrd and DT when both are present > > Command line takes precedence, so just reverse the order. 
> > > +*/ > > + if (initrd_start) { > > + initrd_start_phys = __phys_to_virt(initrd_start); > > + initrd_end_phys = __phys_to_virt(initrd_end); > > AIUI, the original issue was doing the P2V translation was happening > too early and the VA could be wrong if the linear range is adjusted. > So I don't think this would work. Probably things have changed since then, but in the current code there is initrd_start = __phys_to_virt(initrd_start); and in between only the code related to CONFIG_RANDOMIZE_BASE, so I believe it's safe to use __phys_to_virt() here as well. > I suppose you could convert the VA back to a PA before any adjustments > and then back to a VA again after. But that's kind of hacky. 2 wrongs > making a right. > > > +
Re: [PATCH v6 5/6] arm64: dts: add QorIQ LX2160A SoC support
On Thu, Oct 25, 2018 at 2:03 AM Vabhav Sharma wrote: > > LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture. > > LX2160A features an advanced 16 64-bit ARM v8 CortexA72 processor cores > in 8 cluster, CCN508, GICv3,two 64-bit DDR4 memory controller, 8 I2C > controllers, 3 dspi, 2 esdhc,2 USB 3.0, mmu 500, 3 SATA, 4 PL011 SBSA > UARTs etc. > > Signed-off-by: Ramneek Mehresh > Signed-off-by: Zhang Ying-22455 > Signed-off-by: Nipun Gupta > Signed-off-by: Priyanka Jain > Signed-off-by: Yogesh Gaur > Signed-off-by: Sriram Dash > Signed-off-by: Vabhav Sharma > Signed-off-by: Horia Geanta > Signed-off-by: Ran Wang > Signed-off-by: Yinbo Zhu > --- > arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 766 > + > 1 file changed, 766 insertions(+) > create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > > diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > new file mode 100644 > index 000..9fcfd48 > --- /dev/null > +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi > @@ -0,0 +1,766 @@ > +// SPDX-License-Identifier: (GPL-2.0 OR MIT) > +// > +// Device Tree Include file for Layerscape-LX2160A family SoC. 
> +// > +// Copyright 2018 NXP > + > +#include > +#include > + > +/memreserve/ 0x8000 0x0001; > + > +/ { > + compatible = "fsl,lx2160a"; > + interrupt-parent = <&gic>; > + #address-cells = <2>; > + #size-cells = <2>; > + > + cpus { > + #address-cells = <1>; > + #size-cells = <0>; > + > + // 8 clusters having 2 Cortex-A72 cores each > + cpu@0 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x0>; > + clocks = <&clockgen 1 0>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <&cluster0_l2>; > + }; > + > + cpu@1 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x1>; > + clocks = <&clockgen 1 0>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <&cluster0_l2>; > + }; > + > + cpu@100 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x100>; > + clocks = <&clockgen 1 1>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <&cluster1_l2>; > + }; > + > + cpu@101 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x101>; > + clocks = <&clockgen 1 1>; > + d-cache-size = <0x8000>; > + d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <&cluster1_l2>; > + }; > + > + cpu@200 { > + device_type = "cpu"; > + compatible = "arm,cortex-a72"; > + enable-method = "psci"; > + reg = <0x200>; > + clocks = <&clockgen 1 2>; > + d-cache-size = <0x8000>; > + 
d-cache-line-size = <64>; > + d-cache-sets = <128>; > + i-cache-size = <0xC000>; > + i-cache-line-size = <64>; > + i-cache-sets = <192>; > + next-level-cache = <&cluster2_l2>; > + }; > + > + cpu@201 { > +
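Editorial note on the repeated cpu nodes above: the L1 cache properties are related by size = sets × line-size × ways, which is a quick sanity check when editing these nodes. The check below is a userspace sketch; the 4-way associativity is inferred from the numbers in the nodes, it is not stated in the device tree itself.

```c
#include <assert.h>

/* Sanity-check helper for DT cpu-node cache properties:
 * the cache size must equal sets * line-size * ways. */
static int cache_geometry_ok(unsigned int size, unsigned int line,
                             unsigned int sets, unsigned int ways)
{
        return size == sets * line * ways;
}
```
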
[PATCH v1 0/5] Add dtl_entry tracepoint
This is v1 of the patches for providing a tracepoint for processing the dispatch trace log entries from the hypervisor in a shared processor LPAR. The previous RFC can be found here: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=66340 Since the RFC, this series has been expanded/generalized to support !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and has been tested in different configurations. The dispatch distance calculation has also been updated to use the platform provided information better. Also, patch 3 is new and fixes an issue with stolen time accounting when the dtl debugfs interface is in use. - Naveen Naveen N. Rao (5): powerpc/pseries: Use macros for referring to the DTL enable mask powerpc/pseries: Do not save the previous DTL mask value powerpc/pseries: Fix stolen time accounting when dtl debugfs is used powerpc/pseries: Factor out DTL buffer allocation and registration routines powerpc/pseries: Introduce dtl_entry tracepoint arch/powerpc/include/asm/lppaca.h | 11 + arch/powerpc/include/asm/plpar_wrappers.h | 9 + arch/powerpc/include/asm/trace.h | 55 + arch/powerpc/kernel/entry_64.S| 39 arch/powerpc/kernel/time.c| 7 +- arch/powerpc/mm/numa.c| 144 - arch/powerpc/platforms/pseries/dtl.c | 22 +- arch/powerpc/platforms/pseries/lpar.c | 249 -- arch/powerpc/platforms/pseries/setup.c| 34 +-- 9 files changed, 502 insertions(+), 68 deletions(-) -- 2.19.1
[PATCH v1 1/5] powerpc/pseries: Use macros for referring to the DTL enable mask
Introduce macros to encode the DTL enable mask fields and use those instead of hardcoding numbers. Signed-off-by: Naveen N. Rao --- arch/powerpc/include/asm/lppaca.h | 11 +++ arch/powerpc/platforms/pseries/dtl.c | 8 +--- arch/powerpc/platforms/pseries/lpar.c | 2 +- arch/powerpc/platforms/pseries/setup.c | 2 +- 4 files changed, 14 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h index 7c23ce8a5a4c..2c7e31187726 100644 --- a/arch/powerpc/include/asm/lppaca.h +++ b/arch/powerpc/include/asm/lppaca.h @@ -154,6 +154,17 @@ struct dtl_entry { #define DISPATCH_LOG_BYTES 4096/* bytes per cpu */ #define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry)) +/* + * Dispatch trace log event enable mask: + * 0x1: voluntary virtual processor waits + * 0x2: time-slice preempts + * 0x4: virtual partition memory page faults + */ +#define DTL_LOG_CEDE 0x1 +#define DTL_LOG_PREEMPT0x2 +#define DTL_LOG_FAULT 0x4 +#define DTL_LOG_ALL(DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT) + extern struct kmem_cache *dtl_cache; /* diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c index ef6595153642..051ea2de1e1a 100644 --- a/arch/powerpc/platforms/pseries/dtl.c +++ b/arch/powerpc/platforms/pseries/dtl.c @@ -40,13 +40,7 @@ struct dtl { }; static DEFINE_PER_CPU(struct dtl, cpu_dtl); -/* - * Dispatch trace log event mask: - * 0x7: 0x1: voluntary virtual processor waits - * 0x2: time-slice preempts - * 0x4: virtual partition memory page faults - */ -static u8 dtl_event_mask = 0x7; +static u8 dtl_event_mask = DTL_LOG_ALL; /* diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index 0b5081085a44..ad194420e8ae 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -125,7 +125,7 @@ void vpa_init(int cpu) pr_err("WARNING: DTL registration of cpu %d (hw %d) " "failed with %ld\n", smp_processor_id(), hwcpu, ret); - 
lppaca_of(cpu).dtl_enable_mask = 2; + lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT; } } diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 0f553dcfa548..f3b5822e88c6 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -306,7 +306,7 @@ static int alloc_dispatch_logs(void) pr_err("WARNING: DTL registration of cpu %d (hw %d) failed " "with %d\n", smp_processor_id(), hard_smp_processor_id(), ret); - get_paca()->lppaca_ptr->dtl_enable_mask = 2; + get_paca()->lppaca_ptr->dtl_enable_mask = DTL_LOG_PREEMPT; return 0; } -- 2.19.1
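For readers skimming the archive, the mask encoding introduced by this patch can be sketched in plain userspace C. The DTL_LOG_* names mirror the patch; the helper function is illustrative only, not kernel code.

```c
#include <assert.h>

/* Dispatch trace log event enable mask bits, as defined in the patch. */
#define DTL_LOG_CEDE    0x1     /* voluntary virtual processor waits */
#define DTL_LOG_PREEMPT 0x2     /* time-slice preempts */
#define DTL_LOG_FAULT   0x4     /* virtual partition memory page faults */
#define DTL_LOG_ALL     (DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)

/* Example of how a caller would test one event class in the mask. */
static int dtl_logs_preempts(unsigned char mask)
{
        return (mask & DTL_LOG_PREEMPT) != 0;
}
```
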
[PATCH v1 2/5] powerpc/pseries: Do not save the previous DTL mask value
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is enabled, we always initialize DTL enable mask to DTL_LOG_PREEMPT (0x2). There are no other places where the mask is changed. As such, when reading the DTL log buffer through debugfs, there is no need to save and restore the previous mask value. We don't need to save and restore the earlier mask value if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not enabled. So, remove the field from the structure as well. Signed-off-by: Naveen N. Rao --- arch/powerpc/platforms/pseries/dtl.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c index 051ea2de1e1a..fb05804adb2f 100644 --- a/arch/powerpc/platforms/pseries/dtl.c +++ b/arch/powerpc/platforms/pseries/dtl.c @@ -55,7 +55,6 @@ struct dtl_ring { struct dtl_entry *write_ptr; struct dtl_entry *buf; struct dtl_entry *buf_end; - u8 saved_dtl_mask; }; static DEFINE_PER_CPU(struct dtl_ring, dtl_rings); @@ -105,7 +104,6 @@ static int dtl_start(struct dtl *dtl) dtlr->write_ptr = dtl->buf; /* enable event logging */ - dtlr->saved_dtl_mask = lppaca_of(dtl->cpu).dtl_enable_mask; lppaca_of(dtl->cpu).dtl_enable_mask |= dtl_event_mask; dtl_consumer = consume_dtle; @@ -123,7 +121,7 @@ static void dtl_stop(struct dtl *dtl) dtlr->buf = NULL; /* restore dtl_enable_mask */ - lppaca_of(dtl->cpu).dtl_enable_mask = dtlr->saved_dtl_mask; + lppaca_of(dtl->cpu).dtl_enable_mask = DTL_LOG_PREEMPT; if (atomic_dec_and_test(&dtl_count)) dtl_consumer = NULL; -- 2.19.1
[PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used
When the dtl debugfs interface is used, we usually set the dtl_enable_mask to 0x7 (DTL_LOG_ALL). When this happens, we start seeing DTL entries for all preempt reasons, including CEDE. In scan_dispatch_log(), we add up the times from all entries and account those towards stolen time. However, we should only be accounting stolen time when the preemption was due to HDEC at the end of our time slice. Fix this by checking for the dispatch reason in the DTL entry before adding to the stolen time. Fixes: cf9efce0ce313 ("powerpc: Account time using timebase rather than PURR") Signed-off-by: Naveen N. Rao --- arch/powerpc/kernel/time.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index 40868f3ee113..923abc3e555d 100644 --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -199,7 +199,7 @@ static u64 scan_dispatch_log(u64 stop_tb) struct lppaca *vpa = local_paca->lppaca_ptr; u64 tb_delta; u64 stolen = 0; - u64 dtb; + u64 dtb, dispatch_reason; if (!dtl) return 0; @@ -210,6 +210,7 @@ static u64 scan_dispatch_log(u64 stop_tb) dtb = be64_to_cpu(dtl->timebase); tb_delta = be32_to_cpu(dtl->enqueue_to_dispatch_time) + be32_to_cpu(dtl->ready_to_enqueue_time); + dispatch_reason = dtl->dispatch_reason; barrier(); if (i + N_DISPATCH_LOG < be64_to_cpu(vpa->dtl_idx)) { /* buffer has overflowed */ @@ -221,7 +222,9 @@ static u64 scan_dispatch_log(u64 stop_tb) break; if (dtl_consumer) dtl_consumer(dtl, i); - stolen += tb_delta; + /* 7 indicates that this dispatch follows a time slice preempt */ + if (dispatch_reason == 7) + stolen += tb_delta; ++i; ++dtl; if (dtl == dtl_end) -- 2.19.1
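The fix above can be illustrated with a minimal userspace sketch. The entry layout and the reason code 7 ("dispatch follows a time-slice preempt") follow the patch; the array-walk harness is hypothetical and stands in for the ring-buffer scan in scan_dispatch_log().

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for struct dtl_entry: only the fields the fix reads. */
struct fake_dtl_entry {
        uint8_t  dispatch_reason;
        uint32_t enqueue_to_dispatch_time;
        uint32_t ready_to_enqueue_time;
};

/* Sum time only for entries whose dispatch followed a time-slice preempt
 * (reason code 7), mirroring the check the patch adds; CEDE and other
 * reasons no longer inflate the stolen-time total. */
static uint64_t sum_stolen_time(const struct fake_dtl_entry *dtl, size_t n)
{
        uint64_t stolen = 0;

        for (size_t i = 0; i < n; i++) {
                uint64_t tb_delta = (uint64_t)dtl[i].enqueue_to_dispatch_time +
                                    dtl[i].ready_to_enqueue_time;
                if (dtl[i].dispatch_reason == 7)
                        stolen += tb_delta;
        }
        return stolen;
}
```
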
[PATCH v1 4/5] powerpc/pseries: Factor out DTL buffer allocation and registration routines
Introduce new helpers for DTL buffer allocation and registration and have the existing code use those. Signed-off-by: Naveen N. Rao --- arch/powerpc/include/asm/plpar_wrappers.h | 2 + arch/powerpc/platforms/pseries/lpar.c | 66 --- arch/powerpc/platforms/pseries/setup.c| 34 +--- 3 files changed, 52 insertions(+), 50 deletions(-) diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h index cff5a411e595..7dcbf42e9e11 100644 --- a/arch/powerpc/include/asm/plpar_wrappers.h +++ b/arch/powerpc/include/asm/plpar_wrappers.h @@ -88,6 +88,8 @@ static inline long register_dtl(unsigned long cpu, unsigned long vpa) return vpa_call(H_VPA_REG_DTL, cpu, vpa); } +extern void alloc_dtl_buffers(void); +extern void register_dtl_buffer(int cpu); extern void vpa_init(int cpu); static inline long plpar_pte_enter(unsigned long flags, diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index ad194420e8ae..d83bb3db6767 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -64,13 +64,58 @@ EXPORT_SYMBOL(plpar_hcall); EXPORT_SYMBOL(plpar_hcall9); EXPORT_SYMBOL(plpar_hcall_norets); +void alloc_dtl_buffers(void) +{ + int cpu; + struct paca_struct *pp; + struct dtl_entry *dtl; + + for_each_possible_cpu(cpu) { + pp = paca_ptrs[cpu]; + dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL); + if (!dtl) { + pr_warn("Failed to allocate dispatch trace log for cpu %d\n", + cpu); + pr_warn("Stolen time statistics will be unreliable\n"); + break; + } + + pp->dtl_ridx = 0; + pp->dispatch_log = dtl; + pp->dispatch_log_end = dtl + N_DISPATCH_LOG; + pp->dtl_curr = dtl; + } +} + +void register_dtl_buffer(int cpu) +{ + long ret; + struct paca_struct *pp; + struct dtl_entry *dtl; + int hwcpu = get_hard_smp_processor_id(cpu); + + pp = paca_ptrs[cpu]; + dtl = pp->dispatch_log; + if (dtl) { + pp->dtl_ridx = 0; + pp->dtl_curr = dtl; + lppaca_of(cpu).dtl_idx = 0; + + /* hypervisor reads buffer length 
from this field */ + dtl->enqueue_to_dispatch_time = cpu_to_be32(DISPATCH_LOG_BYTES); + ret = register_dtl(hwcpu, __pa(dtl)); + if (ret) + pr_err("WARNING: DTL registration of cpu %d (hw %d) " + "failed with %ld\n", cpu, hwcpu, ret); + lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT; + } +} + void vpa_init(int cpu) { int hwcpu = get_hard_smp_processor_id(cpu); unsigned long addr; long ret; - struct paca_struct *pp; - struct dtl_entry *dtl; /* * The spec says it "may be problematic" if CPU x registers the VPA of @@ -111,22 +156,7 @@ void vpa_init(int cpu) /* * Register dispatch trace log, if one has been allocated. */ - pp = paca_ptrs[cpu]; - dtl = pp->dispatch_log; - if (dtl) { - pp->dtl_ridx = 0; - pp->dtl_curr = dtl; - lppaca_of(cpu).dtl_idx = 0; - - /* hypervisor reads buffer length from this field */ - dtl->enqueue_to_dispatch_time = cpu_to_be32(DISPATCH_LOG_BYTES); - ret = register_dtl(hwcpu, __pa(dtl)); - if (ret) - pr_err("WARNING: DTL registration of cpu %d (hw %d) " - "failed with %ld\n", smp_processor_id(), - hwcpu, ret); - lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT; - } + register_dtl_buffer(cpu); } #ifdef CONFIG_PPC_BOOK3S_64 diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index f3b5822e88c6..be6a3845b7ea 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -267,46 +267,16 @@ struct kmem_cache *dtl_cache; */ static int alloc_dispatch_logs(void) { - int cpu, ret; - struct paca_struct *pp; - struct dtl_entry *dtl; - if (!firmware_has_feature(FW_FEATURE_SPLPAR)) return 0; if (!dtl_cache) return 0; - for_each_possible_cpu(cpu) { - pp = paca_ptrs[cpu]; - dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL); - if (!dtl) { - pr_warn("Failed to allocate dispatch trace log for cpu %d\n", - cpu); - pr_warn("Stolen time statistics will be unreliable\n"); - break; - } - - pp->dtl_ridx = 0; -
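The refactoring in this patch follows a common pattern: one helper allocates per-cpu buffers, a second registers a single cpu's buffer, so boot-time and later callers can share them. A generic userspace sketch of that split is below; the names, sizes and bookkeeping are illustrative, not the kernel code.

```c
#include <stdlib.h>
#include <stddef.h>

#define NCPUS       4
#define BUF_ENTRIES 128

struct entry { unsigned long payload; };

static struct entry *cpu_buf[NCPUS];
static size_t cpu_ridx[NCPUS];
static int cpu_registered[NCPUS];

/* Allocate a trace buffer for every cpu (in the spirit of
 * alloc_dtl_buffers()); on failure, stop and leave later cpus without
 * a buffer rather than failing hard. */
static int alloc_buffers(void)
{
        for (int cpu = 0; cpu < NCPUS; cpu++) {
                cpu_buf[cpu] = calloc(BUF_ENTRIES, sizeof(struct entry));
                if (!cpu_buf[cpu])
                        return -1;
                cpu_ridx[cpu] = 0;
        }
        return 0;
}

/* Register one cpu's buffer, if it was allocated (in the spirit of
 * register_dtl_buffer()); safe to call again for a cpu coming online. */
static void register_buffer(int cpu)
{
        if (cpu_buf[cpu]) {
                cpu_ridx[cpu] = 0;
                cpu_registered[cpu] = 1;
        }
}
```
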
[PATCH v1 5/5] powerpc/pseries: Introduce dtl_entry tracepoint
This tracepoint provides access to the fields of each DTL entry in the Dispatch Trace Log buffer. Since the buffer is populated by the hypervisor and since we allocate just a 4k area per cpu for the buffer, we need to process the entries on a regular basis before they are overwritten by the hypervisor. We do this by using a static branch (or a reference counter if we don't have jump labels) in ret_from_except similar to how the hcall/opal tracepoints do. Apart from making the DTL entries available for processing through the usual trace interface, this tracepoint also adds a new field 'distance' to each DTL entry, enabling enhanced statistics around the vcpu dispatch behavior of the hypervisor. For Shared Processor LPARs, the POWER Hypervisor maintains a relatively static mapping of LPAR vcpus to physical processor cores and tries to always dispatch vcpus on their associated physical processor core. The LPAR can discover this through the H_VPHN(flags=1) hcall to obtain the associativity of the LPAR vcpus. However, under certain scenarios, vcpus may be dispatched on a different processor core. The actual physical processor number on which a certain vcpu is dispatched is available to the LPAR in the 'processor_id' field of each DTL entry. The LPAR can then discover the associativity of that physical processor through the H_VPHN(flags=2) hcall. This can then be compared to the home node associativity for that specific vcpu to determine if the vcpu was dispatched on the same core or not. If the vcpu was not dispatched on the home node, it is possible to determine if the vcpu was dispatched in a different chip, socket or drawer. The tracepoint field 'distance' encodes this information. If distance is 0, then the vcpu was dispatched on its home node/chip. If not, increasing values of 'distance' indicate a dispatch on a different chip in a MCM, different socket or a different drawer. 
In terms of the implementation, we update our numa code to retain the vcpu associativity that is retrieved while discovering our numa topology. In addition, on tracepoint registration, we discover the physical cpu associativity. This information is only retrieved during the tracepoint registration and is not expected to change for the duration of the trace. To support configurations with/without CONFIG_VIRT_CPU_ACCOUNTING_NATIVE selected, we generalize and extend helpers for DTL buffer allocation, freeing and registration. We also introduce a global variable 'dtl_mask' to encode the DTL enable mask to be set for all cpus. This helps ensure that cpus that come online honor the global enable mask. Finally, to ensure that the new dtl_entry tracepoint usage does not interfere with the dtl debugfs interface, we introduce helpers to ensure only one of the two interfaces are used at any point in time. Signed-off-by: Naveen N. Rao --- arch/powerpc/include/asm/plpar_wrappers.h | 7 + arch/powerpc/include/asm/trace.h | 55 +++ arch/powerpc/kernel/entry_64.S| 39 + arch/powerpc/mm/numa.c| 144 - arch/powerpc/platforms/pseries/dtl.c | 10 +- arch/powerpc/platforms/pseries/lpar.c | 187 +- 6 files changed, 434 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h index 7dcbf42e9e11..029f019ddfb6 100644 --- a/arch/powerpc/include/asm/plpar_wrappers.h +++ b/arch/powerpc/include/asm/plpar_wrappers.h @@ -88,7 +88,14 @@ static inline long register_dtl(unsigned long cpu, unsigned long vpa) return vpa_call(H_VPA_REG_DTL, cpu, vpa); } +extern void dtl_entry_tracepoint_enable(void); +extern void dtl_entry_tracepoint_disable(void); +extern int register_dtl_buffer_access(int global); +extern void unregister_dtl_buffer_access(int global); +extern void set_dtl_mask(u8 mask); +extern void reset_dtl_mask(void); extern void alloc_dtl_buffers(void); +extern void free_dtl_buffers(void); extern void register_dtl_buffer(int cpu); 
extern void vpa_init(int cpu); diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h index d018e8602694..bcb8d66d3232 100644 --- a/arch/powerpc/include/asm/trace.h +++ b/arch/powerpc/include/asm/trace.h @@ -101,6 +101,61 @@ TRACE_EVENT_FN_COND(hcall_exit, hcall_tracepoint_regfunc, hcall_tracepoint_unregfunc ); + +#ifdef CONFIG_PPC_SPLPAR +extern int dtl_entry_tracepoint_regfunc(void); +extern void dtl_entry_tracepoint_unregfunc(void); +extern u8 compute_dispatch_distance(unsigned int pcpu); + +TRACE_EVENT_FN(dtl_entry, + + TP_PROTO(u8 dispatch_reason, u8 preempt_reason, u16 processor_id, + u32 enqueue_to_dispatch_time, u32 ready_to_enqueue_time, + u32 waiting_to_ready_time, u64 timebase, u64 fault_addr, + u64 srr0, u64 srr1), + + TP_ARGS(dispatch_reason, preempt_reason, processor_id, + enqueue_to_dispatch_time, ready_to_enqueue_time, +
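As a rough illustration of the 'distance' encoding described in the commit message (not the kernel implementation): compare the vcpu's home associativity against the physical cpu's associativity from the coarsest domain down, and count how many levels fail to match. Distance 0 means a dispatch on the home chip; larger values correspond to a different chip, socket or drawer.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: associativity arrays ordered coarsest (drawer) to
 * finest (chip), conceptually what H_VPHN returns. Returns the number of
 * associativity levels that do not match the home node. */
static unsigned int compute_distance(const int *home_assoc,
                                     const int *phys_assoc, size_t levels)
{
        size_t matched = 0;

        while (matched < levels && home_assoc[matched] == phys_assoc[matched])
                matched++;
        return (unsigned int)(levels - matched);
}
```
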
Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD
On Thu, Oct 25, 2018 at 12:30 PM Mike Rapoport wrote: > > On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote: > > +Ard > > > > On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport wrote: > > > > > > On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote: > > > > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli > > > > wrote: > > > > > > > > > > Hi all, > > > > > > > > > > While investigating why ARM64 required a ton of objects to be rebuilt > > > > > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was > > > > > because we define __early_init_dt_declare_initrd() differently and we > > > > > do > > > > > that in arch/arm64/include/asm/memory.h which gets included by a fair > > > > > amount of other header files, and translation units as well. > > > > > > > > I scratch my head sometimes as to why some config options rebuild so > > > > much stuff. One down, ? to go. :) > > > > > > > > > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with > > > > > build > > > > > systems that generate two kernels: one with the initramfs and one > > > > > without. buildroot is one of these build systems, OpenWrt is also > > > > > another one that does this. > > > > > > > > > > This patch series proposes adding an empty initrd.h to satisfy the > > > > > need > > > > > for drivers/of/fdt.c to unconditionally include that file, and moves > > > > > the > > > > > custom __early_init_dt_declare_initrd() definition away from > > > > > asm/memory.h > > > > > > > > > > This cuts the number of objects rebuilds from 1920 down to 26, so a > > > > > factor 73 approximately. > > > > > > > > > > Apologies for the long CC list, please let me know how you would go > > > > > about merging that and if another approach would be preferable, e.g: > > > > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or > > > > > something like that. > > > > > > > > There may be a better way as of 4.20 because bootmem is now gone and > > > > only memblock is used. 
This should unify what each arch needs to do > > > > with initrd early. We need the physical address early for memblock > > > > reserving. Then later on we need the virtual address to access the > > > > initrd. Perhaps we should just change initrd_start and initrd_end to > > > > physical addresses (or add 2 new variables would be less invasive and > > > > allow for different translation than __va()). The sanity checks and > > > > memblock reserve could also perhaps be moved to a common location. > > > > > > > > Alternatively, given arm64 is the only oddball, I'd be fine with an > > > > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default > > > > __early_init_dt_declare_initrd as long as we have a path to removing > > > > it like the above option. > > > > > > I think arm64 does not have to redefine __early_init_dt_declare_initrd(). > > > Something like this might be just all we need (completely untested, > > > probably it won't even compile): > > > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > > > index 9d9582c..e9ca238 100644 > > > --- a/arch/arm64/mm/init.c > > > +++ b/arch/arm64/mm/init.c > > > @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1; > > > phys_addr_t arm64_dma_phys_limit __ro_after_init; > > > > > > #ifdef CONFIG_BLK_DEV_INITRD > > > + > > > +static phys_addr_t initrd_start_phys, initrd_end_phys; > > > + > > > static int __init early_initrd(char *p) > > > { > > > unsigned long start, size; > > > @@ -71,8 +74,8 @@ static int __init early_initrd(char *p) > > > if (*endp == ',') { > > > size = memparse(endp + 1, NULL); > > > > > > - initrd_start = start; > > > - initrd_end = start + size; > > > + initrd_start_phys = start; > > > + initrd_end_phys = end; > > > } > > > return 0; > > > } > > > @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void) > > > memblock_add(__pa_symbol(_text), (u64)(_end - _text)); > > > } > > > > > > - if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) { > > > + if 
(IS_ENABLED(CONFIG_BLK_DEV_INITRD) && > > > + (initrd_start || initrd_start_phys)) { > > > + /* > > > +* FIXME: ensure proper precendence between > > > +* early_initrd and DT when both are present > > > > Command line takes precedence, so just reverse the order. > > > > > +*/ > > > + if (initrd_start) { > > > + initrd_start_phys = __phys_to_virt(initrd_start); > > > + initrd_end_phys = __phys_to_virt(initrd_end); BTW, I think you meant virt_to_phys() here? > > > > AIUI, the original issue was doing the P2V translation was happening > > too early and the VA could be wrong if the linear range is adjusted. > > So I don't think this would work. > > Probably things have changed since then, but in the current code there is > > in
Re: [PATCH 3/6] PCI: layerscape: Add the EP mode support
On Thu, Oct 25, 2018 at 07:08:58PM +0800, Xiaowei Bao wrote: > Add the EP mode support. > > Signed-off-by: Xiaowei Bao > --- > .../devicetree/bindings/pci/layerscape-pci.txt |3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt > b/Documentation/devicetree/bindings/pci/layerscape-pci.txt > index 66df1e8..d3d7be1 100644 > --- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt > +++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt > @@ -13,12 +13,15 @@ information. > > Required properties: > - compatible: should contain the platform identifier such as: > + RC mode: > "fsl,ls1021a-pcie", "snps,dw-pcie" > "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie" > "fsl,ls2088a-pcie" > "fsl,ls1088a-pcie" > "fsl,ls1046a-pcie" > "fsl,ls1012a-pcie" > + EP mode: > +"fsl,ls-pcie-ep" You need SoC specific compatibles for the same reasons as the RC. Rob
Re: [PATCH] selftests/powerpc: Relax L1d miss targets for rfi_flush test
On Tue, 23 Oct 2018 at 18:35, Naveen N. Rao wrote: > > When running the rfi_flush test, if the system is loaded, we see two > issues: > 1. The L1d misses when rfi_flush is disabled increase significantly due > to other workloads interfering with the cache. > 2. The L1d misses when rfi_flush is enabled sometimes goes slightly > below the expected number of misses. > > To address these, let's relax the expected number of L1d misses: > 1. When rfi_flush is disabled, we allow upto half the expected number of > the misses for when rfi_flush is enabled. > 2. When rfi_flush is enabled, we allow ~1% lower number of cache misses. > > Reported-by: Joel Stanley > Signed-off-by: Naveen N. Rao Thanks, this now passes 10/10 runs on my Romulus machine. A log is attached below. Tested-by: Joel Stanley Cheers, Joel --- for i in `seq 1 10`; do sudo ./rfi_flush; done test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 5013939 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195054696 > 19000) [10/10 pass] success: rfi_flush_test test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 11015957 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195053292 > 19000) [10/10 pass] success: rfi_flush_test test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 8017248 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195145579 > 19000) [10/10 pass] success: rfi_flush_test test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 11015308 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195042376 > 19000) [10/10 pass] success: rfi_flush_test test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 1021356 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195031624 > 19000) [10/10 pass] success: rfi_flush_test test: 
rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 6015342 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195037322 > 19000) [10/10 pass] success: rfi_flush_test test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 16635 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195032476 > 19000) [10/10 pass] success: rfi_flush_test test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 6013599 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195060037 > 19000) [10/10 pass] success: rfi_flush_test test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 25236 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195052859 > 19000) [10/10 pass] success: rfi_flush_test test: rfi_flush_test tags: git_version:next-20181018-67-g61f7abf00719 PASS (L1D misses with rfi_flush=0: 18120 < 9500) [10/10 pass] PASS (L1D misses with rfi_flush=1: 195014212 > 19000) [10/10 pass] success: rfi_flush_test
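The relaxed pass criteria described in the patch under test can be sketched as follows. The threshold arithmetic is illustrative; the real selftest derives the expected miss count from its iteration count.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the relaxed rfi_flush pass criteria:
 * - flush disabled: misses must stay below half the count expected
 *   when the flush is enabled (loaded systems inflate this number);
 * - flush enabled: misses may dip ~1% below the expected count. */
static int passes_flush_off(uint64_t misses, uint64_t expected_with_flush)
{
        return misses < expected_with_flush / 2;
}

static int passes_flush_on(uint64_t misses, uint64_t expected_with_flush)
{
        /* allow a 1% shortfall */
        return misses > expected_with_flush - expected_with_flush / 100;
}
```
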
Re: [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used
On Fri, Oct 26, 2018 at 01:55:44AM +0530, Naveen N. Rao wrote: > When the dtl debugfs interface is used, we usually set the > dtl_enable_mask to 0x7 (DTL_LOG_ALL). When this happens, we start seeing > DTL entries for all preempt reasons, including CEDE. In > scan_dispatch_log(), we add up the times from all entries and account > those towards stolen time. However, we should only be accounting stolen > time when the preemption was due to HDEC at the end of our time slice. It's always been the case that stolen time when idle has been accounted as idle time, not stolen time. That's why we didn't check for this in the past. Do you have a test that shows different results (as in reported idle and stolen times) with this patch compared to without? Paul.
Re: [PATCH] seccomp: Add pkru into seccomp_data
On Thu, Oct 25, 2018 at 9:42 AM Michael Sammler wrote: > > On 10/25/2018 11:12 AM, Florian Weimer wrote: > >> I understand your concern about exposing the number of protection keys > >> in the ABI. One idea would be to state, that the pkru field (which > >> should probably be renamed) contains an architecture specific value, > >> which could then be the PKRU on x86 and AMR (or another register) on > >> POWER. This new field should probably be extended to __u64 and the > >> reserved field removed. > > POWER also has proper read/write bit separation, not PKEY_DISABLE_ACCESS > > (disable read and write) and PKEY_DISABLE_WRITE like Intel. It's > > currently translated by the kernel, but I really need a > > PKEY_DISABLE_READ bit in glibc to implement pkey_get in case the memory > > is write-only. > The idea here would be to simply provide the raw value of the register > (PKRU on x86, AMR on POWER) to the BPF program and let the BPF program > (or maybe a higher level library like libseccomp) deal with the > complications of interpreting this architecture specific value (similar > how the BPF program currently already has to deal with architecture > specific system call numbers). If an architecture were to support more > protection keys than fit into the field, the architecture specific value > stored in the field might simply be the first protection keys. If there > was interest, it would be possible to add more architecture specific > fields to seccomp_data. > >> Another idea would be to not add a field in the seccomp_data > >> structure, but instead provide a new BPF instruction, which reads the > >> value of a specified protection key. > > I would prefer that if it's possible. We should make sure that the bits > > are the same as those returned from pkey_get. I have an implementation > > on POWER, but have yet to figure out the implications for 32-bit because > > I do not know the AMR register size there. 
> > > > Thanks, > > Florian > I have had a look at how BPF is implemented and it does not seem to be > easy to just add an BPF instruction for seccomp since (as far as I > understand) the code of the classical BPF (as used by seccomp) is shared > with the code of eBPF, which is used in many parts of the kernel and > there is at least one interpreter and one JIT compiler for BPF. But > maybe someone with more experience than me can comment on how hard it > would be to add an instruction to BPF. > You could bite the bullet and add seccomp eBPF support :)
Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD
On 10/25/18 2:13 PM, Rob Herring wrote: > On Thu, Oct 25, 2018 at 12:30 PM Mike Rapoport wrote: >> >> On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote: >>> +Ard >>> >>> On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport wrote: On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote: > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli > wrote: >> >> Hi all, >> >> While investigating why ARM64 required a ton of objects to be rebuilt >> when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was >> because we define __early_init_dt_declare_initrd() differently and we do >> that in arch/arm64/include/asm/memory.h which gets included by a fair >> amount of other header files, and translation units as well. > > I scratch my head sometimes as to why some config options rebuild so > much stuff. One down, ? to go. :) > >> Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with build >> systems that generate two kernels: one with the initramfs and one >> without. buildroot is one of these build systems, OpenWrt is also >> another one that does this. >> >> This patch series proposes adding an empty initrd.h to satisfy the need >> for drivers/of/fdt.c to unconditionally include that file, and moves the >> custom __early_init_dt_declare_initrd() definition away from >> asm/memory.h >> >> This cuts the number of objects rebuilds from 1920 down to 26, so a >> factor 73 approximately. >> >> Apologies for the long CC list, please let me know how you would go >> about merging that and if another approach would be preferable, e.g: >> introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or >> something like that. > > There may be a better way as of 4.20 because bootmem is now gone and > only memblock is used. This should unify what each arch needs to do > with initrd early. We need the physical address early for memblock > reserving. Then later on we need the virtual address to access the > initrd. 
Perhaps we should just change initrd_start and initrd_end to > physical addresses (or add 2 new variables would be less invasive and > allow for different translation than __va()). The sanity checks and > memblock reserve could also perhaps be moved to a common location. > > Alternatively, given arm64 is the only oddball, I'd be fine with an > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default > __early_init_dt_declare_initrd as long as we have a path to removing > it like the above option. I think arm64 does not have to redefine __early_init_dt_declare_initrd(). Something like this might be just all we need (completely untested, probably it won't even compile): diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 9d9582c..e9ca238 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1; phys_addr_t arm64_dma_phys_limit __ro_after_init; #ifdef CONFIG_BLK_DEV_INITRD + +static phys_addr_t initrd_start_phys, initrd_end_phys; + static int __init early_initrd(char *p) { unsigned long start, size; @@ -71,8 +74,8 @@ static int __init early_initrd(char *p) if (*endp == ',') { size = memparse(endp + 1, NULL); - initrd_start = start; - initrd_end = start + size; + initrd_start_phys = start; + initrd_end_phys = end; } return 0; } @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void) memblock_add(__pa_symbol(_text), (u64)(_end - _text)); } - if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) { + if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && + (initrd_start || initrd_start_phys)) { + /* +* FIXME: ensure proper precendence between +* early_initrd and DT when both are present >>> >>> Command line takes precedence, so just reverse the order. >>> +*/ + if (initrd_start) { + initrd_start_phys = __phys_to_virt(initrd_start); + initrd_end_phys = __phys_to_virt(initrd_end); > > BTW, I think you meant virt_to_phys() here? 
>
>>>
>>> AIUI, the original issue was that the P2V translation was happening
>>> too early and the VA could be wrong if the linear range is adjusted.
>>> So I don't think this would work.
>>
>> Probably things have changed since then, but in the current code there is
>>
>> 	initrd_start = __phys_to_virt(initrd_start);
>>
>> and in between only the code related to CONFIG_RANDOMIZE_BASE, so I believe
>> it's safe to use __phys
Re: [PATCH] seccomp: Add pkru into seccomp_data
On Fri, Oct 26, 2018 at 12:00 AM, Andy Lutomirski wrote:
> You could bite the bullet and add seccomp eBPF support :)

I'm not convinced this is a good enough reason for gaining the eBPF
attack surface yet.

-Kees

-- 
Kees Cook
Re: [PATCH] seccomp: Add pkru into seccomp_data
> On Oct 25, 2018, at 5:35 PM, Kees Cook wrote:
>
>> On Fri, Oct 26, 2018 at 12:00 AM, Andy Lutomirski wrote:
>> You could bite the bullet and add seccomp eBPF support :)
>
> I'm not convinced this is a good enough reason for gaining the eBPF
> attack surface yet.

Is it an interesting attack surface? It's certainly scarier if you're
worried about attacks from the sandbox creator, but the security inside
the sandbox should be more or less equivalent, no?
RE: [PATCH 3/6] PCI: layerscape: Add the EP mode support
-----Original Message-----
From: Rob Herring
Sent: October 26, 2018 5:53
To: Xiaowei Bao
Cc: bhelg...@google.com; mark.rutl...@arm.com; shawn...@kernel.org; Leo Li; kis...@ti.com; lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org; M.h. Lian; Mingkai Hu; Roy Zang; kstew...@linuxfoundation.org; cyrille.pitc...@free-electrons.com; pombreda...@nexb.com; shawn@rock-chips.com; niklas.cas...@axis.com; linux-...@vger.kernel.org; devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 3/6] PCI: layerscape: Add the EP mode support

On Thu, Oct 25, 2018 at 07:08:58PM +0800, Xiaowei Bao wrote:
> Add the EP mode support.
>
> Signed-off-by: Xiaowei Bao
> ---
>  .../devicetree/bindings/pci/layerscape-pci.txt | 3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> index 66df1e8..d3d7be1 100644
> --- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> +++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> @@ -13,12 +13,15 @@ information.
>
>  Required properties:
>  - compatible: should contain the platform identifier such as:
> +  RC mode:
>         "fsl,ls1021a-pcie", "snps,dw-pcie"
>         "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie"
>         "fsl,ls2088a-pcie"
>         "fsl,ls1088a-pcie"
>         "fsl,ls1046a-pcie"
>         "fsl,ls1012a-pcie"
> +  EP mode:
> +        "fsl,ls-pcie-ep"

You need SoC specific compatibles for the same reasons as the RC.

[Xiaowei Bao] I want all Layerscape platforms to use one compatible when the PCIe controller works in EP mode.

Rob
Re: [PATCH 5/6] pci: layerscape: Add the EP mode support.
Hi,

On Thursday 25 October 2018 04:39 PM, Xiaowei Bao wrote:
> Add the PCIe EP mode support for layerscape platform.
>
> Signed-off-by: Xiaowei Bao
> ---
>  drivers/pci/controller/dwc/Makefile            |   2 +-
>  drivers/pci/controller/dwc/pci-layerscape-ep.c | 161
>  2 files changed, 162 insertions(+), 1 deletions(-)
>  create mode 100644 drivers/pci/controller/dwc/pci-layerscape-ep.c
>
> diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
> index 5d2ce72..b26d617 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -8,7 +8,7 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
>  obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
>  obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
>  obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone-dw.o pci-keystone.o
> -obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
> +obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
>  obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
>  obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
>  obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> new file mode 100644
> index 000..3b33bbc
> --- /dev/null
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -0,0 +1,161 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe controller EP driver for Freescale Layerscape SoCs
> + *
> + * Copyright (C) 2018 NXP Semiconductor.
> + *
> + * Author: Xiaowei Bao
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "pcie-designware.h"
> +
> +#define PCIE_DBI2_OFFSET	0x1000	/* DBI2 base address*/

The base address should come from dt.

> +
> +struct ls_pcie_ep {
> +	struct dw_pcie *pci;
> +};
> +
> +#define to_ls_pcie_ep(x)	dev_get_drvdata((x)->dev)
> +
> +static bool ls_pcie_is_bridge(struct ls_pcie_ep *pcie)
> +{
> +	struct dw_pcie *pci = pcie->pci;
> +	u32 header_type;
> +
> +	header_type = ioread8(pci->dbi_base + PCI_HEADER_TYPE);
> +	header_type &= 0x7f;
> +
> +	return header_type == PCI_HEADER_TYPE_BRIDGE;
> +}
> +
> +static int ls_pcie_establish_link(struct dw_pcie *pci)
> +{
> +	return 0;
> +}

There should be some way by which EP should tell RC that it is not configured yet. Are there no bits to control LTSSM state initialization or Configuration retry status enabling?

> +
> +static const struct dw_pcie_ops ls_pcie_ep_ops = {
> +	.start_link = ls_pcie_establish_link,
> +};
> +
> +static const struct of_device_id ls_pcie_ep_of_match[] = {
> +	{ .compatible = "fsl,ls-pcie-ep",},
> +	{ },
> +};
> +
> +static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	struct pci_epc *epc = ep->epc;
> +	enum pci_barno bar;
> +
> +	for (bar = BAR_0; bar <= BAR_5; bar++)
> +		dw_pcie_ep_reset_bar(pci, bar);
> +
> +	epc->features |= EPC_FEATURE_NO_LINKUP_NOTIFIER;
> +}
> +
> +static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> +				enum pci_epc_irq_type type, u16 interrupt_num)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +
> +	switch (type) {
> +	case PCI_EPC_IRQ_LEGACY:
> +		return dw_pcie_ep_raise_legacy_irq(ep, func_no);
> +	case PCI_EPC_IRQ_MSI:
> +		return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
> +	case PCI_EPC_IRQ_MSIX:
> +		return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
> +	default:
> +		dev_err(pci->dev, "UNKNOWN IRQ type\n");
> +	}
> +
> +	return 0;
> +}
> +
> +static struct dw_pcie_ep_ops pcie_ep_ops = {
> +	.ep_init = ls_pcie_ep_init,
> +	.raise_irq = ls_pcie_ep_raise_irq,
> +};
> +
> +static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
> +				 struct platform_device *pdev)
> +{
> +	struct dw_pcie *pci = pcie->pci;
> +	struct device *dev = pci->dev;
> +	struct dw_pcie_ep *ep;
> +	struct resource *res;
> +	int ret;
> +
> +	ep = &pci->ep;
> +	ep->ops = &pcie_ep_ops;
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
> +	if (!res)
> +		return -EINVAL;
> +
> +	ep->phys_base = res->start;
> +	ep->addr_size = resource_size(res);
> +
> +	ret = dw_pcie_ep_init(ep);
> +	if (ret) {
> +		dev_err(dev, "failed to initialize endpoint\n");
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int __init ls_pcie_ep_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct dw_pcie *pci;
> +	struct ls_pcie_ep *pcie;
> +	struct resource *dbi_base;
> +	int ret;
> +
> +	pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
> +	if (!pcie)
Re: [PATCH] seccomp: Add pkru into seccomp_data
On Thu, Oct 25, 2018 at 11:12:25AM +0200, Florian Weimer wrote:
> * Michael Sammler:
>
>> Thank you for the pointer about the POWER implementation. I am not
>> familiar with POWER in general and its protection key feature at
>> all. Would the AMR register be the correct register to expose here?
>
> Yes, according to my notes, the register is called AMR (special purpose
> register 13).

Yes, it is the AMR register.

RP
[PATCH 1/5] powerpc/64s: Guarded Userspace Access Prevention
Guarded Userspace Access Prevention (GUAP) utilises a feature of the Radix MMU which disallows read and write access to userspace addresses. By utilising this, the kernel is prevented from accessing user data from outside of trusted paths that perform proper safety checks, such as copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when:

 - exiting the kernel and entering userspace
 - performing an operation like copy_{to/from}_user()
 - context switching to a process that has access enabled

and similarly, access is disabled again when exiting userspace and entering the kernel.

This feature has a slight performance impact, which I roughly measured to be 3% slower in the worst case (performing 1GB of 1 byte read()/write() syscalls), and is gated behind the CONFIG_PPC_RADIX_GUAP option for performance-critical builds.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and performing the following:

  echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT

If enabled, this should send a SIGSEGV to the thread.
Signed-off-by: Russell Currey
---
Since the previous version of this patchset (named KHRAP) there have been several changes, some of which include:

 - macro naming, suggested by Nick
 - builds should be fixed outside of 64s
 - no longer unlock heading out to userspace
 - removal of unnecessary isyncs
 - more config option testing
 - removal of save/restore
 - use pr_crit() and reword message on fault

 arch/powerpc/include/asm/exception-64e.h |  3 ++
 arch/powerpc/include/asm/exception-64s.h | 19 +++-
 arch/powerpc/include/asm/mmu.h           |  7 +++
 arch/powerpc/include/asm/paca.h          |  3 ++
 arch/powerpc/include/asm/reg.h           |  1 +
 arch/powerpc/include/asm/uaccess.h       | 57
 arch/powerpc/kernel/asm-offsets.c        |  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c        |  4 ++
 arch/powerpc/kernel/entry_64.S           | 17 ++-
 arch/powerpc/mm/fault.c                  | 12 +
 arch/powerpc/mm/pgtable-radix.c          |  2 +
 arch/powerpc/mm/pkeys.c                  |  7 ++-
 arch/powerpc/platforms/Kconfig.cputype   | 15 +++
 13 files changed, 135 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
index 555e22d5e07f..bf25015834ee 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -215,5 +215,8 @@ exc_##label##_book3e:
 #define RFI_TO_USER	\
 	rfi
 
+#define UNLOCK_USER_ACCESS(reg)
+#define LOCK_USER_ACCESS(reg)
+
 #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..0cac5bd380ca 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -264,6 +264,19 @@ BEGIN_FTR_SECTION_NESTED(943)	\
 	std	ra,offset(r13);	\
 END_FTR_SECTION_NESTED(ftr,ftr,943)
 
+#define LOCK_USER_ACCESS(reg)	\
+BEGIN_MMU_FTR_SECTION_NESTED(944)	\
+	LOAD_REG_IMMEDIATE(reg,AMR_LOCKED);	\
+	mtspr	SPRN_AMR,reg;	\
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,944)
+
+#define UNLOCK_USER_ACCESS(reg)	\
+BEGIN_MMU_FTR_SECTION_NESTED(945)	\
+	li	reg,0;	\
+	mtspr	SPRN_AMR,reg;	\
+	isync	\
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,945)
+
 #define EXCEPTION_PROLOG_0(area)	\
 	GET_PACA(r13);	\
 	std	r9,area+EX_R9(r13);	/* save r9 */	\
@@ -500,7 +513,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 	beq	4f;	/* if from kernel mode */	\
 	ACCOUNT_CPU_USER_ENTRY(r13, r9, r10);	\
 	SAVE_PPR(area, r9);	\
-4:	EXCEPTION_PROLOG_COMMON_2(area)	\
+4:	lbz	r9,PACA_USER_ACCESS_ALLOWED(r13);	\
+	cmpwi	cr1,r9,0;	\
+	beq	5f;	\
+	LOCK_USER_ACCESS(r9);	\
+5:	EXCEPTION_
[PATCH 2/5] powerpc/futex: GUAP support for futex ops
Wrap the futex operations in GUAP locks and unlocks.

Signed-off-by: Russell Currey
---
 arch/powerpc/include/asm/futex.h | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index 94542776a62d..3aed640ee9ef 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -35,6 +35,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
 {
 	int oldval = 0, ret;
 
+	unlock_user_access();
 	pagefault_disable();
 
 	switch (op) {
@@ -62,6 +63,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int oparg, int *oval,
 	if (!ret)
 		*oval = oldval;
 
+	lock_user_access();
 	return ret;
 }
 
@@ -75,6 +77,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
 		return -EFAULT;
 
+	unlock_user_access();
 	__asm__ __volatile__ (
         PPC_ATOMIC_ENTRY_BARRIER
 "1:     lwarx   %1,0,%3         # futex_atomic_cmpxchg_inatomic\n\
@@ -95,6 +98,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 	: "cc", "memory");
 
 	*uval = prev;
+	lock_user_access();
 	return ret;
 }
-- 
2.19.1
[PATCH 0/5] Guarded Userspace Access Prevention on Radix
Guarded Userspace Access Prevention is a security mechanism that prevents the kernel from being able to read and write userspace addresses outside of the allowed paths, most commonly copy_{to/from}_user().

At present, the only CPU that supports this is POWER9, and only while using the Radix MMU. Privileged reads and writes cannot access user data when key 0 of the AMR is set. This is described in the "Radix Tree Translation Storage Protection" section of the POWER ISA as of version 3.0.

GUAP code sets key 0 of the AMR (thus disabling accesses of user data) early during boot, and only ever "unlocks" access prior to certain operations, like copy_{to/from}_user(), futex ops, etc. Setting this does not prevent unprivileged access, so userspace can operate fine while access is locked.

There is a performance impact, although I don't consider it heavy. Running a worst-case benchmark of a 1GB copy 1 byte at a time (and thus constant read(1) write(1) syscalls), I found enabling GUAP to be 3.5% slower than when disabled. In most cases, the difference is negligible. The main performance impact is the mtspr instruction, which is quite slow.

There are a few caveats with this series that could be improved upon in future. Right now there is no saving and restoring of the AMR value - there is no userspace exploitation of the AMR on Radix in POWER9, but if this were to change in future, saving and restoring the value would be necessary.

No attempt is made to optimise cases of repeated calls - for example, if some code was repeatedly calling copy_to_user() for small sizes very frequently, it would be slower than the equivalent of wrapping that code in an unlock and lock and only having to modify the AMR once.

There are some interesting cases that I've attempted to handle, such as if the AMR is unlocked (i.e. because a copy_{to/from}_user is in progress)...

- and an exception is taken, the kernel would then be running with the AMR unlocked and freely able to access userspace again. I am working around this by storing a flag in the PACA to indicate if the AMR is unlocked (to save a costly SPR read), and if so, locking the AMR in the exception entry path and unlocking it on the way out.

- and gets context switched out, goes into a path that locks the AMR, then context switches back, access will be disabled and will fault. As a result, I context switch the AMR between tasks as if it was used by userspace like hash (which already implements this).

Another consideration is use of the isync instruction. Without an isync following the mtspr instruction, there is no guarantee that the change takes effect. The issue is that isync is very slow, and so I tried to avoid them wherever possible. In this series, the only place an isync gets used is after *unlocking* the AMR, because if an access takes place while access is still prevented, the kernel will fault.

On the flipside, a slight delay in locking caused by skipping an isync potentially allows a small window of vulnerability. It is my opinion that this window is practically impossible to exploit, but if someone thinks otherwise, please do share.

This series is my first attempt at POWER assembly, so all feedback is very welcome.
The official theme song of this series can be found here:
https://www.youtube.com/watch?v=QjTrnKAcYjE

Russell Currey (5):
  powerpc/64s: Guarded Userspace Access Prevention
  powerpc/futex: GUAP support for futex ops
  powerpc/lib: checksum GUAP support
  powerpc/64s: Disable GUAP with nosmap option
  powerpc/64s: Document that PPC supports nosmap

 .../admin-guide/kernel-parameters.txt    |  2 +-
 arch/powerpc/include/asm/exception-64e.h |  3 +
 arch/powerpc/include/asm/exception-64s.h | 19 ++-
 arch/powerpc/include/asm/futex.h         |  6 ++
 arch/powerpc/include/asm/mmu.h           |  7 +++
 arch/powerpc/include/asm/paca.h          |  3 +
 arch/powerpc/include/asm/reg.h           |  1 +
 arch/powerpc/include/asm/uaccess.h       | 57 ---
 arch/powerpc/kernel/asm-offsets.c        |  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c        |  4 ++
 arch/powerpc/kernel/entry_64.S           | 17 ++-
 arch/powerpc/lib/checksum_wrappers.c     |  6 +-
 arch/powerpc/mm/fault.c                  |  9 +++
 arch/powerpc/mm/init_64.c                | 15 +
 arch/powerpc/mm/pgtable-radix.c          |  2 +
 arch/powerpc/mm/pkeys.c                  |  7 ++-
 arch/powerpc/platforms/Kconfig.cputype   | 15 +
 17 files changed, 158 insertions(+), 16 deletions(-)

-- 
2.19.1
[PATCH 3/5] powerpc/lib: checksum GUAP support
Wrap the checksumming code in GUAP locks and unlocks.

Signed-off-by: Russell Currey
---
 arch/powerpc/lib/checksum_wrappers.c | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index a0cb63fb76a1..c67db0a6e18b 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -28,6 +28,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
 {
 	unsigned int csum;
 
+	unlock_user_access();
 	might_sleep();
 	*err_ptr = 0;
 
@@ -60,6 +61,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
 	}
 
 out:
+	lock_user_access();
 	return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_from_user);
@@ -69,6 +71,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
 {
 	unsigned int csum;
 
+	unlock_user_access();
 	might_sleep();
 	*err_ptr = 0;
 
@@ -97,6 +100,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len,
 	}
 
 out:
+	lock_user_access();
 	return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_to_user);
-- 
2.19.1
[PATCH 4/5] powerpc/64s: Disable GUAP with nosmap option
GUAP is similar to SMAP on x86 platforms, so implement support for the same kernel parameter.

Signed-off-by: Russell Currey
---
 arch/powerpc/mm/init_64.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 7a9886f98b0c..b26641df36f2 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -312,6 +312,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 #ifdef CONFIG_PPC_BOOK3S_64
 static bool disable_radix = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
+static bool disable_guap = !IS_ENABLED(CONFIG_PPC_RADIX_GUAP);
 
 static int __init parse_disable_radix(char *p)
 {
@@ -328,6 +329,18 @@ static int __init parse_disable_radix(char *p)
 }
 early_param("disable_radix", parse_disable_radix);
 
+static int __init parse_nosmap(char *p)
+{
+	/*
+	 * nosmap is an existing option on x86 where it doesn't return -EINVAL
+	 * if the parameter is set to something, so even though it's different
+	 * to disable_radix, don't return an error for compatibility.
+	 */
+	disable_guap = true;
+	return 0;
+}
+early_param("nosmap", parse_nosmap);
+
 /*
  * If we're running under a hypervisor, we need to check the contents of
  * /chosen/ibm,architecture-vec-5 to see if the hypervisor is willing to do
@@ -381,6 +394,8 @@ void __init mmu_early_init_devtree(void)
 	/* Disable radix mode based on kernel command line. */
 	if (disable_radix)
 		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+	if (disable_radix || disable_guap)
+		cur_cpu_spec->mmu_features &= ~MMU_FTR_RADIX_GUAP;
 
 	/*
 	 * Check /chosen/ibm,architecture-vec-5 if running as a guest.
-- 
2.19.1
[PATCH 5/5] powerpc/64s: Document that PPC supports nosmap
Signed-off-by: Russell Currey
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a5ad67d5cb16..8f78e75965f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2764,7 +2764,7 @@
 			noexec=on: enable non-executable mappings (default)
 			noexec=off: disable non-executable mappings
 
-	nosmap		[X86]
+	nosmap		[X86,PPC]
 			Disable SMAP (Supervisor Mode Access Prevention)
 			even if it is supported by processor.
-- 
2.19.1