[PATCH v6 0/6] arm64: dts: NXP: add basic dts file for LX2160A SoC

2018-10-25 Thread Vabhav Sharma
Changes for v6:
- Added comment for clock unit-sysclk node name in SoC device tree 

Changes for v5:
- Updated temperature sensor regulator name in board device tree
- Sorted nodes alphabetically and by unit-address in SoC/board device tree
- Indentation and newline updates in SoC/board device tree
- Updated nodes name as per DT spec generic name recommendation in SoC DT
- Updated macro define for interrupt/gpio property
- Updated i2c node property name scl-gpio
- Removed device_type property except cpu/memory node
- Added esdhc controller nodes in SoC/RDB board device tree
- Added aliases for uart/crypto nodes
- Add SoC die attribute definition for LX2160A

Changes for v4:
- Updated bindings for lx2160a clockgen and dcfg
- Modified commit message for lx2160a clockgen changes
- Updated interrupt property with macro definition
- Added required enable-method property to each core node with psci value
- Removed unused node syscon in device tree
- Removed blank lines in device tree fsl-lx2160a.dtsi
- Updated uart node compatible sbsa-uart first
- Added and defined vcc-supply property to temperature sensor node in
  device tree fsl-lx2160a-rdb.dts

Changes for v3:
- Split clockgen support patch into below two patches:
  a) Updated array size of cmux_to_group[] with NUM_CMUX+1 to include -1
     terminator and p4080 cmux_to_group[] array with -1 terminator
  b) Add clockgen support for lx2160a

Changes for v2:
- Modified cmux_to_group array to include -1 terminator
- Revert NUM_CMUX to original value 8 from 16
- Removed "LX2160A is 16 core, so modified value for NUM_CMUX"
  from patch "[PATCH 3/5] drivers: clk-qoriq: Add clockgen support for
  lx2160a" description
- Populated cache properties for L1 and L2 cache in lx2160a device-tree.
- Removed reboot node from lx2160a device-tree as PSCI is implemented.
- Removed incorrect comment for timer node interrupt property in
  lx2160a device-tree.
- Modified pmu node compatible property from "arm,armv8-pmuv3" to
  "arm,cortex-a72-pmu" in lx2160a device-tree
- Non-standard aliases removed in lx2160a rdb board device-tree
- Updated i2c child nodes to generic name in lx2160a rdb device-tree.

Changes for v1:
- Add compatible string for LX2160A clockgen support
- Add compatible string to initialize LX2160A guts driver
- Add compatible string for LX2160A support in dt-bindings
- Add dts file to enable support for LX2160A SoC and LX2160A RDB
  (Reference design board)

Vabhav Sharma (4):
  dt-bindings: arm64: add compatible for LX2160A
  soc/fsl/guts: Add definition for LX2160A
  arm64: dts: add QorIQ LX2160A SoC support
  arm64: dts: add LX2160ARDB board support

Yogesh Gaur (2):
  clk: qoriq: increase array size of cmux_to_group
  clk: qoriq: Add clockgen support for lx2160a

 Documentation/devicetree/bindings/arm/fsl.txt  |  14 +-
 .../devicetree/bindings/clock/qoriq-clock.txt  |   1 +
 arch/arm64/boot/dts/freescale/Makefile |   1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts  | 119 
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 766 +
 drivers/clk/clk-qoriq.c|  16 +-
 drivers/cpufreq/qoriq-cpufreq.c|   1 +
 drivers/soc/fsl/guts.c |   6 +
 8 files changed, 921 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

-- 
2.7.4



[PATCH v6 1/6] dt-bindings: arm64: add compatible for LX2160A

2018-10-25 Thread Vabhav Sharma
Add compatible for LX2160A SoC, QDS and RDB boards
Add lx2160a compatible for clockgen and dcfg

Signed-off-by: Vabhav Sharma 
Reviewed-by: Rob Herring 
---
 Documentation/devicetree/bindings/arm/fsl.txt   | 14 +-
 Documentation/devicetree/bindings/clock/qoriq-clock.txt |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/arm/fsl.txt 
b/Documentation/devicetree/bindings/arm/fsl.txt
index 8a1baa2..71adce2 100644
--- a/Documentation/devicetree/bindings/arm/fsl.txt
+++ b/Documentation/devicetree/bindings/arm/fsl.txt
@@ -130,7 +130,7 @@ core start address and release the secondary core from 
holdoff and startup.
   - compatible: Should contain a chip-specific compatible string,
Chip-specific strings are of the form "fsl,<chip>-dcfg",
The following <chip>s are known to be supported:
-   ls1012a, ls1021a, ls1043a, ls1046a, ls2080a.
+   ls1012a, ls1021a, ls1043a, ls1046a, ls2080a, lx2160a.
 
   - reg : should contain base address and length of DCFG memory-mapped 
registers
 
@@ -222,3 +222,15 @@ Required root node properties:
 LS2088A ARMv8 based RDB Board
 Required root node properties:
 - compatible = "fsl,ls2088a-rdb", "fsl,ls2088a";
+
+LX2160A SoC
+Required root node properties:
+- compatible = "fsl,lx2160a";
+
+LX2160A ARMv8 based QDS Board
+Required root node properties:
+- compatible = "fsl,lx2160a-qds", "fsl,lx2160a";
+
+LX2160A ARMv8 based RDB Board
+Required root node properties:
+- compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
diff --git a/Documentation/devicetree/bindings/clock/qoriq-clock.txt 
b/Documentation/devicetree/bindings/clock/qoriq-clock.txt
index 97f46ad..3fb9995 100644
--- a/Documentation/devicetree/bindings/clock/qoriq-clock.txt
+++ b/Documentation/devicetree/bindings/clock/qoriq-clock.txt
@@ -37,6 +37,7 @@ Required properties:
* "fsl,ls1046a-clockgen"
* "fsl,ls1088a-clockgen"
* "fsl,ls2080a-clockgen"
+   * "fsl,lx2160a-clockgen"
Chassis-version clock strings include:
* "fsl,qoriq-clockgen-1.0": for chassis 1.0 clocks
* "fsl,qoriq-clockgen-2.0": for chassis 2.0 clocks
-- 
2.7.4



[PATCH v6 2/6] soc/fsl/guts: Add definition for LX2160A

2018-10-25 Thread Vabhav Sharma
Add compatible string "fsl,lx2160a-dcfg" to initialize
the guts driver for LX2160A, and add the SoC die
attribute definition for LX2160A.

Signed-off-by: Vabhav Sharma 
Signed-off-by: Yinbo Zhu 
Acked-by: Li Yang 
---
 drivers/soc/fsl/guts.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
index 302e0c8..bcab1ee 100644
--- a/drivers/soc/fsl/guts.c
+++ b/drivers/soc/fsl/guts.c
@@ -100,6 +100,11 @@ static const struct fsl_soc_die_attr fsl_soc_die[] = {
  .svr  = 0x8700,
  .mask = 0xfff7,
},
+   /* Die: LX2160A, SoC: LX2160A/LX2120A/LX2080A */
+   { .die  = "LX2160A",
+ .svr  = 0x8736,
+ .mask = 0xff3f,
+   },
{ },
 };
 
@@ -222,6 +227,7 @@ static const struct of_device_id fsl_guts_of_match[] = {
{ .compatible = "fsl,ls1088a-dcfg", },
{ .compatible = "fsl,ls1012a-dcfg", },
{ .compatible = "fsl,ls1046a-dcfg", },
+   { .compatible = "fsl,lx2160a-dcfg", },
{}
 };
 MODULE_DEVICE_TABLE(of, fsl_guts_of_match);
-- 
2.7.4
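The die-attribute hunk in the patch above is matched against the SoC's SVR (System Version Register). As a rough sketch, assuming the (archive-truncated) `.svr`/`.mask` values as shown, matching amounts to masking the SVR and comparing it against each table entry until a hit or the empty terminator; `die_match()` and `example_dies` below are illustrative names, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch (not the kernel code) of how a die-attribute
 * table like fsl_soc_die[] is walked: mask the SVR and compare it
 * against each entry; an all-zero entry terminates the table. */
struct die_attr {
	const char *die;
	uint32_t svr;
	uint32_t mask;
};

static const struct die_attr example_dies[] = {
	/* Values mirror the (truncated) hunk above, for illustration only */
	{ "LX2160A", 0x8736, 0xff3f },
	{ 0, 0, 0 },
};

static const struct die_attr *die_match(const struct die_attr *tbl,
					uint32_t svr)
{
	for (; tbl->svr != 0 || tbl->mask != 0; tbl++)
		if ((svr & tbl->mask) == tbl->svr)
			return tbl;
	return 0;
}
```

Because the mask clears don't-care bits, several SoC variants of the same die (LX2160A/LX2120A/LX2080A) fold onto one entry.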



[PATCH v6 4/6] clk: qoriq: Add clockgen support for lx2160a

2018-10-25 Thread Vabhav Sharma
From: Yogesh Gaur 

Add clockgen support for lx2160a by adding an entry for
compatible 'fsl,lx2160a-clockgen'.

Signed-off-by: Tang Yuantian 
Signed-off-by: Yogesh Gaur 
Signed-off-by: Vabhav Sharma 
Acked-by: Stephen Boyd 
Acked-by: Viresh Kumar 
---
 drivers/clk/clk-qoriq.c | 12 
 drivers/cpufreq/qoriq-cpufreq.c |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index e152bfb..99675de 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = {
.flags = CG_VER3 | CG_LITTLE_ENDIAN,
},
{
+   .compat = "fsl,lx2160a-clockgen",
+   .cmux_groups = {
+   &clockgen2_cmux_cga12, &clockgen2_cmux_cgb
+   },
+   .cmux_to_group = {
+   0, 0, 0, 0, 1, 1, 1, 1, -1
+   },
+   .pll_mask = 0x37,
+   .flags = CG_VER3 | CG_LITTLE_ENDIAN,
+   },
+   {
.compat = "fsl,p2041-clockgen",
.guts_compat = "fsl,qoriq-device-config-1.0",
.init_periph = p2041_init_periph,
@@ -1424,6 +1435,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, 
"fsl,ls1043a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls2080a, "fsl,ls2080a-clockgen", clockgen_init);
+CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", clockgen_init);
 
 /* Legacy nodes */
 CLK_OF_DECLARE(qoriq_sysclk_1, "fsl,qoriq-sysclk-1.0", sysclk_init);
diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index 3d773f6..83921b7 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -295,6 +295,7 @@ static const struct of_device_id node_matches[] __initconst 
= {
{ .compatible = "fsl,ls1046a-clockgen", },
{ .compatible = "fsl,ls1088a-clockgen", },
{ .compatible = "fsl,ls2080a-clockgen", },
+   { .compatible = "fsl,lx2160a-clockgen", },
{ .compatible = "fsl,p4080-clockgen", },
{ .compatible = "fsl,qoriq-clockgen-1.0", },
{ .compatible = "fsl,qoriq-clockgen-2.0", },
-- 
2.7.4
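The `.pll_mask = 0x37` field in the chipinfo entry above follows the convention documented in the clk-qoriq struct comment: bit n is set if PLL n is valid, so 0x37 (0b110111) marks PLLs 0, 1, 2, 4 and 5. A minimal sketch of decoding it (`pll_valid`/`pll_count` are illustrative helpers, not driver functions):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: decode a pll_mask as used by struct clockgen_chipinfo.
 * Per the struct comment, bit n set means PLL n is valid. */
static int pll_valid(uint32_t pll_mask, unsigned int pll)
{
	return (pll_mask >> pll) & 1;
}

/* Count how many PLLs a given mask declares valid. */
static unsigned int pll_count(uint32_t pll_mask)
{
	unsigned int n = 0;

	while (pll_mask) {
		n += pll_mask & 1;
		pll_mask >>= 1;
	}
	return n;
}
```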



[PATCH v6 5/6] arm64: dts: add QorIQ LX2160A SoC support

2018-10-25 Thread Vabhav Sharma
LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture.

LX2160A features 16 64-bit ARM v8 Cortex-A72 processor cores
in 8 clusters, CCN508, GICv3, two 64-bit DDR4 memory controllers,
8 I2C controllers, 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA,
4 PL011 SBSA UARTs, etc.

Signed-off-by: Ramneek Mehresh 
Signed-off-by: Zhang Ying-22455 
Signed-off-by: Nipun Gupta 
Signed-off-by: Priyanka Jain 
Signed-off-by: Yogesh Gaur 
Signed-off-by: Sriram Dash 
Signed-off-by: Vabhav Sharma 
Signed-off-by: Horia Geanta 
Signed-off-by: Ran Wang 
Signed-off-by: Yinbo Zhu 
---
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 766 +
 1 file changed, 766 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
new file mode 100644
index 000..9fcfd48
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -0,0 +1,766 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree Include file for Layerscape-LX2160A family SoC.
+//
+// Copyright 2018 NXP
+
+#include 
+#include 
+
+/memreserve/ 0x8000 0x0001;
+
+/ {
+   compatible = "fsl,lx2160a";
+   interrupt-parent = <&gic>;
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   // 8 clusters having 2 Cortex-A72 cores each
+   cpu@0 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   enable-method = "psci";
+   reg = <0x0>;
+   clocks = <&clockgen 1 0>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <&cluster0_l2>;
+   };
+
+   cpu@1 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   enable-method = "psci";
+   reg = <0x1>;
+   clocks = <&clockgen 1 0>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <&cluster0_l2>;
+   };
+
+   cpu@100 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   enable-method = "psci";
+   reg = <0x100>;
+   clocks = <&clockgen 1 1>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <&cluster1_l2>;
+   };
+
+   cpu@101 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   enable-method = "psci";
+   reg = <0x101>;
+   clocks = <&clockgen 1 1>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <&cluster1_l2>;
+   };
+
+   cpu@200 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   enable-method = "psci";
+   reg = <0x200>;
+   clocks = <&clockgen 1 2>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <&cluster2_l2>;
+   };
+
+   cpu@201 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   enable-method = "psci";
+   reg = <0x201>;
+   clocks = <&clockgen 1 2>;
+   d-cache-size = <0x8000>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <&cluster2_l2>;
+   };
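The L1 cache properties in the cpu nodes above can be sanity-checked against each other: associativity (ways) = cache-size / (sets × line-size), so a 0x8000-byte (32 KiB) d-cache with 128 sets of 64-byte lines and a 0xC000-byte (48 KiB) i-cache with 192 sets are both 4-way, which matches the Cortex-A72 L1 geometry. A quick check sketch (`cache_ways` is an illustrative helper, not kernel code):

```c
#include <assert.h>

/* Sanity-check sketch for DT cache properties:
 * ways = cache-size / (sets * line-size). */
static unsigned int cache_ways(unsigned int size, unsigned int sets,
			       unsigned int line_size)
{
	return size / (sets * line_size);
}
```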

[PATCH v6 3/6] clk: qoriq: increase array size of cmux_to_group

2018-10-25 Thread Vabhav Sharma
From: Yogesh Gaur 

Increase the size of the cmux_to_group array to accommodate the
-1 terminator entry.

Added a -1 terminator entry to the p4080 cmux_to_group array.

Signed-off-by: Yogesh Gaur 
Signed-off-by: Vabhav Sharma 
Acked-by: Stephen Boyd 
---
 drivers/clk/clk-qoriq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index 3a1812f..e152bfb 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -79,7 +79,7 @@ struct clockgen_chipinfo {
const struct clockgen_muxinfo *cmux_groups[2];
const struct clockgen_muxinfo *hwaccel[NUM_HWACCEL];
void (*init_periph)(struct clockgen *cg);
-   int cmux_to_group[NUM_CMUX]; /* -1 terminates if fewer than NUM_CMUX */
+   int cmux_to_group[NUM_CMUX+1]; /* array should be -1 terminated */
u32 pll_mask;   /* 1 << n bit set if PLL n is valid */
u32 flags;  /* CG_xxx */
 };
@@ -601,7 +601,7 @@ static const struct clockgen_chipinfo chipinfo[] = {
&p4080_cmux_grp1, &p4080_cmux_grp2
},
.cmux_to_group = {
-   0, 0, 0, 0, 1, 1, 1, 1
+   0, 0, 0, 0, 1, 1, 1, 1, -1
},
.pll_mask = 0x1f,
},
-- 
2.7.4
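The reason for the NUM_CMUX+1 sizing above: consumers walk cmux_to_group[] until they hit -1, so a chip that uses all NUM_CMUX entries (like p4080, or the 8-cmux lx2160a added later in this series) still needs one extra slot for the terminator. A sketch of such a walk, under the assumption that consumers stop at the first -1 (`cmux_entries` and the maps are illustrative, not the driver's code):

```c
#include <assert.h>

#define NUM_CMUX 8	/* matches the driver's original value */

/* Example -1-terminated maps; full-length maps need NUM_CMUX+1 slots. */
static const int p4080_map[NUM_CMUX + 1] = { 0, 0, 0, 0, 1, 1, 1, 1, -1 };
static const int short_map[NUM_CMUX + 1] = { 0, 0, -1 };

/* Count entries up to (but not including) the -1 terminator. */
static int cmux_entries(const int *cmux_to_group)
{
	int n = 0;

	while (n < NUM_CMUX + 1 && cmux_to_group[n] != -1)
		n++;
	return n;
}
```

Without the extra slot, a full map like p4080's would have nowhere to put the terminator, and the walk would read past the array.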



[PATCH v6 6/6] arm64: dts: add LX2160ARDB board support

2018-10-25 Thread Vabhav Sharma
LX2160A reference design board (RDB) is a high-performance
computing, evaluation, and development platform built around the
LX2160A SoC.

Signed-off-by: Priyanka Jain 
Signed-off-by: Sriram Dash 
Signed-off-by: Vabhav Sharma 
Signed-off-by: Horia Geanta 
Signed-off-by: Ran Wang 
Signed-off-by: Zhang Ying-22455 
Signed-off-by: Yinbo Zhu 
Acked-by: Li Yang 
---
 arch/arm64/boot/dts/freescale/Makefile|   1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 119 ++
 2 files changed, 120 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts

diff --git a/arch/arm64/boot/dts/freescale/Makefile 
b/arch/arm64/boot/dts/freescale/Makefile
index 86e18ad..445b72b 100644
--- a/arch/arm64/boot/dts/freescale/Makefile
+++ b/arch/arm64/boot/dts/freescale/Makefile
@@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
+dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts 
b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
new file mode 100644
index 000..6481e5f
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree file for LX2160ARDB
+//
+// Copyright 2018 NXP
+
+/dts-v1/;
+
+#include "fsl-lx2160a.dtsi"
+
+/ {
+   model = "NXP Layerscape LX2160ARDB";
+   compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
+
+   aliases {
+   crypto = &crypto;
+   serial0 = &uart0;
+   };
+
+   chosen {
+   stdout-path = "serial0:115200n8";
+   };
+
+   sb_3v3: regulator-sb3v3 {
+   compatible = "regulator-fixed";
+   regulator-name = "MC34717-3.3VSB";
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   regulator-boot-on;
+   regulator-always-on;
+   };
+};
+
+&crypto {
+   status = "okay";
+};
+
+&esdhc0 {
+   sd-uhs-sdr104;
+   sd-uhs-sdr50;
+   sd-uhs-sdr25;
+   sd-uhs-sdr12;
+   status = "okay";
+};
+
+&esdhc1 {
+   mmc-hs200-1_8v;
+   mmc-hs400-1_8v;
+   bus-width = <8>;
+   status = "okay";
+};
+
+&i2c0 {
+   status = "okay";
+
+   i2c-mux@77 {
+   compatible = "nxp,pca9547";
+   reg = <0x77>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   i2c@2 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x2>;
+
+   power-monitor@40 {
+   compatible = "ti,ina220";
+   reg = <0x40>;
+   shunt-resistor = <1000>;
+   };
+   };
+
+   i2c@3 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x3>;
+
+   temperature-sensor@4c {
+   compatible = "nxp,sa56004";
+   reg = <0x4c>;
+   vcc-supply = <&sb_3v3>;
+   };
+
+   temperature-sensor@4d {
+   compatible = "nxp,sa56004";
+   reg = <0x4d>;
+   vcc-supply = <&sb_3v3>;
+   };
+   };
+   };
+};
+
+&i2c4 {
+   status = "okay";
+
+   rtc@51 {
+   compatible = "nxp,pcf2129";
+   reg = <0x51>;
+   // IRQ10_B
+   interrupts = <0 150 0x4>;
+   };
+};
+
+&uart0 {
+   status = "okay";
+};
+
+&uart1 {
+   status = "okay";
+};
+
+&usb0 {
+   status = "okay";
+};
+
+&usb1 {
+   status = "okay";
+};
-- 
2.7.4



Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Florian Weimer
* Michael Sammler:

> Thank you for the pointer about the POWER implementation. I am not
> familiar with POWER in general and its protection key feature at
> all. Would the AMR register be the correct register to expose here?

Yes, according to my notes, the register is called AMR (special purpose
register 13).

> I understand your concern about exposing the number of protection keys
> in the ABI. One idea would be to state, that the pkru field (which
> should probably be renamed) contains an architecture specific value,
> which could then be the PKRU on x86 and AMR (or another register) on
> POWER. This new field should probably be extended to __u64 and the
> reserved field removed.

POWER also has proper read/write bit separation, not PKEY_DISABLE_ACCESS
(disable read and write) and PKEY_DISABLE_WRITE like Intel.  It's
currently translated by the kernel, but I really need a
PKEY_DISABLE_READ bit in glibc to implement pkey_get in case the memory
is write-only.

> Another idea would be to not add a field in the seccomp_data
> structure, but instead provide a new BPF instruction, which reads the
> value of a specified protection key.

I would prefer that if it's possible.  We should make sure that the bits
are the same as those returned from pkey_get.  I have an implementation
on POWER, but have yet to figure out the implications for 32-bit because
I do not know the AMR register size there.

Thanks,
Florian
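For reference, the x86 PKRU layout under discussion packs two bits per protection key, AD (access-disable) then WD (write-disable), which is why its rights line up directly with PKEY_DISABLE_ACCESS/PKEY_DISABLE_WRITE, while POWER's AMR needs translation. A sketch of extracting one key's rights (the constants match the uapi values; `pkru_key_rights` itself is an illustrative helper, not a kernel or glibc function):

```c
#include <assert.h>
#include <stdint.h>

/* uapi values from <linux/pkeys.h> */
#define PKEY_DISABLE_ACCESS	0x1
#define PKEY_DISABLE_WRITE	0x2

/* Sketch: PKRU holds 2 bits per key, AD at bit 2*key and WD at
 * bit 2*key+1, so shifting and masking yields the rights directly. */
static unsigned int pkru_key_rights(uint32_t pkru, unsigned int key)
{
	return (pkru >> (2 * key)) &
	       (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
}
```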


Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-25 Thread Mike Rapoport
On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli  wrote:
> >
> > Hi all,
> >
> > While investigating why ARM64 required a ton of objects to be rebuilt
> > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was
> > because we define __early_init_dt_declare_initrd() differently and we do
> > that in arch/arm64/include/asm/memory.h which gets included by a fair
> > amount of other header files, and translation units as well.
> 
> I scratch my head sometimes as to why some config options rebuild so
> much stuff. One down, ? to go. :)
> 
> > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with build
> > systems that generate two kernels: one with the initramfs and one
> > without. buildroot is one of these build systems, OpenWrt is also
> > another one that does this.
> >
> > This patch series proposes adding an empty initrd.h to satisfy the need
> > for drivers/of/fdt.c to unconditionally include that file, and moves the
> > custom __early_init_dt_declare_initrd() definition away from
> > asm/memory.h
> >
> > This cuts the number of objects rebuilds from 1920 down to 26, so a
> > factor 73 approximately.
> >
> > Apologies for the long CC list, please let me know how you would go
> > about merging that and if another approach would be preferable, e.g:
> > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
> > something like that.
> 
> There may be a better way as of 4.20 because bootmem is now gone and
> only memblock is used. This should unify what each arch needs to do
> with initrd early. We need the physical address early for memblock
> reserving. Then later on we need the virtual address to access the
> initrd. Perhaps we should just change initrd_start and initrd_end to
> physical addresses (or add 2 new variables would be less invasive and
> allow for different translation than __va()). The sanity checks and
> memblock reserve could also perhaps be moved to a common location.
>
> Alternatively, given arm64 is the only oddball, I'd be fine with an
> "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> __early_init_dt_declare_initrd as long as we have a path to removing
> it like the above option.

I think arm64 does not have to redefine __early_init_dt_declare_initrd().
Something like this might be just all we need (completely untested,
probably it won't even compile):

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 9d9582c..e9ca238 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1;
 phys_addr_t arm64_dma_phys_limit __ro_after_init;
 
 #ifdef CONFIG_BLK_DEV_INITRD
+
+static phys_addr_t initrd_start_phys, initrd_end_phys;
+
 static int __init early_initrd(char *p)
 {
unsigned long start, size;
@@ -71,8 +74,8 @@ static int __init early_initrd(char *p)
if (*endp == ',') {
size = memparse(endp + 1, NULL);
 
-   initrd_start = start;
-   initrd_end = start + size;
+   initrd_start_phys = start;
+   initrd_end_phys = start + size;
}
return 0;
 }
@@ -407,14 +410,27 @@ void __init arm64_memblock_init(void)
memblock_add(__pa_symbol(_text), (u64)(_end - _text));
}
 
-   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) {
+   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) &&
+   (initrd_start || initrd_start_phys)) {
+   /*
+* FIXME: ensure proper precendence between
+* early_initrd and DT when both are present
+*/
+   if (initrd_start) {
+   initrd_start_phys = __virt_to_phys(initrd_start);
+   initrd_end_phys = __virt_to_phys(initrd_end);
+   } else if (initrd_start_phys) {
+   initrd_start = (unsigned long)__va(initrd_start_phys);
+   initrd_end = (unsigned long)__va(initrd_end_phys);
+   }
+
/*
 * Add back the memory we just removed if it results in the
 * initrd to become inaccessible via the linear mapping.
 * Otherwise, this is a no-op
 */
-   u64 base = initrd_start & PAGE_MASK;
-   u64 size = PAGE_ALIGN(initrd_end) - base;
+   u64 base = initrd_start_phys & PAGE_MASK;
+   u64 size = PAGE_ALIGN(initrd_end_phys) - base;
 
/*
 * We can only add back the initrd memory if we don't end up
@@ -458,7 +474,7 @@ void __init arm64_memblock_init(void)
 * pagetables with memblock.
 */
memblock_reserve(__pa_symbol(_text), _end - _text);
-#ifdef CONFIG_BLK_DEV_INITRD
+#if 0
if (initrd_start) {
memblock_reserve(initrd_start, initrd_end - initrd_start);
 
 
> Rob
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-25 Thread Russell King - ARM Linux
On Thu, Oct 25, 2018 at 10:38:34AM +0100, Mike Rapoport wrote:
> On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli  
> > wrote:
> > >
> > > Hi all,
> > >
> > > While investigating why ARM64 required a ton of objects to be rebuilt
> > > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was
> > > because we define __early_init_dt_declare_initrd() differently and we do
> > > that in arch/arm64/include/asm/memory.h which gets included by a fair
> > > amount of other header files, and translation units as well.
> > 
> > I scratch my head sometimes as to why some config options rebuild so
> > much stuff. One down, ? to go. :)
> > 
> > > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with build
> > > systems that generate two kernels: one with the initramfs and one
> > > without. buildroot is one of these build systems, OpenWrt is also
> > > another one that does this.
> > >
> > > This patch series proposes adding an empty initrd.h to satisfy the need
> > > for drivers/of/fdt.c to unconditionally include that file, and moves the
> > > custom __early_init_dt_declare_initrd() definition away from
> > > asm/memory.h
> > >
> > > This cuts the number of objects rebuilds from 1920 down to 26, so a
> > > factor 73 approximately.
> > >
> > > Apologies for the long CC list, please let me know how you would go
> > > about merging that and if another approach would be preferable, e.g:
> > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
> > > something like that.
> > 
> > There may be a better way as of 4.20 because bootmem is now gone and
> > only memblock is used. This should unify what each arch needs to do
> > with initrd early. We need the physical address early for memblock
> > reserving. Then later on we need the virtual address to access the
> > initrd. Perhaps we should just change initrd_start and initrd_end to
> > physical addresses (or add 2 new variables would be less invasive and
> > allow for different translation than __va()). The sanity checks and
> > memblock reserve could also perhaps be moved to a common location.
> >
> > Alternatively, given arm64 is the only oddball, I'd be fine with an
> > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> > __early_init_dt_declare_initrd as long as we have a path to removing
> > it like the above option.
> 
> I think arm64 does not have to redefine __early_init_dt_declare_initrd().
> Something like this might be just all we need (completely untested,
> probably it won't even compile):

The alternative solution would be to replace initrd_start/initrd_end
with physical address versions of these everywhere - that's what
we're passed from DT, it's what 32-bit ARM would prefer, and seemingly
what 64-bit ARM would also like as well.

Grepping for initrd_start in arch/*/mm shows that there's lots of
architectures that have virtual/physical conversions on these, and
a number that have obviously been derived from 32-bit ARM's approach
(with maintaining a phys_initrd_start variable to simplify things).

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
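The approach discussed above (keep the initrd as physical addresses from DT or the command line, translate to virtual only at access time) can be modelled in a few lines. This is a toy model, not kernel code: LINEAR_OFFSET stands in for the kernel's linear-map offset, and model_va/model_pa stand in for __va()/__pa():

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a linear mapping: virtual = physical + fixed offset.
 * The offset value is hypothetical, chosen only for illustration. */
#define LINEAR_OFFSET 0xffff000000000000ull

static uint64_t model_va(uint64_t pa) { return pa + LINEAR_OFFSET; }
static uint64_t model_pa(uint64_t va) { return va - LINEAR_OFFSET; }
```

Under such a mapping the two representations round-trip losslessly, so nothing is lost by storing phys_initrd_start early and deriving initrd_start late, as 32-bit ARM already does.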


Re: [PATCH 1/4] treewide: remove unused address argument from pte_alloc functions (v2)

2018-10-25 Thread Kirill A. Shutemov
On Wed, Oct 24, 2018 at 10:37:16AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 12, 2018 at 06:31:57PM -0700, Joel Fernandes (Google) wrote:
> > This series speeds up mremap(2) syscall by copying page tables at the
> > PMD level even for non-THP systems. There is concern that the extra
> > 'address' argument that mremap passes to pte_alloc may do something
> > subtle architecture related in the future that may make the scheme not
> > work.  Also we find that there is no point in passing the 'address' to
> > pte_alloc since its unused. So this patch therefore removes this
> > argument tree-wide resulting in a nice negative diff as well. Also
> > ensuring along the way that the enabled architectures do not do anything
> > funky with 'address' argument that goes unnoticed by the optimization.
> 
> Did you happen to look at the history of where that address argument
> came from? -- just being curious here. ISTR something vague about
> architectures having different paging structure for different memory
> ranges.

I see some architectures (i.e. sparc and, I believe, power) used the address
for coloring. It's not needed anymore. Page allocator and SL?B are good
enough now.

See 3c936465249f ("[SPARC64]: Kill pgtable quicklists and use SLAB.")

-- 
 Kirill A. Shutemov


[PATCH 3/6] PCI: layerscape: Add the EP mode support

2018-10-25 Thread Xiaowei Bao
Add the EP mode support.

Signed-off-by: Xiaowei Bao 
---
 .../devicetree/bindings/pci/layerscape-pci.txt |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt 
b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
index 66df1e8..d3d7be1 100644
--- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
+++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
@@ -13,12 +13,15 @@ information.
 
 Required properties:
 - compatible: should contain the platform identifier such as:
+  RC mode:
 "fsl,ls1021a-pcie", "snps,dw-pcie"
 "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie"
 "fsl,ls2088a-pcie"
 "fsl,ls1088a-pcie"
 "fsl,ls1046a-pcie"
 "fsl,ls1012a-pcie"
+  EP mode:
+"fsl,ls-pcie-ep"
 - reg: base addresses and lengths of the PCIe controller register blocks.
 - interrupts: A list of interrupt outputs of the controller. Must contain an
   entry for each entry in the interrupt-names property.
-- 
1.7.1



[PATCH 2/6] ARM: dts: ls1021a: Add the status property disable PCIe

2018-10-25 Thread Xiaowei Bao
Add the status property to disable the PCIe controller nodes; the
property will be set to enabled by the bootloader.

Signed-off-by: Xiaowei Bao 
---
 arch/arm/boot/dts/ls1021a.dtsi |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index bdd6e66..b769e0e 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -736,6 +736,7 @@
< 0 0 2 &gic GIC_SPI 188 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic GIC_SPI 190 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic GIC_SPI 192 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
};
 
pcie@350 {
@@ -759,6 +760,7 @@
< 0 0 2 &gic GIC_SPI 189 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic GIC_SPI 191 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic GIC_SPI 193 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
};
 
can0: can@2a7 {
-- 
1.7.1



[PATCH 6/6] misc: pci_endpoint_test: Add the layerscape EP device support

2018-10-25 Thread Xiaowei Bao
Add the layerscape EP device support in pci_endpoint_test driver.

Signed-off-by: Xiaowei Bao 
---
 drivers/misc/pci_endpoint_test.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 896e2df..744d10c 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -788,6 +788,8 @@ static void pci_endpoint_test_remove(struct pci_dev *pdev)
 static const struct pci_device_id pci_endpoint_test_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA74x) },
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA72x) },
+   /* 0x81c0: PCI device ID of the NXP LS1046A */
+   { PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, 0x81c0) },
{ PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS, 0xedda) },
{ }
 };
-- 
1.7.1



[PATCH 4/6] arm64: dts: Add the PCIE EP node in dts

2018-10-25 Thread Xiaowei Bao
Add the PCIe EP nodes in the dts for ls1046a.

Signed-off-by: Xiaowei Bao 
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 64d334c..08b4f08 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -655,6 +655,17 @@
status = "disabled";
};
 
+   pcie_ep@340 {
+   compatible = "fsl,ls-pcie-ep";
+   reg = <0x00 0x0340 0x0 0x0010
+   0x40 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <6>;
+   num-ob-windows = <6>;
+   num-lanes = <2>;
+   status = "disabled";
+   };
+
pcie@350 {
compatible = "fsl,ls1046a-pcie", "snps,dw-pcie";
reg = <0x00 0x0350 0x0 0x0010   /* controller 
registers */
@@ -681,6 +692,17 @@
status = "disabled";
};
 
+   pcie_ep@350 {
+   compatible = "fsl,ls-pcie-ep";
+   reg = <0x00 0x0350 0x0 0x0010
+   0x48 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <6>;
+   num-ob-windows = <6>;
+   num-lanes = <2>;
+   status = "disabled";
+   };
+
pcie@360 {
compatible = "fsl,ls1046a-pcie", "snps,dw-pcie";
reg = <0x00 0x0360 0x0 0x0010   /* controller 
registers */
@@ -707,6 +729,16 @@
status = "disabled";
};
 
+   pcie_ep@360 {
+   compatible = "fsl,ls-pcie-ep";
+   reg = <0x00 0x0360 0x0 0x0010
+   0x50 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <6>;
+   num-ob-windows = <6>;
+   num-lanes = <2>;
+   status = "disabled";
+   };
};
 
reserved-memory {
-- 
1.7.1



[PATCH 5/6] pci: layerscape: Add the EP mode support.

2018-10-25 Thread Xiaowei Bao
Add the PCIe EP mode support for the layerscape platform.

Signed-off-by: Xiaowei Bao 
---
 drivers/pci/controller/dwc/Makefile|2 +-
 drivers/pci/controller/dwc/pci-layerscape-ep.c |  161 
 2 files changed, 162 insertions(+), 1 deletions(-)
 create mode 100644 drivers/pci/controller/dwc/pci-layerscape-ep.c

diff --git a/drivers/pci/controller/dwc/Makefile 
b/drivers/pci/controller/dwc/Makefile
index 5d2ce72..b26d617 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
 obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
 obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
 obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone-dw.o pci-keystone.o
-obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
+obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
 obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
 obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
 obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
new file mode 100644
index 000..3b33bbc
--- /dev/null
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCIe controller EP driver for Freescale Layerscape SoCs
+ *
+ * Copyright (C) 2018 NXP Semiconductor.
+ *
+ * Author: Xiaowei Bao 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pcie-designware.h"
+
+#define PCIE_DBI2_OFFSET   0x1000  /* DBI2 base address */
+
+struct ls_pcie_ep {
+   struct dw_pcie  *pci;
+};
+
+#define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
+
+static bool ls_pcie_is_bridge(struct ls_pcie_ep *pcie)
+{
+   struct dw_pcie *pci = pcie->pci;
+   u32 header_type;
+
+   header_type = ioread8(pci->dbi_base + PCI_HEADER_TYPE);
+   header_type &= 0x7f;
+
+   return header_type == PCI_HEADER_TYPE_BRIDGE;
+}
+
+static int ls_pcie_establish_link(struct dw_pcie *pci)
+{
+   return 0;
+}
+
+static const struct dw_pcie_ops ls_pcie_ep_ops = {
+   .start_link = ls_pcie_establish_link,
+};
+
+static const struct of_device_id ls_pcie_ep_of_match[] = {
+   { .compatible = "fsl,ls-pcie-ep",},
+   { },
+};
+
+static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   struct pci_epc *epc = ep->epc;
+   enum pci_barno bar;
+
+   for (bar = BAR_0; bar <= BAR_5; bar++)
+   dw_pcie_ep_reset_bar(pci, bar);
+
+   epc->features |= EPC_FEATURE_NO_LINKUP_NOTIFIER;
+}
+
+static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
+ enum pci_epc_irq_type type, u16 interrupt_num)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+
+   switch (type) {
+   case PCI_EPC_IRQ_LEGACY:
+   return dw_pcie_ep_raise_legacy_irq(ep, func_no);
+   case PCI_EPC_IRQ_MSI:
+   return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
+   case PCI_EPC_IRQ_MSIX:
+   return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
+   default:
+   dev_err(pci->dev, "unknown IRQ type\n");
+   }
+
+   return 0;
+}
+
+static struct dw_pcie_ep_ops pcie_ep_ops = {
+   .ep_init = ls_pcie_ep_init,
+   .raise_irq = ls_pcie_ep_raise_irq,
+};
+
+static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
+   struct platform_device *pdev)
+{
+   struct dw_pcie *pci = pcie->pci;
+   struct device *dev = pci->dev;
+   struct dw_pcie_ep *ep;
+   struct resource *res;
+   int ret;
+
+   ep = &pci->ep;
+   ep->ops = &pcie_ep_ops;
+
+   res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
+   if (!res)
+   return -EINVAL;
+
+   ep->phys_base = res->start;
+   ep->addr_size = resource_size(res);
+
+   ret = dw_pcie_ep_init(ep);
+   if (ret) {
+   dev_err(dev, "failed to initialize endpoint\n");
+   return ret;
+   }
+
+   return 0;
+}
+
+static int __init ls_pcie_ep_probe(struct platform_device *pdev)
+{
+   struct device *dev = &pdev->dev;
+   struct dw_pcie *pci;
+   struct ls_pcie_ep *pcie;
+   struct resource *dbi_base;
+   int ret;
+
+   pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
+   if (!pcie)
+   return -ENOMEM;
+
+   pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
+   if (!pci)
+   return -ENOMEM;
+
+   dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
+   pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
+   if (IS_ERR(pci->dbi_base))
+   return PTR_ERR(pci->dbi_base);
+
+   pci->dbi_base2 = pci->dbi_base + PCIE_DBI2_OFFSET;
+   pci->dev = dev;
+   pci->ops = &ls_pcie_ep_ops;

[PATCH 1/6] arm64: dts: Add the status property to disable PCIe

2018-10-25 Thread Xiaowei Bao
From: Bao Xiaowei 

Add the status property to disable the PCIe nodes by default; the
bootloader will enable them.

Signed-off-by: Bao Xiaowei 
---
 arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |1 +
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi |3 +++
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |3 +++
 arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi |3 +++
 arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi |4 
 5 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi
index 5da732f..21f2b3b 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi
@@ -496,6 +496,7 @@
< 0 0 2 &gic 0 111 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic 0 112 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic 0 113 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
};
};
 
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 3fed504..760d510 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -683,6 +683,7 @@
< 0 0 2 &gic 0 111 0x4>,
< 0 0 3 &gic 0 112 0x4>,
< 0 0 4 &gic 0 113 0x4>;
+   status = "disabled";
};
 
pcie@350 {
@@ -708,6 +709,7 @@
< 0 0 2 &gic 0 121 0x4>,
< 0 0 3 &gic 0 122 0x4>,
< 0 0 4 &gic 0 123 0x4>;
+   status = "disabled";
};
 
pcie@360 {
@@ -733,6 +735,7 @@
< 0 0 2 &gic 0 155 0x4>,
< 0 0 3 &gic 0 156 0x4>,
< 0 0 4 &gic 0 157 0x4>;
+   status = "disabled";
};
};
 
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 51cbd50..64d334c 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -652,6 +652,7 @@
< 0 0 2 &gic GIC_SPI 110 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic GIC_SPI 110 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic GIC_SPI 110 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
};
 
pcie@350 {
@@ -677,6 +678,7 @@
< 0 0 2 &gic GIC_SPI 120 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic GIC_SPI 120 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic GIC_SPI 120 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
};
 
pcie@360 {
@@ -702,6 +704,7 @@
< 0 0 2 &gic GIC_SPI 154 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic GIC_SPI 154 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic GIC_SPI 154 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
};
 
};
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
index a07f612..9deb9cb 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
@@ -533,6 +533,7 @@
< 0 0 2 &gic 0 0 0 110 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic 0 0 0 111 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic 0 0 0 112 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
};
 
pcie@350 {
@@ -557,6 +558,7 @@
< 0 0 2 &gic 0 0 0 115 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic 0 0 0 116 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic 0 0 0 117 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
};
 
pcie@360 {
@@ -581,6 +583,7 @@
< 0 0 2 &gic 0 0 0 120 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 3 &gic 0 0 0 121 
IRQ_TYPE_LEVEL_HIGH>,
< 0 0 4 &gic 0 0 0 122 
IRQ_TYPE_LEVEL_HIGH>;
+

Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-25 Thread Rob Herring
+Ard

On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport  wrote:
>
> On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli  
> > wrote:
> > >
> > > Hi all,
> > >
> > > While investigating why ARM64 required a ton of objects to be rebuilt
> > > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was
> > > because we define __early_init_dt_declare_initrd() differently and we do
> > > that in arch/arm64/include/asm/memory.h which gets included by a fair
> > > amount of other header files, and translation units as well.
> >
> > I scratch my head sometimes as to why some config options rebuild so
> > much stuff. One down, ? to go. :)
> >
> > > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with build
> > > systems that generate two kernels: one with the initramfs and one
> > > without. buildroot is one of these build systems, OpenWrt is also
> > > another one that does this.
> > >
> > > This patch series proposes adding an empty initrd.h to satisfy the need
> > > for drivers/of/fdt.c to unconditionally include that file, and moves the
> > > custom __early_init_dt_declare_initrd() definition away from
> > > asm/memory.h
> > >
> > > This cuts the number of objects rebuilds from 1920 down to 26, so a
> > > factor 73 approximately.
> > >
> > > Apologies for the long CC list, please let me know how you would go
> > > about merging that and if another approach would be preferable, e.g:
> > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
> > > something like that.
> >
> > There may be a better way as of 4.20 because bootmem is now gone and
> > only memblock is used. This should unify what each arch needs to do
> > with initrd early. We need the physical address early for memblock
> > reserving. Then later on we need the virtual address to access the
> > initrd. Perhaps we should just change initrd_start and initrd_end to
> > physical addresses (or add 2 new variables would be less invasive and
> > allow for different translation than __va()). The sanity checks and
> > memblock reserve could also perhaps be moved to a common location.
> >
> > Alternatively, given arm64 is the only oddball, I'd be fine with an
> > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> > __early_init_dt_declare_initrd as long as we have a path to removing
> > it like the above option.
>
> I think arm64 does not have to redefine __early_init_dt_declare_initrd().
> Something like this might be just all we need (completely untested,
> probably it won't even compile):
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9d9582c..e9ca238 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1;
>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>
>  #ifdef CONFIG_BLK_DEV_INITRD
> +
> +static phys_addr_t initrd_start_phys, initrd_end_phys;
> +
>  static int __init early_initrd(char *p)
>  {
> unsigned long start, size;
> @@ -71,8 +74,8 @@ static int __init early_initrd(char *p)
> if (*endp == ',') {
> size = memparse(endp + 1, NULL);
>
> -   initrd_start = start;
> -   initrd_end = start + size;
> +   initrd_start_phys = start;
> +   initrd_end_phys = end;
> }
> return 0;
>  }
> @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void)
> memblock_add(__pa_symbol(_text), (u64)(_end - _text));
> }
>
> -   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) {
> +   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) &&
> +   (initrd_start || initrd_start_phys)) {
> +   /*
> +* FIXME: ensure proper precendence between
> +* early_initrd and DT when both are present

Command line takes precedence, so just reverse the order.

> +*/
> +   if (initrd_start) {
> +   initrd_start_phys = __phys_to_virt(initrd_start);
> +   initrd_end_phys = __phys_to_virt(initrd_end);

AIUI, the original issue was doing the P2V translation was happening
too early and the VA could be wrong if the linear range is adjusted.
So I don't think this would work.

I suppose you could convert the VA back to a PA before any adjustments
and then back to a VA again after. But that's kind of hacky. 2 wrongs
making a right.

> +   } else if (initrd_start_phys) {
> +   initrd_start = __va(initrd_start_phys);
> +   initrd_end = __va(initrd_start_phys);
> +   }
> +
> /*
>  * Add back the memory we just removed if it results in the
>  * initrd to become inaccessible via the linear mapping.
>  * Otherwise, this is a no-op
>  */
> -   u64 base = initrd_start & PAGE_MASK;
> -   u64 size = PAGE_ALIGN(initrd_e

Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)

2018-10-25 Thread Joel Fernandes
On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote:
> > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote:
> > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > index 9e68a02a52b1..2fd163cff406 100644
> > > > --- a/mm/mremap.c
> > > > +++ b/mm/mremap.c
> > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, 
> > > > pmd_t *old_pmd,
> > > > drop_rmap_locks(vma);
> > > >  }
> > > >  
> > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long 
> > > > old_addr,
> > > > + unsigned long new_addr, unsigned long old_end,
> > > > + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
> > > > +{
> > > > +   spinlock_t *old_ptl, *new_ptl;
> > > > +   struct mm_struct *mm = vma->vm_mm;
> > > > +
> > > > +   if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
> > > > +   || old_end - old_addr < PMD_SIZE)
> > > > +   return false;
> > > > +
> > > > +   /*
> > > > +* The destination pmd shouldn't be established, free_pgtables()
> > > > +* should have released it.
> > > > +*/
> > > > +   if (WARN_ON(!pmd_none(*new_pmd)))
> > > > +   return false;
> > > > +
> > > > +   /*
> > > > +* We don't have to worry about the ordering of src and dst
> > > > +* ptlocks because exclusive mmap_sem prevents deadlock.
> > > > +*/
> > > > +   old_ptl = pmd_lock(vma->vm_mm, old_pmd);
> > > > +   if (old_ptl) {
> > > 
> > > How can it ever be false?

Kirill,
It cannot, you are right. I'll remove the test.

By the way, there are new changes upstream by Linus which flush the TLB
before releasing the ptlock instead of after. I'm guessing that patch came
about because of reviews of this patch and someone spotted an issue in the
existing code :)

Anyway the patch in concern is:
eb66ae030829 ("mremap: properly flush TLB before releasing the page")

I need to rebase on top of that with appropriate modifications, but I worry
that this patch will slow down performance since we have to flush at every
PMD/PTE move before releasing the ptlock. Whereas with my patch, the
intention is to flush only once at the end of move_page_tables. When I
tried to flush TLB on every PMD move, it was quite slow on my arm64 device [2].

Further observation [1] is, it seems like the move_huge_pmds and move_ptes code
is a bit suboptimal in the sense that we acquire and release the same
ptlock for a bunch of PMDs if the said PMDs are on the same page-table page,
right? Instead we can do better by acquiring and releasing the ptlock less
often.

I think this observation [1] and the frequent TLB flush issue [2] can be solved
by acquiring the ptlock once for a bunch of PMDs, move them all, then flush
the tlb and then release the ptlock, and then proceed to doing the same thing
for the PMDs in the next page-table page. What do you think?

- Joel



Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)

2018-10-25 Thread Joel Fernandes
On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote:
[...]
> > > + pmd_t pmd;
> > > +
> > > + new_ptl = pmd_lockptr(mm, new_pmd);
> 
> 
> Looks like this is largely inspired by move_huge_pmd(), I guess a lot of
> the code applies, why not just reuse as much as possible? The same comments
> w.r.t mmap_sem helping protect against lock order issues applies as well.

I thought about this and when I looked into it, it seemed there are subtle
differences that make such sharing not worth it (or not possible).

 - Joel



Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Michael Sammler

On 10/24/2018 08:06 PM, Florian Weimer wrote:


* Michael Sammler:


Add the current value of the PKRU register to data available for
seccomp-bpf programs to work on. This allows filters based on the
currently enabled protection keys.
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 9efc0e73..e8b9ecfc 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -52,12 +52,16 @@
   * @instruction_pointer: at the time of the system call.
   * @args: up to 6 system call arguments always stored as 64-bit values
   *regardless of the architecture.
+ * @pkru: value of the pkru register
+ * @reserved: pad the structure to a multiple of eight bytes
   */
  struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
+   __u32 pkru;
+   __u32 reserved;
  };

This doesn't cover the POWER implementation.  Adding Cc:s.

And I think the kernel shouldn't expose the number of protection keys in
the ABI.

Thanks,
Florian
Thank you for the pointer about the POWER implementation. I am not 
familiar with POWER in general and its protection key feature at all. 
Would the AMR register be the correct register to expose here?


I understand your concern about exposing the number of protection keys 
in the ABI. One idea would be to state, that the pkru field (which 
should probably be renamed) contains an architecture specific value, 
which could then be the PKRU on x86 and AMR (or another register) on 
POWER. This new field should probably be extended to __u64 and the 
reserved field removed.


Another idea would be to not add a field in the seccomp_data structure, 
but instead provide a new BPF instruction, which reads the value of a 
specified protection key.


- Michael


Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)

2018-10-25 Thread Kirill A. Shutemov
On Wed, Oct 24, 2018 at 07:09:07PM -0700, Joel Fernandes wrote:
> On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote:
> > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote:
> > > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote:
> > > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > > index 9e68a02a52b1..2fd163cff406 100644
> > > > > --- a/mm/mremap.c
> > > > > +++ b/mm/mremap.c
> > > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct 
> > > > > *vma, pmd_t *old_pmd,
> > > > >   drop_rmap_locks(vma);
> > > > >  }
> > > > >  
> > > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned 
> > > > > long old_addr,
> > > > > +   unsigned long new_addr, unsigned long old_end,
> > > > > +   pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
> > > > > +{
> > > > > + spinlock_t *old_ptl, *new_ptl;
> > > > > + struct mm_struct *mm = vma->vm_mm;
> > > > > +
> > > > > + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
> > > > > + || old_end - old_addr < PMD_SIZE)
> > > > > + return false;
> > > > > +
> > > > > + /*
> > > > > +  * The destination pmd shouldn't be established, free_pgtables()
> > > > > +  * should have release it.
> > > > > +  */
> > > > > + if (WARN_ON(!pmd_none(*new_pmd)))
> > > > > + return false;
> > > > > +
> > > > > + /*
> > > > > +  * We don't have to worry about the ordering of src and dst
> > > > > +  * ptlocks because exclusive mmap_sem prevents deadlock.
> > > > > +  */
> > > > > + old_ptl = pmd_lock(vma->vm_mm, old_pmd);
> > > > > + if (old_ptl) {
> > > > 
> > > > How can it ever be false?
> 
> Kirill,
> It cannot, you are right. I'll remove the test.
> 
> By the way, there are new changes upstream by Linus which flush the TLB
> before releasing the ptlock instead of after. I'm guessing that patch came
> about because of reviews of this patch and someone spotted an issue in the
> existing code :)
> 
> Anyway the patch in concern is:
> eb66ae030829 ("mremap: properly flush TLB before releasing the page")
> 
> I need to rebase on top of that with appropriate modifications, but I worry
> that this patch will slow down performance since we have to flush at every
> PMD/PTE move before releasing the ptlock. Where as with my patch, the
> intention is to flush only at once in the end of move_page_tables. When I
> tried to flush TLB on every PMD move, it was quite slow on my arm64 device 
> [2].
> 
> Further observation [1] is, it seems like the move_huge_pmds and move_ptes 
> code
> is a bit sub optimal in the sense, we are acquiring and releasing the same
> ptlock for a bunch of PMDs if the said PMDs are on the same page-table page
> right? Instead we can do better by acquiring and release the ptlock less
> often.
> 
> I think this observation [1] and the frequent TLB flush issue [2] can be 
> solved
> by acquiring the ptlock once for a bunch of PMDs, move them all, then flush
> the tlb and then release the ptlock, and then proceed to doing the same thing
> for the PMDs in the next page-table page. What do you think?

Yeah, that's viable optimization.

The tricky part is that one PMD page table can have PMD entries of
different types: a THP, a page table that you can move as a whole and one
that you cannot (for any reason).

If we cannot move the PMD entry as a whole and must go to PTE page table
we would need to drop PMD ptl and take PTE ptl (it might be the same lock
in some configurations).

Also we don't want to take PMD lock unless it's required.

I expect it to be not very trivial to get everything right. But take a
shot :)

-- 
 Kirill A. Shutemov


[PATCH] powerpc/process: Fix flush_all_to_thread for SPE

2018-10-25 Thread DATACOM - Felipe.Rechia
From: "Felipe Rechia" 
Date: Wed, 24 Oct 2018 10:57:22 -0300
Subject: [PATCH] powerpc/process: Fix flush_all_to_thread for SPE

Fix a bug introduced by the creation of flush_all_to_thread() for
processors that have SPE (Signal Processing Engine) and use it to
compute floating-point operations.

From a userspace perspective, the problem was seen in attempts of
computing floating-point operations which should generate exceptions.
For example:

  fork();
  float x = 0.0 / 0.0;
  isnan(x);   // forked process returns False (should be True)

The operation above also should always cause the SPEFSCR FINV bit to
be set. However, the SPE floating-point exceptions were turned off
after a fork().

Kernel versions prior to the bug used flush_spe_to_thread(), which
first saves SPEFSCR register values in tsk->thread and then calls
giveup_spe(tsk).

After commit 579e633e764e, the save_all() function was called first
to giveup_spe(), and then the SPEFSCR register values were saved in
tsk->thread. This would save the SPEFSCR register values after
disabling SPE for that thread, causing the bug described above.

Fixes: 579e633e764e ("powerpc: create flush_all_to_thread()")
Signed-off-by: felipe.rechia 
---
 arch/powerpc/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a0c74bb..16eb428 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -566,12 +566,11 @@ void flush_all_to_thread(struct task_struct *tsk)
if (tsk->thread.regs) {
preempt_disable();
BUG_ON(tsk != current);
-   save_all(tsk);
-
 #ifdef CONFIG_SPE
if (tsk->thread.regs->msr & MSR_SPE)
tsk->thread.spefscr = mfspr(SPRN_SPEFSCR);
 #endif
+   save_all(tsk);
 
preempt_enable();
}
-- 
2.7.4


Re: [PATCH v5 00/18] of: overlay: validation checks, subsequent fixes

2018-10-25 Thread Alan Tull
On Wed, Oct 24, 2018 at 2:57 PM Rob Herring  wrote:
>
> On Mon, Oct 22, 2018 at 4:25 PM Alan Tull  wrote:
> >
> > On Thu, Oct 18, 2018 at 5:48 PM  wrote:
> > >
> > > From: Frank Rowand 
> > >
> > > Add checks to (1) overlay apply process and (2) memory freeing
> > > triggered by overlay release.  The checks are intended to detect
> > > possible memory leaks and invalid overlays.
> >
> > I've tested v5, nothing new to report.
>
> Does that mean everything broken or everything works great? In the
> latter case, care to give a Tested-by.
>
> Rob

Tested-by: Alan Tull 

Alan


Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Michael Sammler

On 10/25/2018 11:12 AM, Florian Weimer wrote:

I understand your concern about exposing the number of protection keys
in the ABI. One idea would be to state, that the pkru field (which
should probably be renamed) contains an architecture specific value,
which could then be the PKRU on x86 and AMR (or another register) on
POWER. This new field should probably be extended to __u64 and the
reserved field removed.

POWER also has proper read/write bit separation, not PKEY_DISABLE_ACCESS
(disable read and write) and PKEY_DISABLE_WRITE like Intel.  It's
currently translated by the kernel, but I really need a
PKEY_DISABLE_READ bit in glibc to implement pkey_get in case the memory
is write-only.
The idea here would be to simply provide the raw value of the register 
(PKRU on x86, AMR on POWER) to the BPF program and let the BPF program 
(or maybe a higher level library like libseccomp) deal with the 
complications of interpreting this architecture specific value (similar 
how the BPF program currently already has to deal with architecture 
specific system call numbers). If an architecture were to support more 
protection keys than fit into the field, the architecture specific value 
stored in the field might simply be the first protection keys. If there 
was interest, it would be possible to add more architecture specific 
fields to seccomp_data.

Another idea would be to not add a field in the seccomp_data
structure, but instead provide a new BPF instruction, which reads the
value of a specified protection key.

I would prefer that if it's possible.  We should make sure that the bits
are the same as those returned from pkey_get.  I have an implementation
on POWER, but have yet to figure out the implications for 32-bit because
I do not know the AMR register size there.

Thanks,
Florian
I have had a look at how BPF is implemented and it does not seem to be 
easy to just add an BPF instruction for seccomp since (as far as I 
understand) the code of the classical BPF (as used by seccomp) is shared 
with the code of eBPF, which is used in many parts of the kernel and 
there is at least one interpreter and one JIT compiler for BPF. But 
maybe someone with more experience than me can comment on how hard it 
would be to add an instruction to BPF.


- Michael


Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-25 Thread Mike Rapoport
On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote:
> +Ard
> 
> On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport  wrote:
> >
> > On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> > > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli  
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > While investigating why ARM64 required a ton of objects to be rebuilt
> > > > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was
> > > > because we define __early_init_dt_declare_initrd() differently and we do
> > > > that in arch/arm64/include/asm/memory.h which gets included by a fair
> > > > amount of other header files, and translation units as well.
> > >
> > > I scratch my head sometimes as to why some config options rebuild so
> > > much stuff. One down, ? to go. :)
> > >
> > > > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with build
> > > > systems that generate two kernels: one with the initramfs and one
> > > > without. buildroot is one of these build systems, OpenWrt is also
> > > > another one that does this.
> > > >
> > > > This patch series proposes adding an empty initrd.h to satisfy the need
> > > > for drivers/of/fdt.c to unconditionally include that file, and moves the
> > > > custom __early_init_dt_declare_initrd() definition away from
> > > > asm/memory.h
> > > >
> > > > This cuts the number of objects rebuilds from 1920 down to 26, so a
> > > > factor 73 approximately.
> > > >
> > > > Apologies for the long CC list, please let me know how you would go
> > > > about merging that and if another approach would be preferable, e.g:
> > > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
> > > > something like that.
> > >
> > > There may be a better way as of 4.20 because bootmem is now gone and
> > > only memblock is used. This should unify what each arch needs to do
> > > with initrd early. We need the physical address early for memblock
> > > reserving. Then later on we need the virtual address to access the
> > > initrd. Perhaps we should just change initrd_start and initrd_end to
> > > physical addresses (or add 2 new variables would be less invasive and
> > > allow for different translation than __va()). The sanity checks and
> > > memblock reserve could also perhaps be moved to a common location.
> > >
> > > Alternatively, given arm64 is the only oddball, I'd be fine with an
> > > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> > > __early_init_dt_declare_initrd as long as we have a path to removing
> > > it like the above option.
> >
> > I think arm64 does not have to redefine __early_init_dt_declare_initrd().
> > Something like this might be just all we need (completely untested,
> > probably it won't even compile):
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 9d9582c..e9ca238 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1;
> >  phys_addr_t arm64_dma_phys_limit __ro_after_init;
> >
> >  #ifdef CONFIG_BLK_DEV_INITRD
> > +
> > +static phys_addr_t initrd_start_phys, initrd_end_phys;
> > +
> >  static int __init early_initrd(char *p)
> >  {
> > unsigned long start, size;
> > @@ -71,8 +74,8 @@ static int __init early_initrd(char *p)
> > if (*endp == ',') {
> > size = memparse(endp + 1, NULL);
> >
> > -   initrd_start = start;
> > -   initrd_end = start + size;
> > +   initrd_start_phys = start;
> > +   initrd_end_phys = end;
> > }
> > return 0;
> >  }
> > @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void)
> > memblock_add(__pa_symbol(_text), (u64)(_end - _text));
> > }
> >
> > -   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) {
> > +   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) &&
> > +   (initrd_start || initrd_start_phys)) {
> > +   /*
> > +* FIXME: ensure proper precendence between
> > +* early_initrd and DT when both are present
> 
> Command line takes precedence, so just reverse the order.
> 
> > +*/
> > +   if (initrd_start) {
> > +   initrd_start_phys = __phys_to_virt(initrd_start);
> > +   initrd_end_phys = __phys_to_virt(initrd_end);
> 
> AIUI, the original issue was doing the P2V translation was happening
> too early and the VA could be wrong if the linear range is adjusted.
> So I don't think this would work.

Probably things have changed since then, but in the current code there is

initrd_start = __phys_to_virt(initrd_start);

and in between only the code related to CONFIG_RANDOMIZE_BASE, so I believe
it's safe to use __phys_to_virt() here as well.
 
> I suppose you could convert the VA back to a PA before any adjustments
> and then back to a VA again after. But that's kind of hacky. 2 wrongs
> making a right.
> 
> > +

Re: [PATCH v6 5/6] arm64: dts: add QorIQ LX2160A SoC support

2018-10-25 Thread Li Yang
On Thu, Oct 25, 2018 at 2:03 AM Vabhav Sharma  wrote:
>
> LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture.
>
> LX2160A features 16 64-bit Arm v8 Cortex-A72 processor cores in
> 8 clusters, CCN508, GICv3, two 64-bit DDR4 memory controllers, 8 I2C
> controllers, 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA, and 4 PL011
> SBSA UARTs.
>
> Signed-off-by: Ramneek Mehresh 
> Signed-off-by: Zhang Ying-22455 
> Signed-off-by: Nipun Gupta 
> Signed-off-by: Priyanka Jain 
> Signed-off-by: Yogesh Gaur 
> Signed-off-by: Sriram Dash 
> Signed-off-by: Vabhav Sharma 
> Signed-off-by: Horia Geanta 
> Signed-off-by: Ran Wang 
> Signed-off-by: Yinbo Zhu 
> ---
>  arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 766 
> +
>  1 file changed, 766 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> new file mode 100644
> index 000..9fcfd48
> --- /dev/null
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> @@ -0,0 +1,766 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +//
> +// Device Tree Include file for Layerscape-LX2160A family SoC.
> +//
> +// Copyright 2018 NXP
> +
> +#include 
> +#include 
> +
> +/memreserve/ 0x8000 0x0001;
> +
> +/ {
> +   compatible = "fsl,lx2160a";
> +   interrupt-parent = <&gic>;
> +   #address-cells = <2>;
> +   #size-cells = <2>;
> +
> +   cpus {
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +
> +   // 8 clusters having 2 Cortex-A72 cores each
> +   cpu@0 {
> +   device_type = "cpu";
> +   compatible = "arm,cortex-a72";
> +   enable-method = "psci";
> +   reg = <0x0>;
> +   clocks = <&clockgen 1 0>;
> +   d-cache-size = <0x8000>;
> +   d-cache-line-size = <64>;
> +   d-cache-sets = <128>;
> +   i-cache-size = <0xC000>;
> +   i-cache-line-size = <64>;
> +   i-cache-sets = <192>;
> +   next-level-cache = <&cluster0_l2>;
> +   };
> +
> +   cpu@1 {
> +   device_type = "cpu";
> +   compatible = "arm,cortex-a72";
> +   enable-method = "psci";
> +   reg = <0x1>;
> +   clocks = <&clockgen 1 0>;
> +   d-cache-size = <0x8000>;
> +   d-cache-line-size = <64>;
> +   d-cache-sets = <128>;
> +   i-cache-size = <0xC000>;
> +   i-cache-line-size = <64>;
> +   i-cache-sets = <192>;
> +   next-level-cache = <&cluster0_l2>;
> +   };
> +
> +   cpu@100 {
> +   device_type = "cpu";
> +   compatible = "arm,cortex-a72";
> +   enable-method = "psci";
> +   reg = <0x100>;
> +   clocks = <&clockgen 1 1>;
> +   d-cache-size = <0x8000>;
> +   d-cache-line-size = <64>;
> +   d-cache-sets = <128>;
> +   i-cache-size = <0xC000>;
> +   i-cache-line-size = <64>;
> +   i-cache-sets = <192>;
> +   next-level-cache = <&cluster1_l2>;
> +   };
> +
> +   cpu@101 {
> +   device_type = "cpu";
> +   compatible = "arm,cortex-a72";
> +   enable-method = "psci";
> +   reg = <0x101>;
> +   clocks = <&clockgen 1 1>;
> +   d-cache-size = <0x8000>;
> +   d-cache-line-size = <64>;
> +   d-cache-sets = <128>;
> +   i-cache-size = <0xC000>;
> +   i-cache-line-size = <64>;
> +   i-cache-sets = <192>;
> +   next-level-cache = <&cluster1_l2>;
> +   };
> +
> +   cpu@200 {
> +   device_type = "cpu";
> +   compatible = "arm,cortex-a72";
> +   enable-method = "psci";
> +   reg = <0x200>;
> +   clocks = <&clockgen 1 2>;
> +   d-cache-size = <0x8000>;
> +   d-cache-line-size = <64>;
> +   d-cache-sets = <128>;
> +   i-cache-size = <0xC000>;
> +   i-cache-line-size = <64>;
> +   i-cache-sets = <192>;
> +   next-level-cache = <&cluster2_l2>;
> +   };
> +
> +   cpu@201 {
> +

[PATCH v1 0/5] Add dtl_entry tracepoint

2018-10-25 Thread Naveen N. Rao
This is v1 of the patches for providing a tracepoint for processing the 
dispatch trace log entries from the hypervisor in a shared processor 
LPAR. The previous RFC can be found here:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=66340

Since the RFC, this series has been expanded/generalized to support 
!CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and has been tested in different 
configurations. The dispatch distance calculation has also been updated 
to use the platform provided information better.

Also, patch 3 is new and fixes an issue with stolen time accounting when 
the dtl debugfs interface is in use.

- Naveen


Naveen N. Rao (5):
  powerpc/pseries: Use macros for referring to the DTL enable mask
  powerpc/pseries: Do not save the previous DTL mask value
  powerpc/pseries: Fix stolen time accounting when dtl debugfs is used
  powerpc/pseries: Factor out DTL buffer allocation and registration
routines
  powerpc/pseries: Introduce dtl_entry tracepoint

 arch/powerpc/include/asm/lppaca.h |  11 +
 arch/powerpc/include/asm/plpar_wrappers.h |   9 +
 arch/powerpc/include/asm/trace.h  |  55 +
 arch/powerpc/kernel/entry_64.S|  39 
 arch/powerpc/kernel/time.c|   7 +-
 arch/powerpc/mm/numa.c| 144 -
 arch/powerpc/platforms/pseries/dtl.c  |  22 +-
 arch/powerpc/platforms/pseries/lpar.c | 249 --
 arch/powerpc/platforms/pseries/setup.c|  34 +--
 9 files changed, 502 insertions(+), 68 deletions(-)

-- 
2.19.1



[PATCH v1 1/5] powerpc/pseries: Use macros for referring to the DTL enable mask

2018-10-25 Thread Naveen N. Rao
Introduce macros to encode the DTL enable mask fields and use those
instead of hardcoding numbers.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/lppaca.h  | 11 +++
 arch/powerpc/platforms/pseries/dtl.c   |  8 +---
 arch/powerpc/platforms/pseries/lpar.c  |  2 +-
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h 
b/arch/powerpc/include/asm/lppaca.h
index 7c23ce8a5a4c..2c7e31187726 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -154,6 +154,17 @@ struct dtl_entry {
 #define DISPATCH_LOG_BYTES 4096/* bytes per cpu */
 #define N_DISPATCH_LOG (DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
 
+/*
+ * Dispatch trace log event enable mask:
+ *   0x1: voluntary virtual processor waits
+ *   0x2: time-slice preempts
+ *   0x4: virtual partition memory page faults
+ */
+#define DTL_LOG_CEDE   0x1
+#define DTL_LOG_PREEMPT0x2
+#define DTL_LOG_FAULT  0x4
+#define DTL_LOG_ALL(DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)
+
 extern struct kmem_cache *dtl_cache;
 
 /*
diff --git a/arch/powerpc/platforms/pseries/dtl.c 
b/arch/powerpc/platforms/pseries/dtl.c
index ef6595153642..051ea2de1e1a 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -40,13 +40,7 @@ struct dtl {
 };
 static DEFINE_PER_CPU(struct dtl, cpu_dtl);
 
-/*
- * Dispatch trace log event mask:
- * 0x7: 0x1: voluntary virtual processor waits
- *  0x2: time-slice preempts
- *  0x4: virtual partition memory page faults
- */
-static u8 dtl_event_mask = 0x7;
+static u8 dtl_event_mask = DTL_LOG_ALL;
 
 
 /*
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 0b5081085a44..ad194420e8ae 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -125,7 +125,7 @@ void vpa_init(int cpu)
pr_err("WARNING: DTL registration of cpu %d (hw %d) "
   "failed with %ld\n", smp_processor_id(),
   hwcpu, ret);
-   lppaca_of(cpu).dtl_enable_mask = 2;
+   lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
}
 }
 
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 0f553dcfa548..f3b5822e88c6 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -306,7 +306,7 @@ static int alloc_dispatch_logs(void)
pr_err("WARNING: DTL registration of cpu %d (hw %d) failed "
   "with %d\n", smp_processor_id(),
   hard_smp_processor_id(), ret);
-   get_paca()->lppaca_ptr->dtl_enable_mask = 2;
+   get_paca()->lppaca_ptr->dtl_enable_mask = DTL_LOG_PREEMPT;
 
return 0;
 }
-- 
2.19.1



[PATCH v1 2/5] powerpc/pseries: Do not save the previous DTL mask value

2018-10-25 Thread Naveen N. Rao
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is enabled, we always initialize
the DTL enable mask to DTL_LOG_PREEMPT (0x2). There are no other places
where the mask is changed. As such, when reading the DTL log buffer
through debugfs, there is no need to save and restore the previous mask
value.

Similarly, when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not enabled, there
is no need to save and restore the earlier mask value. So, remove the
field from the structure as well.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/platforms/pseries/dtl.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dtl.c 
b/arch/powerpc/platforms/pseries/dtl.c
index 051ea2de1e1a..fb05804adb2f 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -55,7 +55,6 @@ struct dtl_ring {
struct dtl_entry *write_ptr;
struct dtl_entry *buf;
struct dtl_entry *buf_end;
-   u8  saved_dtl_mask;
 };
 
 static DEFINE_PER_CPU(struct dtl_ring, dtl_rings);
@@ -105,7 +104,6 @@ static int dtl_start(struct dtl *dtl)
dtlr->write_ptr = dtl->buf;
 
/* enable event logging */
-   dtlr->saved_dtl_mask = lppaca_of(dtl->cpu).dtl_enable_mask;
lppaca_of(dtl->cpu).dtl_enable_mask |= dtl_event_mask;
 
dtl_consumer = consume_dtle;
@@ -123,7 +121,7 @@ static void dtl_stop(struct dtl *dtl)
dtlr->buf = NULL;
 
/* restore dtl_enable_mask */
-   lppaca_of(dtl->cpu).dtl_enable_mask = dtlr->saved_dtl_mask;
+   lppaca_of(dtl->cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
 
if (atomic_dec_and_test(&dtl_count))
dtl_consumer = NULL;
-- 
2.19.1



[PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used

2018-10-25 Thread Naveen N. Rao
When the dtl debugfs interface is used, we usually set the
dtl_enable_mask to 0x7 (DTL_LOG_ALL). When this happens, we start seeing
DTL entries for all preempt reasons, including CEDE. In
scan_dispatch_log(), we add up the times from all entries and account
those towards stolen time. However, we should only be accounting stolen
time when the preemption was due to HDEC at the end of our time slice.

Fix this by checking for the dispatch reason in the DTL entry before
adding to the stolen time.

Fixes: cf9efce0ce313 ("powerpc: Account time using timebase rather than PURR")
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/time.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 40868f3ee113..923abc3e555d 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -199,7 +199,7 @@ static u64 scan_dispatch_log(u64 stop_tb)
struct lppaca *vpa = local_paca->lppaca_ptr;
u64 tb_delta;
u64 stolen = 0;
-   u64 dtb;
+   u64 dtb, dispatch_reason;
 
if (!dtl)
return 0;
@@ -210,6 +210,7 @@ static u64 scan_dispatch_log(u64 stop_tb)
dtb = be64_to_cpu(dtl->timebase);
tb_delta = be32_to_cpu(dtl->enqueue_to_dispatch_time) +
be32_to_cpu(dtl->ready_to_enqueue_time);
+   dispatch_reason = dtl->dispatch_reason;
barrier();
if (i + N_DISPATCH_LOG < be64_to_cpu(vpa->dtl_idx)) {
/* buffer has overflowed */
@@ -221,7 +222,9 @@ static u64 scan_dispatch_log(u64 stop_tb)
break;
if (dtl_consumer)
dtl_consumer(dtl, i);
-   stolen += tb_delta;
+   /* 7 indicates that this dispatch follows a time slice preempt 
*/
+   if (dispatch_reason == 7)
+   stolen += tb_delta;
++i;
++dtl;
if (dtl == dtl_end)
-- 
2.19.1



[PATCH v1 4/5] powerpc/pseries: Factor out DTL buffer allocation and registration routines

2018-10-25 Thread Naveen N. Rao
Introduce new helpers for DTL buffer allocation and registration and
have the existing code use those.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/plpar_wrappers.h |  2 +
 arch/powerpc/platforms/pseries/lpar.c | 66 ---
 arch/powerpc/platforms/pseries/setup.c| 34 +---
 3 files changed, 52 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index cff5a411e595..7dcbf42e9e11 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -88,6 +88,8 @@ static inline long register_dtl(unsigned long cpu, unsigned 
long vpa)
return vpa_call(H_VPA_REG_DTL, cpu, vpa);
 }
 
+extern void alloc_dtl_buffers(void);
+extern void register_dtl_buffer(int cpu);
 extern void vpa_init(int cpu);
 
 static inline long plpar_pte_enter(unsigned long flags,
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index ad194420e8ae..d83bb3db6767 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -64,13 +64,58 @@ EXPORT_SYMBOL(plpar_hcall);
 EXPORT_SYMBOL(plpar_hcall9);
 EXPORT_SYMBOL(plpar_hcall_norets);
 
+void alloc_dtl_buffers(void)
+{
+   int cpu;
+   struct paca_struct *pp;
+   struct dtl_entry *dtl;
+
+   for_each_possible_cpu(cpu) {
+   pp = paca_ptrs[cpu];
+   dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL);
+   if (!dtl) {
+   pr_warn("Failed to allocate dispatch trace log for cpu 
%d\n",
+   cpu);
+   pr_warn("Stolen time statistics will be unreliable\n");
+   break;
+   }
+
+   pp->dtl_ridx = 0;
+   pp->dispatch_log = dtl;
+   pp->dispatch_log_end = dtl + N_DISPATCH_LOG;
+   pp->dtl_curr = dtl;
+   }
+}
+
+void register_dtl_buffer(int cpu)
+{
+   long ret;
+   struct paca_struct *pp;
+   struct dtl_entry *dtl;
+   int hwcpu = get_hard_smp_processor_id(cpu);
+
+   pp = paca_ptrs[cpu];
+   dtl = pp->dispatch_log;
+   if (dtl) {
+   pp->dtl_ridx = 0;
+   pp->dtl_curr = dtl;
+   lppaca_of(cpu).dtl_idx = 0;
+
+   /* hypervisor reads buffer length from this field */
+   dtl->enqueue_to_dispatch_time = cpu_to_be32(DISPATCH_LOG_BYTES);
+   ret = register_dtl(hwcpu, __pa(dtl));
+   if (ret)
+   pr_err("WARNING: DTL registration of cpu %d (hw %d) "
+  "failed with %ld\n", cpu, hwcpu, ret);
+   lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
+   }
+}
+
 void vpa_init(int cpu)
 {
int hwcpu = get_hard_smp_processor_id(cpu);
unsigned long addr;
long ret;
-   struct paca_struct *pp;
-   struct dtl_entry *dtl;
 
/*
 * The spec says it "may be problematic" if CPU x registers the VPA of
@@ -111,22 +156,7 @@ void vpa_init(int cpu)
/*
 * Register dispatch trace log, if one has been allocated.
 */
-   pp = paca_ptrs[cpu];
-   dtl = pp->dispatch_log;
-   if (dtl) {
-   pp->dtl_ridx = 0;
-   pp->dtl_curr = dtl;
-   lppaca_of(cpu).dtl_idx = 0;
-
-   /* hypervisor reads buffer length from this field */
-   dtl->enqueue_to_dispatch_time = cpu_to_be32(DISPATCH_LOG_BYTES);
-   ret = register_dtl(hwcpu, __pa(dtl));
-   if (ret)
-   pr_err("WARNING: DTL registration of cpu %d (hw %d) "
-  "failed with %ld\n", smp_processor_id(),
-  hwcpu, ret);
-   lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
-   }
+   register_dtl_buffer(cpu);
 }
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index f3b5822e88c6..be6a3845b7ea 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -267,46 +267,16 @@ struct kmem_cache *dtl_cache;
  */
 static int alloc_dispatch_logs(void)
 {
-   int cpu, ret;
-   struct paca_struct *pp;
-   struct dtl_entry *dtl;
-
if (!firmware_has_feature(FW_FEATURE_SPLPAR))
return 0;
 
if (!dtl_cache)
return 0;
 
-   for_each_possible_cpu(cpu) {
-   pp = paca_ptrs[cpu];
-   dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL);
-   if (!dtl) {
-   pr_warn("Failed to allocate dispatch trace log for cpu 
%d\n",
-   cpu);
-   pr_warn("Stolen time statistics will be unreliable\n");
-   break;
-   }
-
-   pp->dtl_ridx = 0;
-

[PATCH v1 5/5] powerpc/pseries: Introduce dtl_entry tracepoint

2018-10-25 Thread Naveen N. Rao
This tracepoint provides access to the fields of each DTL entry in the
Dispatch Trace Log buffer. Since the buffer is populated by the
hypervisor and since we allocate just a 4k area per cpu for the buffer,
we need to process the entries on a regular basis before they are
overwritten by the hypervisor. We do this by using a static branch (or a
reference counter if we don't have jump labels) in ret_from_except
similar to how the hcall/opal tracepoints do.

Apart from making the DTL entries available for processing through the
usual trace interface, this tracepoint also adds a new field 'distance'
to each DTL entry, enabling enhanced statistics around the vcpu dispatch
behavior of the hypervisor.

For Shared Processor LPARs, the POWER Hypervisor maintains a relatively
static mapping of LPAR vcpus to physical processor cores and tries to
always dispatch vcpus on their associated physical processor core. The
LPAR can discover this through the H_VPHN(flags=1) hcall to obtain the
associativity of the LPAR vcpus.

However, under certain scenarios, vcpus may be dispatched on a different
processor core. The actual physical processor number on which a certain
vcpu is dispatched is available to the LPAR in the 'processor_id' field
of each DTL entry. The LPAR can then discover the associativity of that
physical processor through the H_VPHN(flags=2) hcall. This can then be
compared to the home node associativity for that specific vcpu to
determine if the vcpu was dispatched on the same core or not.  If the
vcpu was not dispatched on the home node, it is possible to determine if
the vcpu was dispatched in a different chip, socket or drawer.

The tracepoint field 'distance' encodes this information. If distance is
0, then the vcpu was dispatched on its home node/chip. If not,
increasing values of 'distance' indicate a dispatch on a different chip
in an MCM, a different socket, or a different drawer.

In terms of the implementation, we update our numa code to retain the
vcpu associativity that is retrieved while discovering our numa
topology. In addition, on tracepoint registration, we discover the
physical cpu associativity. This information is only retrieved during
the tracepoint registration and is not expected to change for the
duration of the trace.

To support configurations with/without CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
selected, we generalize and extend helpers for DTL buffer allocation,
freeing and registration. We also introduce a global variable 'dtl_mask'
to encode the DTL enable mask to be set for all cpus. This helps ensure
that cpus that come online honor the global enable mask.

Finally, to ensure that the new dtl_entry tracepoint usage does not
interfere with the dtl debugfs interface, we introduce helpers to ensure
only one of the two interfaces are used at any point in time.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/plpar_wrappers.h |   7 +
 arch/powerpc/include/asm/trace.h  |  55 +++
 arch/powerpc/kernel/entry_64.S|  39 +
 arch/powerpc/mm/numa.c| 144 -
 arch/powerpc/platforms/pseries/dtl.c  |  10 +-
 arch/powerpc/platforms/pseries/lpar.c | 187 +-
 6 files changed, 434 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 7dcbf42e9e11..029f019ddfb6 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -88,7 +88,14 @@ static inline long register_dtl(unsigned long cpu, unsigned 
long vpa)
return vpa_call(H_VPA_REG_DTL, cpu, vpa);
 }
 
+extern void dtl_entry_tracepoint_enable(void);
+extern void dtl_entry_tracepoint_disable(void);
+extern int register_dtl_buffer_access(int global);
+extern void unregister_dtl_buffer_access(int global);
+extern void set_dtl_mask(u8 mask);
+extern void reset_dtl_mask(void);
 extern void alloc_dtl_buffers(void);
+extern void free_dtl_buffers(void);
 extern void register_dtl_buffer(int cpu);
 extern void vpa_init(int cpu);
 
diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h
index d018e8602694..bcb8d66d3232 100644
--- a/arch/powerpc/include/asm/trace.h
+++ b/arch/powerpc/include/asm/trace.h
@@ -101,6 +101,61 @@ TRACE_EVENT_FN_COND(hcall_exit,
 
hcall_tracepoint_regfunc, hcall_tracepoint_unregfunc
 );
+
+#ifdef CONFIG_PPC_SPLPAR
+extern int dtl_entry_tracepoint_regfunc(void);
+extern void dtl_entry_tracepoint_unregfunc(void);
+extern u8 compute_dispatch_distance(unsigned int pcpu);
+
+TRACE_EVENT_FN(dtl_entry,
+
+   TP_PROTO(u8 dispatch_reason, u8 preempt_reason, u16 processor_id,
+   u32 enqueue_to_dispatch_time, u32 ready_to_enqueue_time,
+   u32 waiting_to_ready_time, u64 timebase, u64 fault_addr,
+   u64 srr0, u64 srr1),
+
+   TP_ARGS(dispatch_reason, preempt_reason, processor_id,
+   enqueue_to_dispatch_time, ready_to_enqueue_time,
+  

Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-25 Thread Rob Herring
On Thu, Oct 25, 2018 at 12:30 PM Mike Rapoport  wrote:
>
> On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote:
> > +Ard
> >
> > On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport  wrote:
> > >
> > > On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> > > > On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli  
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > While investigating why ARM64 required a ton of objects to be rebuilt
> > > > > when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was
> > > > > because we define __early_init_dt_declare_initrd() differently and we 
> > > > > do
> > > > > that in arch/arm64/include/asm/memory.h which gets included by a fair
> > > > > amount of other header files, and translation units as well.
> > > >
> > > > I scratch my head sometimes as to why some config options rebuild so
> > > > much stuff. One down, ? to go. :)
> > > >
> > > > > Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with 
> > > > > build
> > > > > systems that generate two kernels: one with the initramfs and one
> > > > > without. buildroot is one of these build systems, OpenWrt is also
> > > > > another one that does this.
> > > > >
> > > > > This patch series proposes adding an empty initrd.h to satisfy the 
> > > > > need
> > > > > for drivers/of/fdt.c to unconditionally include that file, and moves 
> > > > > the
> > > > > custom __early_init_dt_declare_initrd() definition away from
> > > > > asm/memory.h
> > > > >
> > > > > This cuts the number of objects rebuilds from 1920 down to 26, so a
> > > > > factor 73 approximately.
> > > > >
> > > > > Apologies for the long CC list, please let me know how you would go
> > > > > about merging that and if another approach would be preferable, e.g:
> > > > > introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
> > > > > something like that.
> > > >
> > > > There may be a better way as of 4.20 because bootmem is now gone and
> > > > only memblock is used. This should unify what each arch needs to do
> > > > with initrd early. We need the physical address early for memblock
> > > > reserving. Then later on we need the virtual address to access the
> > > > initrd. Perhaps we should just change initrd_start and initrd_end to
> > > > physical addresses (or add 2 new variables would be less invasive and
> > > > allow for different translation than __va()). The sanity checks and
> > > > memblock reserve could also perhaps be moved to a common location.
> > > >
> > > > Alternatively, given arm64 is the only oddball, I'd be fine with an
> > > > "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> > > > __early_init_dt_declare_initrd as long as we have a path to removing
> > > > it like the above option.
> > >
> > > I think arm64 does not have to redefine __early_init_dt_declare_initrd().
> > > Something like this might be just all we need (completely untested,
> > > probably it won't even compile):
> > >
> > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > index 9d9582c..e9ca238 100644
> > > --- a/arch/arm64/mm/init.c
> > > +++ b/arch/arm64/mm/init.c
> > > @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1;
> > >  phys_addr_t arm64_dma_phys_limit __ro_after_init;
> > >
> > >  #ifdef CONFIG_BLK_DEV_INITRD
> > > +
> > > +static phys_addr_t initrd_start_phys, initrd_end_phys;
> > > +
> > >  static int __init early_initrd(char *p)
> > >  {
> > > unsigned long start, size;
> > > @@ -71,8 +74,8 @@ static int __init early_initrd(char *p)
> > > if (*endp == ',') {
> > > size = memparse(endp + 1, NULL);
> > >
> > > -   initrd_start = start;
> > > -   initrd_end = start + size;
> > > +   initrd_start_phys = start;
> > > +   initrd_end_phys = end;
> > > }
> > > return 0;
> > >  }
> > > @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void)
> > > memblock_add(__pa_symbol(_text), (u64)(_end - _text));
> > > }
> > >
> > > -   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) {
> > > +   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) &&
> > > +   (initrd_start || initrd_start_phys)) {
> > > +   /*
> > > +* FIXME: ensure proper precendence between
> > > +* early_initrd and DT when both are present
> >
> > Command line takes precedence, so just reverse the order.
> >
> > > +*/
> > > +   if (initrd_start) {
> > > +   initrd_start_phys = __phys_to_virt(initrd_start);
> > > +   initrd_end_phys = __phys_to_virt(initrd_end);

BTW, I think you meant virt_to_phys() here?

> >
> > AIUI, the original issue was doing the P2V translation was happening
> > too early and the VA could be wrong if the linear range is adjusted.
> > So I don't think this would work.
>
> Probably things have changed since then, but in the current code there is
>
> in

Re: [PATCH 3/6] PCI: layerscape: Add the EP mode support

2018-10-25 Thread Rob Herring
On Thu, Oct 25, 2018 at 07:08:58PM +0800, Xiaowei Bao wrote:
> Add the EP mode support.
> 
> Signed-off-by: Xiaowei Bao 
> ---
>  .../devicetree/bindings/pci/layerscape-pci.txt |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt 
> b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> index 66df1e8..d3d7be1 100644
> --- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> +++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> @@ -13,12 +13,15 @@ information.
>  
>  Required properties:
>  - compatible: should contain the platform identifier such as:
> +  RC mode:
>  "fsl,ls1021a-pcie", "snps,dw-pcie"
>  "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie"
>  "fsl,ls2088a-pcie"
>  "fsl,ls1088a-pcie"
>  "fsl,ls1046a-pcie"
>  "fsl,ls1012a-pcie"
> +  EP mode:
> +"fsl,ls-pcie-ep"

You need SoC specific compatibles for the same reasons as the RC.

Rob


Re: [PATCH] selftests/powerpc: Relax L1d miss targets for rfi_flush test

2018-10-25 Thread Joel Stanley
On Tue, 23 Oct 2018 at 18:35, Naveen N. Rao
 wrote:
>
> When running the rfi_flush test, if the system is loaded, we see two
> issues:
> 1. The L1d misses when rfi_flush is disabled increase significantly due
> to other workloads interfering with the cache.
> 2. The L1d misses when rfi_flush is enabled sometimes go slightly
> below the expected number of misses.
>
> To address these, let's relax the expected number of L1d misses:
> 1. When rfi_flush is disabled, we allow up to half the expected number
> of misses for when rfi_flush is enabled.
> 2. When rfi_flush is enabled, we allow up to ~1% fewer cache misses.
>
> Reported-by: Joel Stanley 
> Signed-off-by: Naveen N. Rao 

Thanks, this now passes 10/10 runs on my Romulus machine. A log is
attached below.

Tested-by: Joel Stanley 

Cheers,

Joel

---

for i in `seq 1 10`; do sudo ./rfi_flush; done
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 5013939 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195054696 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 11015957 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195053292 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 8017248 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195145579 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 11015308 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195042376 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 1021356 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195031624 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 6015342 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195037322 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 16635 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195032476 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 6013599 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195060037 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 25236 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195052859 > 19000) [10/10 pass]
success: rfi_flush_test
test: rfi_flush_test
tags: git_version:next-20181018-67-g61f7abf00719
PASS (L1D misses with rfi_flush=0: 18120 < 9500) [10/10 pass]
PASS (L1D misses with rfi_flush=1: 195014212 > 19000) [10/10 pass]
success: rfi_flush_test


Re: [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used

2018-10-25 Thread Paul Mackerras
On Fri, Oct 26, 2018 at 01:55:44AM +0530, Naveen N. Rao wrote:
> When the dtl debugfs interface is used, we usually set the
> dtl_enable_mask to 0x7 (DTL_LOG_ALL). When this happens, we start seeing
> DTL entries for all preempt reasons, including CEDE. In
> scan_dispatch_log(), we add up the times from all entries and account
> those towards stolen time. However, we should only be accounting stolen
> time when the preemption was due to HDEC at the end of our time slice.

It's always been the case that stolen time when idle has been
accounted as idle time, not stolen time.  That's why we didn't check
for this in the past.

Do you have a test that shows different results (as in reported idle
and stolen times) with this patch compared to without?

Paul.


Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Andy Lutomirski
On Thu, Oct 25, 2018 at 9:42 AM Michael Sammler  wrote:
>
> On 10/25/2018 11:12 AM, Florian Weimer wrote:
> >> I understand your concern about exposing the number of protection keys
> >> in the ABI. One idea would be to state, that the pkru field (which
> >> should probably be renamed) contains an architecture specific value,
> >> which could then be the PKRU on x86 and AMR (or another register) on
> >> POWER. This new field should probably be extended to __u64 and the
> >> reserved field removed.
> > POWER also has proper read/write bit separation, not PKEY_DISABLE_ACCESS
> > (disable read and write) and PKEY_DISABLE_WRITE like Intel.  It's
> > currently translated by the kernel, but I really need a
> > PKEY_DISABLE_READ bit in glibc to implement pkey_get in case the memory
> > is write-only.
> The idea here would be to simply provide the raw value of the register
> (PKRU on x86, AMR on POWER) to the BPF program and let the BPF program
> (or maybe a higher level library like libseccomp) deal with the
> complications of interpreting this architecture specific value (similar
> how the BPF program currently already has to deal with architecture
> specific system call numbers). If an architecture were to support more
> protection keys than fit into the field, the architecture specific value
> stored in the field might simply cover only the first protection keys. If there
> was interest, it would be possible to add more architecture specific
> fields to seccomp_data.
> >> Another idea would be to not add a field in the seccomp_data
> >> structure, but instead provide a new BPF instruction, which reads the
> >> value of a specified protection key.
> > I would prefer that if it's possible.  We should make sure that the bits
> > are the same as those returned from pkey_get.  I have an implementation
> > on POWER, but have yet to figure out the implications for 32-bit because
> > I do not know the AMR register size there.
> >
> > Thanks,
> > Florian
> I have had a look at how BPF is implemented, and it does not seem easy
> to just add a BPF instruction for seccomp: as far as I understand, the
> code for classic BPF (as used by seccomp) is shared with the eBPF code,
> which is used in many parts of the kernel, and there is at least one
> interpreter and one JIT compiler for BPF. But maybe someone with more
> experience than me can comment on how hard it would be to add an
> instruction to BPF.
>

You could bite the bullet and add seccomp eBPF support :)


Re: [PATCH v2 0/2] arm64: Cut rebuild time when changing CONFIG_BLK_DEV_INITRD

2018-10-25 Thread Florian Fainelli
On 10/25/18 2:13 PM, Rob Herring wrote:
> On Thu, Oct 25, 2018 at 12:30 PM Mike Rapoport  wrote:
>>
>> On Thu, Oct 25, 2018 at 08:15:15AM -0500, Rob Herring wrote:
>>> +Ard
>>>
>>> On Thu, Oct 25, 2018 at 4:38 AM Mike Rapoport  wrote:

 On Wed, Oct 24, 2018 at 02:55:17PM -0500, Rob Herring wrote:
> On Wed, Oct 24, 2018 at 2:33 PM Florian Fainelli  
> wrote:
>>
>> Hi all,
>>
>> While investigating why ARM64 required a ton of objects to be rebuilt
>> when toggling CONFIG_DEV_BLK_INITRD, it became clear that this was
>> because we define __early_init_dt_declare_initrd() differently and we do
>> that in arch/arm64/include/asm/memory.h which gets included by a fair
>> amount of other header files, and translation units as well.
>
> I scratch my head sometimes as to why some config options rebuild so
> much stuff. One down, ? to go. :)
>
>> Changing the value of CONFIG_DEV_BLK_INITRD is a common thing with build
>> systems that generate two kernels: one with the initramfs and one
>> without. buildroot is one of these build systems, OpenWrt is also
>> another one that does this.
>>
>> This patch series proposes adding an empty initrd.h to satisfy the need
>> for drivers/of/fdt.c to unconditionally include that file, and moves the
>> custom __early_init_dt_declare_initrd() definition away from
>> asm/memory.h
>>
>> This cuts the number of objects rebuilds from 1920 down to 26, so a
>> factor 73 approximately.
>>
>> Apologies for the long CC list, please let me know how you would go
>> about merging that and if another approach would be preferable, e.g:
>> introducing a CONFIG_ARCH_INITRD_BELOW_START_OK Kconfig option or
>> something like that.
>
> There may be a better way as of 4.20 because bootmem is now gone and
> only memblock is used. This should unify what each arch needs to do
> with initrd early. We need the physical address early for memblock
> reserving. Then later on we need the virtual address to access the
> initrd. Perhaps we should just change initrd_start and initrd_end to
> physical addresses (or add 2 new variables would be less invasive and
> allow for different translation than __va()). The sanity checks and
> memblock reserve could also perhaps be moved to a common location.
>
> Alternatively, given arm64 is the only oddball, I'd be fine with an
> "if (IS_ENABLED(CONFIG_ARM64))" condition in the default
> __early_init_dt_declare_initrd as long as we have a path to removing
> it like the above option.

 I think arm64 does not have to redefine __early_init_dt_declare_initrd().
 Something like this might be all we need (completely untested;
 probably it won't even compile):

 diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
 index 9d9582c..e9ca238 100644
 --- a/arch/arm64/mm/init.c
 +++ b/arch/arm64/mm/init.c
 @@ -62,6 +62,9 @@ s64 memstart_addr __ro_after_init = -1;
  phys_addr_t arm64_dma_phys_limit __ro_after_init;

  #ifdef CONFIG_BLK_DEV_INITRD
 +
 +static phys_addr_t initrd_start_phys, initrd_end_phys;
 +
  static int __init early_initrd(char *p)
  {
 unsigned long start, size;
 @@ -71,8 +74,8 @@ static int __init early_initrd(char *p)
 if (*endp == ',') {
 size = memparse(endp + 1, NULL);

 -   initrd_start = start;
 -   initrd_end = start + size;
 +   initrd_start_phys = start;
 +   initrd_end_phys = end;
 }
 return 0;
  }
 @@ -407,14 +410,27 @@ void __init arm64_memblock_init(void)
 memblock_add(__pa_symbol(_text), (u64)(_end - _text));
 }

 -   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) && initrd_start) {
 +   if (IS_ENABLED(CONFIG_BLK_DEV_INITRD) &&
 +   (initrd_start || initrd_start_phys)) {
 +   /*
 +* FIXME: ensure proper precedence between
 +* early_initrd and DT when both are present
>>>
>>> Command line takes precedence, so just reverse the order.
>>>
 +*/
 +   if (initrd_start) {
 +   initrd_start_phys = __phys_to_virt(initrd_start);
 +   initrd_end_phys = __phys_to_virt(initrd_end);
> 
> BTW, I think you meant virt_to_phys() here?
> 
>>>
>>> AIUI, the original issue was doing the P2V translation was happening
>>> too early and the VA could be wrong if the linear range is adjusted.
>>> So I don't think this would work.
>>
>> Probably things have changed since then, but in the current code there is
>>
>> initrd_start = __phys_to_virt(initrd_start);
>>
>> and in between only the code related to CONFIG_RANDOMIZE_BASE, so I believe
>> it's safe to use __phys

Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Kees Cook
On Fri, Oct 26, 2018 at 12:00 AM, Andy Lutomirski  wrote:
> You could bite the bullet and add seccomp eBPF support :)

I'm not convinced this is a good enough reason for gaining the eBPF
attack surface yet.

-Kees

-- 
Kees Cook


Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Andy Lutomirski



> On Oct 25, 2018, at 5:35 PM, Kees Cook  wrote:
> 
>> On Fri, Oct 26, 2018 at 12:00 AM, Andy Lutomirski  
>> wrote:
>> You could bite the bullet and add seccomp eBPF support :)
> 
> I'm not convinced this is a good enough reason for gaining the eBPF
> attack surface yet.
> 
> 

Is it an interesting attack surface?  It’s certainly scarier if you’re worried 
about attacks from the sandbox creator, but the security inside the sandbox 
should be more or less equivalent, no?

RE: [PATCH 3/6] PCI: layerscape: Add the EP mode support

2018-10-25 Thread Xiaowei Bao


-Original Message-
From: Rob Herring  
Sent: 26 October 2018 5:53
To: Xiaowei Bao 
Cc: bhelg...@google.com; mark.rutl...@arm.com; shawn...@kernel.org; Leo Li 
; kis...@ti.com; lorenzo.pieral...@arm.com; a...@arndb.de; 
gre...@linuxfoundation.org; M.h. Lian ; Mingkai Hu 
; Roy Zang ; 
kstew...@linuxfoundation.org; cyrille.pitc...@free-electrons.com; 
pombreda...@nexb.com; shawn@rock-chips.com; niklas.cas...@axis.com; 
linux-...@vger.kernel.org; devicet...@vger.kernel.org; 
linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; 
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 3/6] PCI: layerscape: Add the EP mode support

On Thu, Oct 25, 2018 at 07:08:58PM +0800, Xiaowei Bao wrote:
> Add the EP mode support.
> 
> Signed-off-by: Xiaowei Bao 
> ---
>  .../devicetree/bindings/pci/layerscape-pci.txt |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt 
> b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> index 66df1e8..d3d7be1 100644
> --- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> +++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> @@ -13,12 +13,15 @@ information.
>  
>  Required properties:
>  - compatible: should contain the platform identifier such as:
> +  RC mode:
>  "fsl,ls1021a-pcie", "snps,dw-pcie"
>  "fsl,ls2080a-pcie", "fsl,ls2085a-pcie", "snps,dw-pcie"
>  "fsl,ls2088a-pcie"
>  "fsl,ls1088a-pcie"
>  "fsl,ls1046a-pcie"
>  "fsl,ls1012a-pcie"
> +  EP mode:
> +"fsl,ls-pcie-ep"

You need SoC specific compatibles for the same reasons as the RC.
[Xiaowei Bao] I want all Layerscape platforms to use a single compatible
string when the PCIe controller works in EP mode.

Rob


Re: [PATCH 5/6] pci: layerscape: Add the EP mode support.

2018-10-25 Thread Kishon Vijay Abraham I
Hi,

On Thursday 25 October 2018 04:39 PM, Xiaowei Bao wrote:
> Add the PCIe EP mode support for layerscape platform.
> 
> Signed-off-by: Xiaowei Bao 
> ---
>  drivers/pci/controller/dwc/Makefile|2 +-
>  drivers/pci/controller/dwc/pci-layerscape-ep.c |  161 
> 
>  2 files changed, 162 insertions(+), 1 deletions(-)
>  create mode 100644 drivers/pci/controller/dwc/pci-layerscape-ep.c
> 
> diff --git a/drivers/pci/controller/dwc/Makefile 
> b/drivers/pci/controller/dwc/Makefile
> index 5d2ce72..b26d617 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -8,7 +8,7 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
>  obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
>  obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
>  obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone-dw.o pci-keystone.o
> -obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
> +obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
>  obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
>  obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
>  obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
> b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> new file mode 100644
> index 000..3b33bbc
> --- /dev/null
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -0,0 +1,161 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe controller EP driver for Freescale Layerscape SoCs
> + *
> + * Copyright (C) 2018 NXP Semiconductor.
> + *
> + * Author: Xiaowei Bao 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "pcie-designware.h"
> +
> +#define PCIE_DBI2_OFFSET 0x1000  /* DBI2 base address*/

The base address should come from dt.
> +
> +struct ls_pcie_ep {
> + struct dw_pcie  *pci;
> +};
> +
> +#define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
> +
> +static bool ls_pcie_is_bridge(struct ls_pcie_ep *pcie)
> +{
> + struct dw_pcie *pci = pcie->pci;
> + u32 header_type;
> +
> + header_type = ioread8(pci->dbi_base + PCI_HEADER_TYPE);
> + header_type &= 0x7f;
> +
> + return header_type == PCI_HEADER_TYPE_BRIDGE;
> +}
> +
> +static int ls_pcie_establish_link(struct dw_pcie *pci)
> +{
> + return 0;
> +}

There should be some way for the EP to tell the RC that it is not configured
yet. Are there no bits to control LTSSM state initialization or Configuration
retry status enabling?
> +
> +static const struct dw_pcie_ops ls_pcie_ep_ops = {
> + .start_link = ls_pcie_establish_link,
> +};
> +
> +static const struct of_device_id ls_pcie_ep_of_match[] = {
> + { .compatible = "fsl,ls-pcie-ep",},
> + { },
> +};
> +
> +static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct pci_epc *epc = ep->epc;
> + enum pci_barno bar;
> +
> + for (bar = BAR_0; bar <= BAR_5; bar++)
> + dw_pcie_ep_reset_bar(pci, bar);
> +
> + epc->features |= EPC_FEATURE_NO_LINKUP_NOTIFIER;
> +}
> +
> +static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> +   enum pci_epc_irq_type type, u16 interrupt_num)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +
> + switch (type) {
> + case PCI_EPC_IRQ_LEGACY:
> + return dw_pcie_ep_raise_legacy_irq(ep, func_no);
> + case PCI_EPC_IRQ_MSI:
> + return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
> + case PCI_EPC_IRQ_MSIX:
> + return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
> + default:
> + dev_err(pci->dev, "UNKNOWN IRQ type\n");
> + }
> +
> + return 0;
> +}
> +
> +static struct dw_pcie_ep_ops pcie_ep_ops = {
> + .ep_init = ls_pcie_ep_init,
> + .raise_irq = ls_pcie_ep_raise_irq,
> +};
> +
> +static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
> + struct platform_device *pdev)
> +{
> + struct dw_pcie *pci = pcie->pci;
> + struct device *dev = pci->dev;
> + struct dw_pcie_ep *ep;
> + struct resource *res;
> + int ret;
> +
> + ep = &pci->ep;
> + ep->ops = &pcie_ep_ops;
> +
> + res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
> + if (!res)
> + return -EINVAL;
> +
> + ep->phys_base = res->start;
> + ep->addr_size = resource_size(res);
> +
> + ret = dw_pcie_ep_init(ep);
> + if (ret) {
> + dev_err(dev, "failed to initialize endpoint\n");
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int __init ls_pcie_ep_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + struct dw_pcie *pci;
> + struct ls_pcie_ep *pcie;
> + struct resource *dbi_base;
> + int ret;
> +
> + pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
> + if (!pcie)
> 

Re: [PATCH] seccomp: Add pkru into seccomp_data

2018-10-25 Thread Ram Pai
On Thu, Oct 25, 2018 at 11:12:25AM +0200, Florian Weimer wrote:
> * Michael Sammler:
> 
> > Thank you for the pointer about the POWER implementation. I am not
> > familiar with POWER in general and its protection key feature at
> > all. Would the AMR register be the correct register to expose here?
> 
> Yes, according to my notes, the register is called AMR (special purpose
> register 13).

Yes. it is AMR register.

RP



[PATCH 1/5] powerpc/64s: Guarded Userspace Access Prevention

2018-10-25 Thread Russell Currey
Guarded Userspace Access Prevention (GUAP) utilises a feature of
the Radix MMU which disallows read and write access to userspace
addresses.  By utilising this, the kernel is prevented from accessing
user data from outside of trusted paths that perform proper safety checks,
such as copy_{to/from}_user() and friends.

Userspace access is disabled from early boot and is only enabled when:

- exiting the kernel and entering userspace
- performing an operation like copy_{to/from}_user()
- context switching to a process that has access enabled

and similarly, access is disabled again when exiting userspace and entering
the kernel.

This feature has a slight performance impact which I roughly measured to be
3% slower in the worst case (performing 1GB of 1 byte read()/write()
syscalls), and is gated behind the CONFIG_PPC_RADIX_GUAP option for
performance-critical builds.

This feature can be tested by using the lkdtm driver (CONFIG_LKDTM=y) and
performing the following:

echo ACCESS_USERSPACE > [debugfs]/provoke-crash/DIRECT

If enabled, this should send a SIGSEGV to the thread.

Signed-off-by: Russell Currey 
---
Since the previous version of this patchset (named KHRAP) there have been
several changes, some of which include:

- macro naming, suggested by Nick
- builds should be fixed outside of 64s
- no longer unlock heading out to userspace
- removal of unnecessary isyncs
- more config option testing
- removal of save/restore
- use pr_crit() and reword message on fault

 arch/powerpc/include/asm/exception-64e.h |  3 ++
 arch/powerpc/include/asm/exception-64s.h | 19 +++-
 arch/powerpc/include/asm/mmu.h   |  7 +++
 arch/powerpc/include/asm/paca.h  |  3 ++
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/include/asm/uaccess.h   | 57 
 arch/powerpc/kernel/asm-offsets.c|  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c|  4 ++
 arch/powerpc/kernel/entry_64.S   | 17 ++-
 arch/powerpc/mm/fault.c  | 12 +
 arch/powerpc/mm/pgtable-radix.c  |  2 +
 arch/powerpc/mm/pkeys.c  |  7 ++-
 arch/powerpc/platforms/Kconfig.cputype   | 15 +++
 13 files changed, 135 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64e.h 
b/arch/powerpc/include/asm/exception-64e.h
index 555e22d5e07f..bf25015834ee 100644
--- a/arch/powerpc/include/asm/exception-64e.h
+++ b/arch/powerpc/include/asm/exception-64e.h
@@ -215,5 +215,8 @@ exc_##label##_book3e:
 #define RFI_TO_USER\
rfi
 
+#define UNLOCK_USER_ACCESS(reg)
+#define LOCK_USER_ACCESS(reg)
+
 #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
 
diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..0cac5bd380ca 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -264,6 +264,19 @@ BEGIN_FTR_SECTION_NESTED(943)  
\
std ra,offset(r13); \
 END_FTR_SECTION_NESTED(ftr,ftr,943)
 
+#define LOCK_USER_ACCESS(reg)  
\
+BEGIN_MMU_FTR_SECTION_NESTED(944)  \
+   LOAD_REG_IMMEDIATE(reg,AMR_LOCKED); \
+   mtspr   SPRN_AMR,reg;   \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,944)
+
+#define UNLOCK_USER_ACCESS(reg)
\
+BEGIN_MMU_FTR_SECTION_NESTED(945)  \
+   li  reg,0;  \
+   mtspr   SPRN_AMR,reg;   \
+   isync   \
+END_MMU_FTR_SECTION_NESTED(MMU_FTR_RADIX_GUAP,MMU_FTR_RADIX_GUAP,945)
+
 #define EXCEPTION_PROLOG_0(area)   \
GET_PACA(r13);  \
std r9,area+EX_R9(r13); /* save r9 */   \
@@ -500,7 +513,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
beq 4f; /* if from kernel mode  */ \
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10);  \
SAVE_PPR(area, r9);\
-4: EXCEPTION_PROLOG_COMMON_2(area)\
+4: lbz r9,PACA_USER_ACCESS_ALLOWED(r13);  \
+   cmpwi   cr1,r9,0;  \
+   beq 5f;\
+   LOCK_USER_ACCESS(r9);   
   \
+5: EXCEPTION_

[PATCH 2/5] powerpc/futex: GUAP support for futex ops

2018-10-25 Thread Russell Currey
Wrap the futex operations in GUAP locks and unlocks.

Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/futex.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/futex.h b/arch/powerpc/include/asm/futex.h
index 94542776a62d..3aed640ee9ef 100644
--- a/arch/powerpc/include/asm/futex.h
+++ b/arch/powerpc/include/asm/futex.h
@@ -35,6 +35,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int 
oparg, int *oval,
 {
int oldval = 0, ret;
 
+   unlock_user_access();
pagefault_disable();
 
switch (op) {
@@ -62,6 +63,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int 
oparg, int *oval,
if (!ret)
*oval = oldval;
 
+   lock_user_access();
return ret;
 }
 
@@ -75,6 +77,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
 
+   unlock_user_access();
 __asm__ __volatile__ (
 PPC_ATOMIC_ENTRY_BARRIER
 "1: lwarx   %1,0,%3 # futex_atomic_cmpxchg_inatomic\n\
@@ -95,6 +98,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 : "cc", "memory");
 
*uval = prev;
+   lock_user_access();
 return ret;
 }
 
-- 
2.19.1



[PATCH 0/5] Guarded Userspace Access Prevention on Radix

2018-10-25 Thread Russell Currey
Guarded Userspace Access Prevention is a security mechanism that prevents
the kernel from being able to read and write userspace addresses outside of
the allowed paths, most commonly copy_{to/from}_user().

At present, the only CPU that supports this is POWER9, and only while using
the Radix MMU.  Privileged reads and writes cannot access user data when
key 0 of the AMR is set.  This is described in the "Radix Tree Translation
Storage Protection" section of the POWER ISA as of version 3.0.

GUAP code sets key 0 of the AMR (thus disabling access to user data)
early during boot, and only ever "unlocks" access prior to certain
operations, like copy_{to/from}_user(), futex ops, etc.  Setting this does
not prevent unprivileged access, so userspace can operate fine while access
is locked.

There is a performance impact, although I don't consider it heavy.  Running
a worst-case benchmark of a 1GB copy 1 byte at a time (and thus constant
read(1) write(1) syscalls), I found enabling GUAP to be 3.5% slower than
when disabled.  In most cases, the difference is negligible.  The main
performance impact is the mtspr instruction, which is quite slow.

There are a few caveats with this series that could be improved upon in
future.  Right now there is no saving and restoring of the AMR value -
there is no userspace exploitation of the AMR on Radix in POWER9, but if
this were to change in future, saving and restoring the value would be
necessary.

There is no attempt to optimise repeated calls - for example, code that
repeatedly calls copy_to_user() for small sizes in quick succession would
be slower than wrapping the whole sequence in a single unlock/lock pair
so that the AMR only has to be modified once.

There are some interesting cases that I've attempted to handle, such as if
the AMR is unlocked (i.e. because a copy_{to/from}_user is in progress)...

- and an exception is taken, the kernel would then be running with the
AMR unlocked and freely able to access userspace again.  I am working
around this by storing a flag in the PACA to indicate if the AMR is
unlocked (to save a costly SPR read), and if so, locking the AMR in
the exception entry path and unlocking it on the way out.

- and gets context switched out, goes into a path that locks the AMR,
then context switches back, access will be disabled and will fault.
As a result, I context switch the AMR between tasks as if it was used
by userspace like hash (which already implements this).

Another consideration is use of the isync instruction.  Without an isync
following the mtspr instruction, there is no guarantee that the change
takes effect.  The issue is that isync is very slow, and so I tried to
avoid it wherever possible.  In this series, the only place an isync
gets used is after *unlocking* the AMR, because if an access takes place
and access is still prevented, the kernel will fault.

On the flipside, a slight delay in unlocking caused by skipping an isync
potentially allows a small window of vulnerability.  It is my opinion
that this window is practically impossible to exploit, but if someone
thinks otherwise, please do share.

This series is my first attempt at POWER assembly so all feedback is very
welcome.

The official theme song of this series can be found here:
https://www.youtube.com/watch?v=QjTrnKAcYjE

Russell Currey (5):
  powerpc/64s: Guarded Userspace Access Prevention
  powerpc/futex: GUAP support for futex ops
  powerpc/lib: checksum GUAP support
  powerpc/64s: Disable GUAP with nosmap option
  powerpc/64s: Document that PPC supports nosmap

 .../admin-guide/kernel-parameters.txt |  2 +-
 arch/powerpc/include/asm/exception-64e.h  |  3 +
 arch/powerpc/include/asm/exception-64s.h  | 19 ++-
 arch/powerpc/include/asm/futex.h  |  6 ++
 arch/powerpc/include/asm/mmu.h|  7 +++
 arch/powerpc/include/asm/paca.h   |  3 +
 arch/powerpc/include/asm/reg.h|  1 +
 arch/powerpc/include/asm/uaccess.h| 57 ---
 arch/powerpc/kernel/asm-offsets.c |  1 +
 arch/powerpc/kernel/dt_cpu_ftrs.c |  4 ++
 arch/powerpc/kernel/entry_64.S| 17 +-
 arch/powerpc/lib/checksum_wrappers.c  |  6 +-
 arch/powerpc/mm/fault.c   |  9 +++
 arch/powerpc/mm/init_64.c | 15 +
 arch/powerpc/mm/pgtable-radix.c   |  2 +
 arch/powerpc/mm/pkeys.c   |  7 ++-
 arch/powerpc/platforms/Kconfig.cputype| 15 +
 17 files changed, 158 insertions(+), 16 deletions(-)

-- 
2.19.1



[PATCH 3/5] powerpc/lib: checksum GUAP support

2018-10-25 Thread Russell Currey
Wrap the checksumming code in GUAP locks and unlocks.

Signed-off-by: Russell Currey 
---
 arch/powerpc/lib/checksum_wrappers.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/lib/checksum_wrappers.c 
b/arch/powerpc/lib/checksum_wrappers.c
index a0cb63fb76a1..c67db0a6e18b 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -28,6 +28,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void 
*dst,
 {
unsigned int csum;
 
+   unlock_user_access();
might_sleep();
 
*err_ptr = 0;
@@ -60,6 +61,7 @@ __wsum csum_and_copy_from_user(const void __user *src, void 
*dst,
}
 
 out:
+   lock_user_access();
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_from_user);
@@ -69,6 +71,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user 
*dst, int len,
 {
unsigned int csum;
 
+   unlock_user_access();
might_sleep();
 
*err_ptr = 0;
@@ -97,6 +100,7 @@ __wsum csum_and_copy_to_user(const void *src, void __user 
*dst, int len,
}
 
 out:
+   lock_user_access();
return (__force __wsum)csum;
 }
 EXPORT_SYMBOL(csum_and_copy_to_user);
-- 
2.19.1



[PATCH 4/5] powerpc/64s: Disable GUAP with nosmap option

2018-10-25 Thread Russell Currey
GUAP is similar to SMAP on x86 platforms, so implement support for
the same kernel parameter.

Signed-off-by: Russell Currey 
---
 arch/powerpc/mm/init_64.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 7a9886f98b0c..b26641df36f2 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -312,6 +312,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 
 #ifdef CONFIG_PPC_BOOK3S_64
 static bool disable_radix = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT);
+static bool disable_guap = !IS_ENABLED(CONFIG_PPC_RADIX_GUAP);
 
 static int __init parse_disable_radix(char *p)
 {
@@ -328,6 +329,18 @@ static int __init parse_disable_radix(char *p)
 }
 early_param("disable_radix", parse_disable_radix);
 
+static int __init parse_nosmap(char *p)
+{
+   /*
+* nosmap is an existing option on x86 where it doesn't return -EINVAL
+* if the parameter is set to something, so even though it's different
+* to disable_radix, don't return an error for compatibility.
+*/
+   disable_guap = true;
+   return 0;
+}
+early_param("nosmap", parse_nosmap);
+
 /*
  * If we're running under a hypervisor, we need to check the contents of
  * /chosen/ibm,architecture-vec-5 to see if the hypervisor is willing to do
@@ -381,6 +394,8 @@ void __init mmu_early_init_devtree(void)
/* Disable radix mode based on kernel command line. */
if (disable_radix)
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+   if (disable_radix || disable_guap)
+   cur_cpu_spec->mmu_features &= ~MMU_FTR_RADIX_GUAP;
 
/*
 * Check /chosen/ibm,architecture-vec-5 if running as a guest.
-- 
2.19.1



[PATCH 5/5] powerpc/64s: Document that PPC supports nosmap

2018-10-25 Thread Russell Currey
Signed-off-by: Russell Currey 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a5ad67d5cb16..8f78e75965f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2764,7 +2764,7 @@
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings
 
-   nosmap  [X86]
+   nosmap  [X86,PPC]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
 
-- 
2.19.1