date:20180824

Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

2018-08-24 Thread Thomas Gleixner

On Fri, 24 Aug 2018, Greg KH wrote:
> On Thu, Aug 23, 2018 at 05:57:06PM -0500, Grygorii Strashko wrote:
> > This patch was backported to stable linux-4.14.y and it causes a
> > regression: a flood of "NOHZ: local_softirq_pending" messages on all
> > TI boards during boot (NFS boot):
> > 
> > [4.179796] NOHZ: local_softirq_pending 2c2 in sirq 256
> > [4.185051] NOHZ: local_softirq_pending 2c2 in sirq 256

This printout is weird. Did you add something here?

> > The same is not reproducible with LKML - seems due to changes in
> > tick-sched.c __tick_nohz_idle_enter()/tick_nohz_irq_exit().
> 
> What changes do you think fixed this?
> 
> > I've generated a backtrace from can_stop_idle_tick() (see below) and it
> > seems this patch makes the tick_nohz_irq_exit() call unconditional in
> > case of a nested interrupt:
> > 
> > gic_handle_irq
> >  |- irq_exit
> >     |- preempt_count_sub(HARDIRQ_OFFSET); <-- [1]
> >     |- __do_softirq
> > 
> >        |- gic_handle_irq()
> >           |- irq_exit()
> >              |- tick_irq_exit()
> >                 if (!in_irq()) <-- My understanding is that this
> >                    condition will always be true due to [1]

Correct, but that's not the problem. The issue is that this happens in a
softirq disabled region. Does the below fix it?

Thanks,

tglx

8<
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5b33e2f5c0ed..6aab9d54a331 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
 		static int ratelimit;
 
-		if (ratelimit < 10 &&
+		if (ratelimit < 10 && !in_softirq() &&
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
 			pr_warn("NOHZ: local_softirq_pending %02x\n",
 				(unsigned int) local_softirq_pending());
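
For reference, the preempt_count state in the nested case can be modelled
in user space. This is only a sketch, not kernel code: the bit layout is
assumed to follow include/linux/preempt.h (softirq count at bit 8, hardirq
count at bit 16), and model_in_irq(), model_in_softirq(),
should_warn_old() and should_warn_new() are hypothetical stand-ins for the
ratelimited check in can_stop_idle_tick() before and after the change
above, with the SOFTIRQ_STOP_IDLE_MASK filter and the ratelimit left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, modelled on include/linux/preempt.h. */
#define SOFTIRQ_SHIFT   8
#define HARDIRQ_SHIFT   16
#define SOFTIRQ_OFFSET  (1u << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET  (1u << HARDIRQ_SHIFT)
#define SOFTIRQ_MASK    (0xffu << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK    (0x0fu << HARDIRQ_SHIFT)

/* Hypothetical helpers: true if the hardirq/softirq field is non-zero. */
static inline bool model_in_irq(uint32_t pc)
{
	return (pc & HARDIRQ_MASK) != 0;
}

static inline bool model_in_softirq(uint32_t pc)
{
	return (pc & SOFTIRQ_MASK) != 0;
}

/* preempt_count as seen by tick_irq_exit() of the nested interrupt. */
static inline uint32_t nested_irq_exit_count(void)
{
	uint32_t pc = 0;

	pc += HARDIRQ_OFFSET;	/* outer irq_enter() */
	pc -= HARDIRQ_OFFSET;	/* outer irq_exit(), step [1] */
	pc += SOFTIRQ_OFFSET;	/* __do_softirq() starts serving softirqs */
	pc += HARDIRQ_OFFSET;	/* nested irq_enter() */
	pc -= HARDIRQ_OFFSET;	/* nested irq_exit(), before tick_irq_exit() */
	return pc;
}

/* Old check: warn whenever softirqs are pending. */
static inline bool should_warn_old(uint32_t pc, uint32_t pending)
{
	(void)pc;
	return pending != 0;
}

/* New check: stay quiet while softirqs are being served or disabled. */
static inline bool should_warn_new(uint32_t pc, uint32_t pending)
{
	return pending != 0 && !model_in_softirq(pc);
}
```

Under these assumptions the count at the nested exit is SOFTIRQ_OFFSET,
so in_irq() is false while in_softirq() is true, and only the old check
would print the warning for a pending mask such as 0x2c2.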

RE: [PATCH] thermal: of-thermal: disable passive polling when thermal zone is disabled

2018-08-24 Thread Anson Huang

Gentle Ping...

Anson Huang
Best Regards!


> -----Original Message-----
> From: Anson Huang
> Sent: Tuesday, July 31, 2018 12:57 AM
> To: rui.zh...@intel.com; edubez...@gmail.com; linux...@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Cc: dl-linux-imx ; Nitin Garg 
> Subject: [PATCH] thermal: of-thermal: disable passive polling when thermal
> zone is disabled
> 
> When a thermal zone is in passive mode, disabling it from sysfs does NOT
> take effect: the core keeps polling the temperature of the disabled
> thermal zone and handling all thermal trips, which confuses users.
> Disabling should stop the thermal zone behavior completely, for both
> active and passive mode. This patch clears passive_delay when the thermal
> zone is disabled and restores it when the zone is enabled.
> 
> Signed-off-by: Anson Huang 
> ---
>  drivers/thermal/of-thermal.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index 977a830..4f28165 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -260,10 +260,13 @@ static int of_thermal_set_mode(struct thermal_zone_device *tz,
> 
>  	mutex_lock(&tz->lock);
> 
> -	if (mode == THERMAL_DEVICE_ENABLED)
> +	if (mode == THERMAL_DEVICE_ENABLED) {
>  		tz->polling_delay = data->polling_delay;
> -	else
> +		tz->passive_delay = data->passive_delay;
> +	} else {
>  		tz->polling_delay = 0;
> +		tz->passive_delay = 0;
> +	}
> 
>  	mutex_unlock(&tz->lock);
> 
> --
> 2.7.4
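
To make the intended behavior easier to check, here is a small user-space
model of the set_mode logic from the patch. It is only a sketch: the
model_* types and names are hypothetical stand-ins for
thermal_zone_device and the of-thermal private data, and locking is left
out.

```c
/* Hypothetical stand-ins for the kernel structures touched by the patch. */
enum model_mode { MODEL_DISABLED, MODEL_ENABLED };

struct model_zone {
	int polling_delay;	/* delay used while no trip is crossed */
	int passive_delay;	/* delay used while passive cooling is active */
};

struct model_data {
	int polling_delay;	/* configured values, kept for restore */
	int passive_delay;
};

/*
 * Enable restores both delays from the saved configuration, disable
 * clears both, so that neither polling path keeps running.
 */
static inline void model_set_mode(struct model_zone *tz,
				  const struct model_data *data,
				  enum model_mode mode)
{
	if (mode == MODEL_ENABLED) {
		tz->polling_delay = data->polling_delay;
		tz->passive_delay = data->passive_delay;
	} else {
		tz->polling_delay = 0;
		tz->passive_delay = 0;
	}
}
```

Disabling and then enabling the zone again should leave both delays at
their configured values, which is what the patch aims for.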

Re: [PATCH] x86/speculation/l1tf: suggest what to do on systems with too much RAM

2018-08-24 Thread Vlastimil Babka

On 08/23/2018 09:27 PM, Michal Hocko wrote:
> On Thu 23-08-18 16:28:12, Vlastimil Babka wrote:
>> Two users have reported [1] that they have an "extremely unlikely" system
>> with more than MAX_PA/2 memory and L1TF mitigation is not effective. Let's
>> make the warning more helpful by suggesting the proper mem=X kernel boot
>> param, a rough calculation of how much RAM can be lost (not precise if
>> there are holes between MAX_PA/2 and max_pfn in the e820 map) and a link
>> to the L1TF document to help decide if the mitigation is worth the
>> unusable RAM.
>>
>> [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536
>>
>> Suggested-by: Michal Hocko 
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Vlastimil Babka 
> 
> I wouldn't bother with max_pfn-half_pa part but other than that this is
> much more useful than the original message.

Right, and it causes build failures on some configs.

> Acked-by: Michal Hocko 

Thanks! Here's a v2:

8<
From 977c5db27fe35a84807850b947bc5678c4d467b3 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka 
Date: Thu, 23 Aug 2018 16:21:29 +0200
Subject: [PATCH] x86/speculation/l1tf: suggest what to do on systems with too
 much RAM

Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective. Let's
make the warning more helpful by suggesting the proper mem=X kernel boot param
to make it effective and a link to the L1TF document to help decide if the
mitigation is worth the unusable RAM.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

Suggested-by: Michal Hocko 
Acked-by: Michal Hocko 
Signed-off-by: Vlastimil Babka 
---
 arch/x86/kernel/cpu/bugs.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index cb4a16292aa7..5c32b5006738 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -702,6 +702,10 @@ static void __init l1tf_select_mitigation(void)
 	half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT;
 	if (e820__mapped_any(half_pa, ULLONG_MAX - half_pa, E820_TYPE_RAM)) {
 		pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n");
+		pr_info("You may make it effective by booting the kernel with mem=%llu parameter.\n",
+				half_pa);
+		pr_info("However, doing so will make a part of your RAM unusable.\n");
+		pr_info("Reading Documentation/admin-guide/l1tf.rst might help you decide.\n");
 		return;
 	}
 
-- 
2.18.0

[PATCH v2 2/2] PCI: meson: add the Amlogic Meson PCIe controller driver

2018-08-24 Thread Hanjie Lin

From: Yue Wang 

The Amlogic Meson PCIe host controller is based on the Synopsys DesignWare
PCI core. This patch adds the driver support for Meson PCIe controller.

Signed-off-by: Yue Wang 
Signed-off-by: Hanjie Lin 
---
 drivers/pci/controller/dwc/Kconfig |  12 +
 drivers/pci/controller/dwc/Makefile|   1 +
 drivers/pci/controller/dwc/pci-meson.c | 613 +
 3 files changed, 626 insertions(+)
 create mode 100644 drivers/pci/controller/dwc/pci-meson.c

diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig
index 91b0194..6cb36f6 100644
--- a/drivers/pci/controller/dwc/Kconfig
+++ b/drivers/pci/controller/dwc/Kconfig
@@ -193,4 +193,16 @@ config PCIE_HISI_STB
help
   Say Y here if you want PCIe controller support on HiSilicon STB SoCs
 
+config PCI_MESON
+   bool "MESON PCIe controller"
+   depends on PCI
+   depends on PCI_MSI_IRQ_DOMAIN
+   select PCIEPORTBUS
+   select PCIE_DW_HOST
+   help
+ Say Y here if you want to enable PCI controller support on Amlogic
+ SoCs. The PCI controller on Amlogic is based on DesignWare hardware
+ and therefore the driver re-uses the DesignWare core functions to
+ implement the driver.
+
 endmenu
diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
index 5d2ce72..cf676bd 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
 obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
 obj-$(CONFIG_PCIE_KIRIN) += pcie-kirin.o
 obj-$(CONFIG_PCIE_HISI_STB) += pcie-histb.o
+obj-$(CONFIG_PCI_MESON) += pci-meson.o
 
 # The following drivers are for devices that use the generic ACPI
 # pci_root.c driver but don't support standard ECAM config access.
diff --git a/drivers/pci/controller/dwc/pci-meson.c b/drivers/pci/controller/dwc/pci-meson.c
new file mode 100644
index 000..a9edf20
--- /dev/null
+++ b/drivers/pci/controller/dwc/pci-meson.c
@@ -0,0 +1,613 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCIe host controller driver for Amlogic MESON SoCs
+ *
+ * Copyright (c) 2018 Amlogic, inc.
+ * Author: Yue Wang 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pcie-designware.h"
+
+#define to_meson_pcie(x) dev_get_drvdata((x)->dev)
+
+/* External local bus interface registers */
+#define PLR_OFFSET                     0x700
+#define PCIE_PORT_LINK_CTRL_OFF        (PLR_OFFSET + 0x10)
+#define FAST_LINK_MODE                 BIT(7)
+#define LINK_CAPABLE_MASK              GENMASK(21, 16)
+#define LINK_CAPABLE_X1                BIT(16)
+
+#define PCIE_GEN2_CTRL_OFF             (PLR_OFFSET + 0x10c)
+#define NUM_OF_LANES_MASK              GENMASK(12, 8)
+#define NUM_OF_LANES_X1                BIT(8)
+#define DIRECT_SPEED_CHANGE            BIT(17)
+
+#define TYPE1_HDR_OFFSET               0x0
+#define PCIE_STATUS_COMMAND            (TYPE1_HDR_OFFSET + 0x04)
+#define PCI_IO_EN                      BIT(0)
+#define PCI_MEM_SPACE_EN               BIT(1)
+#define PCI_BUS_MASTER_EN              BIT(2)
+
+#define PCIE_BASE_ADDR0                (TYPE1_HDR_OFFSET + 0x10)
+#define PCIE_BASE_ADDR1                (TYPE1_HDR_OFFSET + 0x14)
+
+#define PCIE_CAP_OFFSET                0x70
+#define PCIE_DEV_CTRL_DEV_STUS         (PCIE_CAP_OFFSET + 0x08)
+#define PCIE_CAP_MAX_PAYLOAD_MASK      GENMASK(7, 5)
+#define PCIE_CAP_MAX_PAYLOAD_SIZE(x)   ((x) << 5)
+#define PCIE_CAP_MAX_READ_REQ_MASK     GENMASK(14, 12)
+#define PCIE_CAP_MAX_READ_REQ_SIZE(x)  ((x) << 12)
+
+#define PCI_CLASS_REVISION_MASK        GENMASK(7, 0)
+
+/* PCIe specific config registers */
+#define PCIE_CFG0                      0x0
+#define APP_LTSSM_ENABLE               BIT(7)
+
+#define PCIE_CFG_STATUS12              0x30
+#define IS_SMLH_LINK_UP(x)             ((x) & (1 << 6))
+#define IS_RDLH_LINK_UP(x)             ((x) & (1 << 16))
+#define IS_LTSSM_UP(x)                 ((((x) >> 10) & 0x1f) == 0x11)
+
+#define PCIE_CFG_STATUS17              0x44
+#define PM_CURRENT_STATE(x)            (((x) >> 7) & 0x1)
+
+#define WAIT_LINKUP_TIMEOUT            2000
+#define PORT_CLK_RATE                  1UL
+#define MAX_PAYLOAD_SIZE               256
+#define MAX_READ_REQ_SIZE              256
+
+enum pcie_data_rate {
+   PCIE_GEN1,
+   PCIE_GEN2,
+   PCIE_GEN3,
+   PCIE_GEN4
+};
+
+struct meson_pcie_mem_res {
+   void __iomem *elbi_base; /* DT 0th resource */
+   void __iomem *cfg_base; /* DT 2nd resource */
+};
+
+struct meson_pcie_clk_res {
+   struct clk *clk;
+   struct clk *mipi_gate;
+   struct clk *port_clk;
+   struct clk *general_clk;
+};
+
+struct meson_pcie_rc_reset {
+   struct reset_control *port;
+   struct reset_control *apb;
+};
+
+struct meson_pcie {
+   struct dw_pcie pci;

[PATCH v2 0/2] add the Amlogic Meson PCIe controller driver.

2018-08-24 Thread Hanjie Lin

The Amlogic Meson PCIe host controller is based on the Synopsys DesignWare
PCI core. This patchset adds the driver and dt-bindings for the controller.

Changes since v1: [0]
 - use gpiolib instead of open-coded GPIO handling
 - move the 'apb' and 'port' resets out of the phy driver
 - formatting fixes

[0] : https://lkml.org/lkml/2018/8/14/70

Yue Wang (2):
  dt-bindings: PCI: meson: add DT bindings for Amlogic Meson PCIe
controller
  PCI: meson: add the Amlogic Meson PCIe controller driver

 .../devicetree/bindings/pci/amlogic,meson-pcie.txt |  63 +++
 drivers/pci/controller/dwc/Kconfig |  12 +
 drivers/pci/controller/dwc/Makefile|   1 +
 drivers/pci/controller/dwc/pci-meson.c | 613 +
 4 files changed, 689 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
 create mode 100644 drivers/pci/controller/dwc/pci-meson.c

-- 
2.7.4

[PATCH v2 1/2] dt-bindings: PCI: meson: add DT bindings for Amlogic Meson PCIe controller

2018-08-24 Thread Hanjie Lin

From: Yue Wang 

The Amlogic Meson PCIe host controller is based on the Synopsys DesignWare
PCI core. This patch adds documentation for the DT bindings in Meson PCIe
controller.

Signed-off-by: Yue Wang 
Signed-off-by: Hanjie Lin 
---
 .../devicetree/bindings/pci/amlogic,meson-pcie.txt | 63 ++
 1 file changed, 63 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt

diff --git a/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt 
b/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
new file mode 100644
index 000..8a831d1
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
@@ -0,0 +1,63 @@
+Amlogic Meson AXG DWC PCIE SoC controller
+
+Amlogic Meson PCIe host controller is based on the Synopsys DesignWare PCI core.
+It shares common functions with the PCIe DesignWare core driver and
+inherits common properties defined in
+Documentation/devicetree/bindings/pci/designware-pci.txt.
+
+Additional properties are described here:
+
+Required properties:
+- compatible:
+   should contain "amlogic,axg-pcie" to identify the core.
+- reg:
+   Should contain the configuration address space.
+- reg-names: Must be
+   - "elbi"    External local bus interface registers
+   - "cfg"     Meson specific registers
+   - "config"  PCIe configuration space
+- reset-gpios: The GPIO to generate PCIe PERST# assert and deassert signal.
+- clocks: Must contain an entry for each entry in clock-names.
+- clock-names: Must include the following entries:
+   - "pclk"     PCIe GEN 100M PLL clock
+   - "port"     PCIe_x(A or B) RC clock gate
+   - "general"  PCIe Phy clock
+   - "mipi"     PCIe_x(A or B) 100M ref clock gate
+- resets: phandle to the reset lines.
+- reset-names: must contain "port" and "apb"
+   - "port" Port A or B reset
+   - "apb" APB reset
+
+Example configuration:
+
+   pcie: pcie@f980 {
+   compatible = "amlogic,axg-pcie", "snps,dw-pcie";
+   reg = <0x0 0xf980 0x0 0x40
+   0x0 0xff646000 0x0 0x2000
+   0x0 0xf9f0 0x0 0x10>;
+   reg-names = "elbi", "cfg", "config";
+   reset-gpios = <&gpio GPIOX_19 GPIO_ACTIVE_HIGH>;
+   interrupts = ;
+   #interrupt-cells = <1>;
+   interrupt-map-mask = <0 0 0 0>;
+   interrupt-map = <0 0 0 0 &gic GIC_SPI 179 IRQ_TYPE_EDGE_RISING>;
+   bus-range = <0x0 0xff>;
+   #address-cells = <3>;
+   #size-cells = <2>;
+   device_type = "pci";
+   phys = <&pcie_phy>;
+   ranges = <0x8200 0 0 0x0 0xf9c0 0 0x0030>;
+
+   clocks = <&clkc CLKID_USB
+   &clkc CLKID_MIPI_ENABLE
+   &clkc CLKID_PCIE_A
+   &clkc CLKID_PCIE_CML_EN0>;
+   clock-names = "general",
+   "mipi",
+   "pclk",
+   "port";
+   resets = <&reset RESET_PCIE_A>,
+   <&reset RESET_PCIE_APB>;
+   reset-names = "port",
+   "apb";
+   };
-- 
2.7.4

Re: [PATCH v6 3/5] clk: imx: add SCCG PLL type

2018-08-24 Thread Sascha Hauer

+Cc Andrey Smirnov who made me aware of this issue.

On Wed, Aug 22, 2018 at 04:48:21PM +0300, Abel Vesa wrote:
> From: Lucas Stach 
> 
> The SCCG is a new PLL type introduced on i.MX8. Add support for this.
> The driver currently misses the PLL lock check, as the preliminary
> documentation mentions lock configurations, but is quiet about where
> to find the actual lock status signal.
> 
> Signed-off-by: Lucas Stach 
> Signed-off-by: Abel Vesa 
> ---
> +static int clk_pll1_set_rate(struct clk_hw *hw, unsigned long rate,
> + unsigned long parent_rate)
> +{
> + struct clk_sccg_pll *pll = to_clk_sccg_pll(hw);
> + u32 val;
> + u32 divf;
> +
> + divf = rate / (parent_rate * 2);
> +
> + val = readl_relaxed(pll->base + PLL_CFG2);
> + val &= ~(PLL_DIVF_MASK << PLL_DIVF1_SHIFT);
> + val |= (divf - 1) << PLL_DIVF1_SHIFT;
> + writel_relaxed(val, pll->base + PLL_CFG2);
> +
> + /* FIXME: PLL lock check */

Shouldn't be too hard to add, no?

> +
> + return 0;
> +}
> +
> +static int clk_pll1_prepare(struct clk_hw *hw)
> +{
> + struct clk_sccg_pll *pll = to_clk_sccg_pll(hw);
> + u32 val;
> +
> + val = readl_relaxed(pll->base);
> + val &= ~(1 << PLL_PD);
> + writel_relaxed(val, pll->base);

pll->base + PLL_CFG0 please.

> +static const struct clk_ops clk_sccg_pll1_ops = {
> +	.is_prepared	= clk_pll1_is_prepared,
> +	.recalc_rate	= clk_pll1_recalc_rate,
> +	.round_rate	= clk_pll1_round_rate,
> +	.set_rate	= clk_pll1_set_rate,
> +};
> +
> +static const struct clk_ops clk_sccg_pll2_ops = {
> +	.prepare	= clk_pll1_prepare,
> +	.unprepare	= clk_pll1_unprepare,
> +	.recalc_rate	= clk_pll2_recalc_rate,
> +	.round_rate	= clk_pll2_round_rate,
> +	.set_rate	= clk_pll2_set_rate,
> +};

So these are two PLLs that share the same enable register. Doing the
prepare/unprepare for only one PLL can lead to all kinds of trouble.
Finding a good abstraction that properly handles this case with the
clock framework is probably also not easy.

I could imagine we'll need to track the enable state on both PLLs and
only if both are disabled we disable it in hardware.

With the current code we disable the PLLs when all consumers are
reparented to pll1, which probably has bad effects.
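
Something along these lines is what I have in mind for the tracking. This
is a rough user-space sketch and not a proposal for the final code: all
model_* names are made up, the position of the power-down bit is a
placeholder for PLL_PD, and the real driver would need a lock around the
shared state.

```c
#include <stdint.h>

/* Placeholder bit position; the real value is PLL_PD in the driver. */
#define MODEL_PLL_PD_BIT	7

/* One bit per PLL sharing the enable/power-down register. */
enum {
	MODEL_PLL1 = 1u << 0,
	MODEL_PLL2 = 1u << 1,
};

struct model_sccg {
	uint32_t cfg0;		/* stands in for the PLL_CFG0 register */
	unsigned int prepared;	/* mask of PLLs currently prepared */
};

/* Power up the hardware when the first of the two PLLs is prepared. */
static inline void model_prepare(struct model_sccg *s, unsigned int which)
{
	if (!s->prepared)
		s->cfg0 &= ~(1u << MODEL_PLL_PD_BIT);
	s->prepared |= which;
}

/* Power down only once neither PLL is prepared anymore. */
static inline void model_unprepare(struct model_sccg *s, unsigned int which)
{
	s->prepared &= ~which;
	if (!s->prepared)
		s->cfg0 |= 1u << MODEL_PLL_PD_BIT;
}

/* Non-zero while the power-down bit is clear. */
static inline int model_powered(const struct model_sccg *s)
{
	return !(s->cfg0 & (1u << MODEL_PLL_PD_BIT));
}
```

With this, unpreparing pll2 while pll1 still has users leaves the
hardware powered.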

Sascha

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

[GIT PULL] s390 patches for the 4.19 merge window #2

2018-08-24 Thread Martin Schwidefsky

Hi Linus,

please pull from the 'for-linus' branch of

git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git for-linus

to receive the following updates:

- A couple of patches for the zcrypt driver
  + Add two masks to determine which AP cards and queues are host devices,
this will be useful for KVM AP device passthrough
  + Add-on patch to improve the parsing of the new apmask and aqmask
  + Some code beautification

- Second try to reenable the GCC plugins, the first patch set had a
  patch to do this but the merge somehow missed this

- Remove the s390 specific GCC version check and use the generic one

- Three patches for kdump, two bug fixes and one cleanup

- Three patches for the PCI layer, one bug fix and two cleanups

Harald Freudenberger (5):
  s390/zcrypt: fix ap_instructions_available() returncodes
  s390/zcrypt: switch return type to bool for ap_instructions_available()
  s390/zcrypt: code beautify
  s390/zcrypt: AP bus support for alternate driver(s)
  s390/zcrypt: hex string mask improvements for apmask and aqmask.

Heiko Carstens (2):
  s390: reenable gcc plugins for real
  s390: remove gcc version check (4.3 or newer)

Philipp Rudo (3):
  s390/kdump: Make elfcorehdr size calculation ABI compliant
  s390/kdump: Fix memleak in nt_vmcoreinfo
  s390/kdump: Remove kzalloc_panic

Sebastian Ott (3):
  s390/pci: fix out of bounds access during irq setup
  s390/pci: remove stale rc
  s390/pci: remove fmb address from debug output

 arch/s390/Kconfig  |   2 +-
 arch/s390/include/asm/ap.h |  14 +-
 arch/s390/include/uapi/asm/zcrypt.h|  72 +++---
 arch/s390/kernel/asm-offsets.c |   8 -
 arch/s390/kernel/crash_dump.c  |  70 +++---
 arch/s390/pci/pci.c|   3 +-
 arch/s390/pci/pci_debug.c  |   1 -
 drivers/s390/crypto/ap_bus.c   | 432 ++---
 drivers/s390/crypto/ap_bus.h   |  36 ++-
 drivers/s390/crypto/ap_card.c  |  50 ++--
 drivers/s390/crypto/ap_queue.c |  38 +--
 drivers/s390/crypto/pkey_api.c |  91 +++
 drivers/s390/crypto/zcrypt_api.c   |  30 ++-
 drivers/s390/crypto/zcrypt_api.h   |   2 +-
 drivers/s390/crypto/zcrypt_card.c  |  29 ++-
 drivers/s390/crypto/zcrypt_cca_key.h   |  12 +-
 drivers/s390/crypto/zcrypt_cex2a.c |   2 +
 drivers/s390/crypto/zcrypt_cex2a.h |  18 +-
 drivers/s390/crypto/zcrypt_cex4.c  |   2 +
 drivers/s390/crypto/zcrypt_error.h |   2 +-
 drivers/s390/crypto/zcrypt_msgtype50.c |  17 +-
 drivers/s390/crypto/zcrypt_msgtype50.h |   8 +-
 drivers/s390/crypto/zcrypt_msgtype6.c  |  34 ++-
 drivers/s390/crypto/zcrypt_msgtype6.h  |   2 +-
 drivers/s390/crypto/zcrypt_pcixcc.c|   6 +-
 drivers/s390/crypto/zcrypt_pcixcc.h|   2 +-
 drivers/s390/crypto/zcrypt_queue.c |  23 +-
 27 files changed, 706 insertions(+), 300 deletions(-)

Re: [PATCH v6 4/5] clk: imx: add imx composite clock

2018-08-24 Thread Sascha Hauer

On Wed, Aug 22, 2018 at 04:48:22PM +0300, Abel Vesa wrote:
> Since a lot of clocks on imx8 are formed by a mux, gate, predivider and
> divider, the idea here is to combine all of those into one composite clock,
> but we need to deal with both predivider and divider at the same time and
> therefore we add the imx_clk_composite_divider_ops and register the composite
> clock with those.
> 
> Signed-off-by: Abel Vesa 
> Suggested-by: Sascha Hauer 
> ---
>  drivers/clk/imx/Makefile|   1 +
>  drivers/clk/imx/clk-composite.c | 157 
> 
>  drivers/clk/imx/clk.h   |   9 +++
>  3 files changed, 167 insertions(+)
>  create mode 100644 drivers/clk/imx/clk-composite.c
> 
> diff --git a/drivers/clk/imx/Makefile b/drivers/clk/imx/Makefile
> index b87513c..4fabb0a 100644
> --- a/drivers/clk/imx/Makefile
> +++ b/drivers/clk/imx/Makefile
> @@ -3,6 +3,7 @@
>  obj-y += \
>   clk.o \
>   clk-busy.o \
> + clk-composite.o \
>   clk-cpu.o \
>   clk-fixup-div.o \
>   clk-fixup-mux.o \
> diff --git a/drivers/clk/imx/clk-composite.c b/drivers/clk/imx/clk-composite.c
> new file mode 100644
> index 000..a5c0080
> --- /dev/null
> +++ b/drivers/clk/imx/clk-composite.c
> @@ -0,0 +1,157 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2018 NXP
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "clk.h"
> +
> +#define PCG_PREDIV_SHIFT	16
> +#define PCG_PREDIV_WIDTH	3
> +
> +#define PCG_DIV_SHIFT		0
> +#define PCG_DIV_WIDTH		6
> +
> +#define PCG_PCS_SHIFT		24
> +#define PCG_PCS_MASK		0x7
> +
> +#define PCG_CGC_SHIFT		28
> +
> +static unsigned long imx_clk_composite_divider_recalc_rate(struct clk_hw *hw,
> + unsigned long parent_rate)
> +{
> + struct clk_divider *divider = to_clk_divider(hw);
> + unsigned long prediv_rate;
> + unsigned int prediv_value;
> + unsigned int div_value;
> +
> + prediv_value = clk_readl(divider->reg) >> divider->shift;
> + prediv_value &= clk_div_mask(divider->width);
> +
> + prediv_rate = divider_recalc_rate(hw, parent_rate, prediv_value,
> + divider->table, divider->flags,
> + divider->width);
> +
> + div_value = clk_readl(divider->reg) >> PCG_DIV_SHIFT;
> + div_value &= clk_div_mask(PCG_DIV_WIDTH);
> +
> + return divider_recalc_rate(hw, prediv_rate, div_value, divider->table,
> +divider->flags, PCG_DIV_WIDTH);

This is not a table-based divider, so divider->table is NULL, right? It's
clearer to just write NULL here instead.
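
For what it's worth, this is how I read the resulting rate calculation,
written as a user-space sketch. It assumes no divider table and no
special divider flags, so that each field encodes "divisor - 1" and the
division rounds up, and it assumes divider->shift and divider->width are
the PCG_PREDIV_* values. The model_* helpers are made up for
illustration.

```c
#include <stdint.h>

#define PCG_PREDIV_SHIFT	16
#define PCG_PREDIV_WIDTH	3
#define PCG_DIV_SHIFT		0
#define PCG_DIV_WIDTH		6

/* Extract a bit field of the given width from the register value. */
static inline uint32_t model_field(uint32_t reg, unsigned int shift,
				   unsigned int width)
{
	return (reg >> shift) & ((1u << width) - 1);
}

/* Round-up division, assumed to match the default divider behaviour. */
static inline uint64_t model_div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Apply the predivider first, then the divider, as in the patch. */
static inline uint64_t model_composite_rate(uint32_t reg,
					    uint64_t parent_rate)
{
	uint64_t prediv = model_field(reg, PCG_PREDIV_SHIFT,
				      PCG_PREDIV_WIDTH) + 1;
	uint64_t div = model_field(reg, PCG_DIV_SHIFT, PCG_DIV_WIDTH) + 1;

	return model_div_round_up(model_div_round_up(parent_rate, prediv),
				  div);
}
```

A register value with predivider field 1 and divider field 3 would then
turn an 800 MHz parent into 100 MHz.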

Otherwise this looks good to me now.

Sascha


-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

Re: [PATCH v2 1/3] mm: rework memcg kernel stack accounting

2018-08-24 Thread Michal Hocko

On Thu 23-08-18 09:23:50, Roman Gushchin wrote:
> On Wed, Aug 22, 2018 at 04:12:13PM +0200, Michal Hocko wrote:
[...]
> > > @@ -248,9 +253,20 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
> > >  static inline void free_thread_stack(struct task_struct *tsk)
> > >  {
> > >  #ifdef CONFIG_VMAP_STACK
> > > - if (task_stack_vm_area(tsk)) {
> > > + struct vm_struct *vm = task_stack_vm_area(tsk);
> > > +
> > > + if (vm) {
> > >   int i;
> > >  
> > > + for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
> > > + mod_memcg_page_state(vm->pages[i],
> > > +  MEMCG_KERNEL_STACK_KB,
> > > +  -(int)(PAGE_SIZE / 1024));
> > > +
> > > + memcg_kmem_uncharge(vm->pages[i],
> > > + compound_order(vm->pages[i]));
> > 
> > when do we have order > 0 here?
> 
> I guess, it's not possible, but hard-coded 1 looked a bit crappy.
> Do you think it's better?

I guess you meant 0 here. Well, I do not mind, I was just wondering
whether I am missing something.
-- 
Michal Hocko
SUSE Labs

[tip:x86/urgent] x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM

2018-08-24 Thread tip-bot for Vlastimil Babka

Commit-ID:  b0a182f875689647b014bc01d36b340217792852
Gitweb: https://git.kernel.org/tip/b0a182f875689647b014bc01d36b340217792852
Author: Vlastimil Babka 
AuthorDate: Thu, 23 Aug 2018 15:44:18 +0200
Committer:  Thomas Gleixner 
CommitDate: Fri, 24 Aug 2018 09:51:14 +0200

x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM

Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective. In
fact it's a CPU with 36bits phys limit (64GB) and 32GB memory, but due to
holes in the e820 map, the main region is almost 500MB over the 32GB limit:

[0.00] BIOS-e820: [mem 0x0001-0x00081eff] usable

Suggestions to use 'mem=32G' to enable the L1TF mitigation while losing the
500MB revealed that there is an off-by-one error in the check in
l1tf_select_mitigation().

l1tf_pfn_limit() returns the last usable pfn (inclusive) and the range
check in the mitigation path does not take this into account.

Instead of amending the range check, make l1tf_pfn_limit() return the first
PFN which is over the limit, which is less error prone. Adjust the other
users accordingly.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev 
Reported-by: Christopher Snowhill 
Signed-off-by: Vlastimil Babka 
Signed-off-by: Thomas Gleixner 
Cc: "H . Peter Anvin" 
Cc: Linus Torvalds 
Cc: Andi Kleen 
Cc: Dave Hansen 
Cc: Michal Hocko 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20180823134418.17008-1-vba...@suse.cz

---
 arch/x86/include/asm/processor.h | 2 +-
 arch/x86/mm/init.c   | 2 +-
 arch/x86/mm/mmap.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a0a52274cb4a..c24297268ebc 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -183,7 +183,7 @@ extern void cpu_detect(struct cpuinfo_x86 *c);
 
 static inline unsigned long long l1tf_pfn_limit(void)
 {
-   return BIT_ULL(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT) - 1;
+   return BIT_ULL(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT);
 }
 
 extern void early_cpu_init(void);
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 02de3d6065c4..63a6f9fcaf20 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -923,7 +923,7 @@ unsigned long max_swapfile_size(void)
 
if (boot_cpu_has_bug(X86_BUG_L1TF)) {
/* Limit the swap file size to MAX_PA/2 for L1TF workaround */
-   unsigned long long l1tf_limit = l1tf_pfn_limit() + 1;
+   unsigned long long l1tf_limit = l1tf_pfn_limit();
/*
 * We encode swap offsets also with 3 bits below those for pfn
 * which makes the usable limit higher.
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index f40ab8185d94..1e95d57760cf 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -257,7 +257,7 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
/* If it's real memory always allow */
if (pfn_valid(pfn))
return true;
-   if (pfn > l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN))
+   if (pfn >= l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN))
return false;
return true;
 }
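
The arithmetic behind the off-by-one can be checked with a small
user-space sketch. It is an illustration only: the model_* functions are
hypothetical, RAM is simplified to a single range [0, ram_end), and
model_ram_at_or_above() merely approximates what e820__mapped_any() is
used for in l1tf_select_mitigation().

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT	12

/* Old behaviour: last usable PFN, inclusive. */
static inline uint64_t model_pfn_limit_old(unsigned int phys_bits)
{
	return (1ULL << (phys_bits - 1 - PAGE_SHIFT)) - 1;
}

/* New behaviour: first PFN over the limit, exclusive. */
static inline uint64_t model_pfn_limit_new(unsigned int phys_bits)
{
	return 1ULL << (phys_bits - 1 - PAGE_SHIFT);
}

/* True if RAM covering [0, ram_end) reaches to or beyond 'start'. */
static inline bool model_ram_at_or_above(uint64_t ram_end, uint64_t start)
{
	return ram_end > start;
}

/* The unprivileged PFN check before and after the change. */
static inline bool model_pfn_blocked_old(uint64_t pfn, unsigned int bits)
{
	return pfn > model_pfn_limit_old(bits);
}

static inline bool model_pfn_blocked_new(uint64_t pfn, unsigned int bits)
{
	return pfn >= model_pfn_limit_new(bits);
}
```

For a CPU with 36 physical address bits and mem=32G, the old limit
yields half_pa = 32GB - 4KB, so the last page below 32GB still triggers
the warning. With the new limit half_pa is exactly 32GB and the warning
goes away, while the PFN check in pfn_modify_allowed() behaves as before.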

[tip:x86/urgent] x86/speculation/l1tf: Suggest what to do on systems with too much RAM

2018-08-24 Thread tip-bot for Vlastimil Babka

Commit-ID:  4b81eae3d37dee69231592182e1e34706f149a6e
Gitweb: https://git.kernel.org/tip/4b81eae3d37dee69231592182e1e34706f149a6e
Author: Vlastimil Babka 
AuthorDate: Thu, 23 Aug 2018 16:21:29 +0200
Committer:  Thomas Gleixner 
CommitDate: Fri, 24 Aug 2018 09:51:14 +0200

x86/speculation/l1tf: Suggest what to do on systems with too much RAM

Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective.

Make the warning more helpful by suggesting the proper mem=X kernel boot
parameter to make it effective and a link to the L1TF document to help
decide if the mitigation is worth the unusable RAM.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

Suggested-by: Michal Hocko 
Signed-off-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Cc: "H . Peter Anvin" 
Cc: Linus Torvalds 
Cc: Andi Kleen 
Cc: Dave Hansen 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1...@suse.cz
---
 arch/x86/kernel/cpu/bugs.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index cb4a16292aa7..5c32b5006738 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -702,6 +702,10 @@ static void __init l1tf_select_mitigation(void)
 	half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT;
 	if (e820__mapped_any(half_pa, ULLONG_MAX - half_pa, E820_TYPE_RAM)) {
 		pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n");
+		pr_info("You may make it effective by booting the kernel with mem=%llu parameter.\n",
+				half_pa);
+		pr_info("However, doing so will make a part of your RAM unusable.\n");
+		pr_info("Reading Documentation/admin-guide/l1tf.rst might help you decide.\n");
 		return;
 	}

Re: [BUGFIX PATCH -tip] kprobes/x86: Fix to copy RIP relative instruction correctly

2018-08-24 Thread Masami Hiramatsu

On Thu, 23 Aug 2018 21:41:09 -0400
Steven Rostedt  wrote:

> On Fri, 24 Aug 2018 02:16:12 +0900
> Masami Hiramatsu  wrote:
> 
> > Dump of assembler code from 0xa000207a to 0xa00020ea:
> > 54  push   %rsp
> > ...
> > 48 83 c4 08 add$0x8,%rsp
> > 9d  popfq
> > 48 89 f0mov%rsi,%rax
> > 8b 35 82 7d db e2   mov-0x1d24827e(%rip),%esi
> > # 0x82db9e67 
> > 
> > > As it shows, the 2nd mov accesses *(nr_cpu_ids+3) instead of
> > > *nr_cpu_ids. This leads to a kernel freeze because cpumask_next()
> > > always returns 0 and for_each_cpu() never ends.
> 
> Ouch! Nice catch.
> 
> > 
> > > Fix this by adding len correctly to the real RIP address while
> > > copying.
> > > 
> > > Fixes: 63fef14fc98a ("kprobes/x86: Make insn buffer always ROX and use text_poke()")
> > Reported-by: Michael Rodin 
> > Signed-off-by: Masami Hiramatsu 
> > Cc: sta...@vger.kernel.org
> > ---
> >  arch/x86/kernel/kprobes/opt.c |3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
> > index eaf02f2e7300..e92672b8b490 100644
> > --- a/arch/x86/kernel/kprobes/opt.c
> > +++ b/arch/x86/kernel/kprobes/opt.c
> > @@ -189,7 +189,8 @@ static int copy_optimized_instructions(u8 *dest, u8 
> > *src, u8 *real)
> > int len = 0, ret;
> >  
> > while (len < RELATIVEJUMP_SIZE) {
> > -   ret = __copy_instruction(dest + len, src + len, real, &insn);
> > +   ret = __copy_instruction(dest + len, src + len, real + len,
> > +   &insn);
> > if (!ret || !can_boost(&insn, src + len))
> > return -EINVAL;
> > len += ret;
> 
> Looking at the change that broke this we have:
> 
> > -static int copy_optimized_instructions(u8 *dest, u8 *src)
> > +static int copy_optimized_instructions(u8 *dest, u8 *src, u8 *real)
> >  {
> > struct insn insn;
> > int len = 0, ret;
> >  
> > while (len < RELATIVEJUMP_SIZE) {
> > -   ret = __copy_instruction(dest + len, src + len, &insn);
> > > -   ret = __copy_instruction(dest + len, src + len, real, &insn);
> 
> > Where "real" was added as a parameter to __copy_instruction. Note that
> > we pass in "dest + len" but not "real + len" as your patch fixes.
> > __copy_instruction was changed by the bad commit with:
> 
> > -int __copy_instruction(u8 *dest, u8 *src, struct insn *insn)
> > +int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn)
> >  {
> > kprobe_opcode_t buf[MAX_INSN_SIZE];
> > unsigned long recovered_insn =
> > @@ -387,11 +388,11 @@ int __copy_instruction(u8 *dest, u8 *src, struct insn 
> > *insn)
> >  * have given.
> >  */
> > newdisp = (u8 *) src + (s64) insn->displacement.value
> > - - (u8 *) dest;
> > + - (u8 *) real;
> 
> "real" replaces "dest", which was the first parameter to __copy_instruction.
> 
> > return 0;
> 
> And:
> 
> >  int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
> >   struct kprobe *__unused)
> >  {
> > -   u8 *buf;
> > -   int ret;
> > +   u8 *buf = NULL, *slot;
> > +   int ret, len;
> > long rel;
> >  
> > if (!can_optimize((unsigned long)op->kp.addr))
> > return -EILSEQ;
> >  
> > -   op->optinsn.insn = get_optinsn_slot();
> > -   if (!op->optinsn.insn)
> > +   buf = kzalloc(MAX_OPTINSN_SIZE, GFP_KERNEL);
> > +   if (!buf)
> > return -ENOMEM;
> >  
> > +   op->optinsn.insn = slot = get_optinsn_slot();
> > +   if (!slot) {
> > +   ret = -ENOMEM;
> > +   goto out;
> > +   }
> > +
> > /*
> >  * Verify if the address gap is in 2GB range, because this uses
> >  * a relative jump.
> >  */
> > -   rel = (long)op->optinsn.insn - (long)op->kp.addr + 
> > RELATIVEJUMP_SIZE;
> > +   rel = (long)slot - (long)op->kp.addr + RELATIVEJUMP_SIZE;
> > if (abs(rel) > 0x7fff) {
> > -   __arch_remove_optimized_kprobe(op, 0);
> > -   return -ERANGE;
> > +   ret = -ERANGE;
> > +   goto err;
> > }
> >  
> > -   buf = (u8 *)op->optinsn.insn;
> 
> "slot" is equivalent to the old "buf".
> 
> > -   set_memory_rw((unsigned long)buf & PAGE_MASK, 1);
> > +   /* Copy arch-dep-instance from template */
> > +   memcpy(buf, &optprobe_template_entry, TMPL_END_IDX);
> >  
> > /* Copy instructions into the out-of-line buffer */
> > -   ret = copy_optimized_instructions(buf + TMPL_END_IDX, op->kp.addr);
> > -   if (ret < 0) {
> > -   __arch_remove_optimized_kprobe(op, 0);
> > -   return ret;
> > -   }
> > +   ret = copy_optimized_instructions(buf + TMPL_END_IDX, op->kp.addr,
> > +
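
As an illustration of the point made in the quoted analysis, here is a minimal user-space sketch of the displacement arithmetic. It assumes addresses can be modelled as plain integers; the helper names toy_fixup_disp() and toy_target() are hypothetical and do not exist in the kernel. It only shows that when the second instruction of a copied sequence is fixed up against "real" instead of "real + len", the result is off by exactly len bytes.

```c
#include <stdint.h>

/*
 * Toy model of the fix-up done in __copy_instruction(): an instruction
 * that originally sits at `src` with RIP-relative displacement `disp`
 * will finally execute at `real`. Mirrors
 *     newdisp = src + disp - real
 * from the quoted diff. Hypothetical helper, user space only.
 */
int64_t toy_fixup_disp(uint64_t src, int64_t disp, uint64_t real)
{
	return (int64_t)src + disp - (int64_t)real;
}

/*
 * Address a RIP-relative instruction refers to when it runs at `at`:
 * the displacement is relative to the end of the instruction.
 */
uint64_t toy_target(uint64_t at, int insn_len, int64_t disp)
{
	return (uint64_t)((int64_t)at + insn_len + disp);
}
```

With src = 0x1000, real = 0x5000 and a first instruction of 4 bytes, fixing up the second instruction against 0x5004 keeps its target unchanged, while fixing it up against 0x5000 moves the target by 4 bytes.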

Re: [PATCH] mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.

2018-08-24 Thread Michal Hocko

On Fri 24-08-18 12:03:14, Aneesh Kumar K.V wrote:
> When scanning for movable pages, filter out hugetlb pages if hugepage
> migration is not supported. Without this we hit an infinite loop in
> __offline_pages where we do
>   pfn = scan_movable_pages(start_pfn, end_pfn);
>   if (pfn) { /* We have movable pages */
>   ret = do_migrate_range(pfn, end_pfn);
>   goto repeat;
>   }
> 
> We support hugetlb migration only if the hugetlb pages are at pmd level.
> Here we just check for the kernel config. The gigantic page size check is
> done in page_huge_active.

Well, this is a bit misleading. I would say that

Fix this by checking hugepage_migration_supported both in has_unmovable_pages
which is the primary backoff mechanism for page offlining and for
consistency reasons also into scan_movable_pages because it doesn't make
any sense to return a pfn to non-migrateable huge page.

> Acked-by: Michal Hocko 
> Reported-by: Haren Myneni 
> CC: Naoya Horiguchi 
> Signed-off-by: Aneesh Kumar K.V 

I would add
Fixes: 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early")

Not because the bug has been introduced by that commit but rather
because the issue would be latent before that commit.

My Acked-by still holds.

> ---
>  mm/memory_hotplug.c | 3 ++-
>  mm/page_alloc.c | 4 
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9eea6e809a4e..38d94b703e9d 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1333,7 +1333,8 @@ static unsigned long scan_movable_pages(unsigned long 
> start, unsigned long end)
>   if (__PageMovable(page))
>   return pfn;
>   if (PageHuge(page)) {
> - if (page_huge_active(page))
> + if (hugepage_migration_supported(page_hstate(page)) &&
> + page_huge_active(page))
>   return pfn;
>   else
>   pfn = round_up(pfn + 1,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c677c1506d73..b8d91f59b836 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7709,6 +7709,10 @@ bool has_unmovable_pages(struct zone *zone, struct 
> page *page, int count,
>* handle each tail page individually in migration.
>*/
>   if (PageHuge(page)) {
> +
> + if (!hugepage_migration_supported(page_hstate(page)))
> + goto unmovable;
> +
>   iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
>   continue;
>   }
> -- 
> 2.17.1

-- 
Michal Hocko
SUSE Labs
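
To make the infinite-loop argument in this thread concrete, here is a small user-space sketch. It is a toy model under stated assumptions: struct toy_page and toy_scan_movable() are hypothetical stand-ins, not the kernel's struct page or scan_movable_pages(). It only shows that without the filter the scan keeps reporting a huge page that can never be migrated, which is what would make the "goto repeat" loop spin.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified page descriptor. */
struct toy_page {
	bool movable;             /* stands in for PageLRU()/__PageMovable() */
	bool huge;                /* stands in for PageHuge() */
	bool active;              /* stands in for page_huge_active() */
	bool migration_supported; /* stands in for hugepage_migration_supported() */
};

/*
 * Returns index + 1 of the first page the offlining loop would try to
 * migrate, or 0 if there is none. With `filter` set, huge pages whose
 * migration is not supported are skipped, as the patch does.
 */
size_t toy_scan_movable(const struct toy_page *pages, size_t n, bool filter)
{
	for (size_t i = 0; i < n; i++) {
		if (pages[i].movable)
			return i + 1;
		if (pages[i].huge && pages[i].active) {
			if (filter && !pages[i].migration_supported)
				continue;
			return i + 1;
		}
	}
	return 0;
}
```

For a range holding only an active huge page without migration support, the unfiltered scan returns a non-zero result on every pass, while the filtered scan returns 0 and lets the caller move on.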

Re: [PATCH v6 5/5] clk: imx: add clock driver for i.MX8MQ CCM

2018-08-24 Thread Sascha Hauer

On Wed, Aug 22, 2018 at 04:48:23PM +0300, Abel Vesa wrote:
> From: Lucas Stach 
> 
> Add driver for the Clock Control Module found on i.MX8MQ.
> 
> This is largely based on the downstream driver from Anson Huang and
> Bai Ping at NXP, with only some small adaptions to mainline from me.

It's time to rephrase the commit message. With the new composite clock
the adaptions are no longer that small.

> +
> +static int const clks_init_on[] __initconst = {
> + IMX8MQ_CLK_DRAM_CORE, IMX8MQ_CLK_AHB_CG,
> + IMX8MQ_CLK_NOC, IMX8MQ_CLK_NOC_APB,
> + IMX8MQ_CLK_USB_BUS, IMX8MQ_CLK_NAND_USDHC_BUS,
> + IMX8MQ_CLK_MAIN_AXI, IMX8MQ_CLK_A53_CG,
> + IMX8MQ_CLK_AUDIO_AHB_DIV, IMX8MQ_CLK_TMU_ROOT,
> + IMX8MQ_CLK_DRAM_APB,
> +};

This is unused and you seem to have converted all these clocks to add
the CLK_IS_CRITICAL flag.

Are all these really needed? Some clocks like IMX8MQ_CLK_AUDIO_AHB_DIV,
IMX8MQ_CLK_USB_BUS and IMX8MQ_CLK_NAND_USDHC_BUS look suspicious.

Sascha


-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

Re: [PATCH] perf annotate: fix parsing aarch64 branch instructions after objdump update

2018-08-24 Thread Thomas-Mich Richter

On 08/24/2018 02:10 AM, Kim Phillips wrote:
> Starting with binutils 2.28, aarch64 objdump adds comments to the
> disassembly output to show the alternative names of a condition code [1].
> 
> It is assumed that commas in objdump comments could occur in other arches
> now or in the future, so this fix is arch-independent.
> 
> The fix could have been done with arm64 specific jump__parse and
> jump__scnprintf functions, but the jump__scnprintf instruction would
> have to have its comment character be a literal, since the scnprintf
> functions cannot receive a struct arch easily.
> 
> This inconvenience also applies to the generic jump__scnprintf, which
> is why we add a raw_comment pointer to struct ins_operands, so the
> __parse function assigns it to be re-used by its corresponding __scnprintf
> function.
> 
> Example differences in 'perf annotate --stdio2' output on an
> aarch64 perf.data file:
> 
> BEFORE: → b.cs   28133d1c   // b.hs, d7ecc47b
> AFTER : ↓ b.cs   18c
> 
> BEFORE: → b.cc   28d8d9cc   // b.lo, b.ul, 
> d727295b
> AFTER : ↓ b.cc   31c
> 
> The branch target labels 18c and 31c also now appear in the output:
> 
> BEFORE:addx26, x29, #0x80
> AFTER : 18c:   addx26, x29, #0x80
> 
> BEFORE:addx21, x21, #0x8
> AFTER : 31c:   addx21, x21, #0x8
> 
> The Fixes: tag below is added so stable branches will get the update; it
> doesn't necessarily mean that commit was broken at the time, rather it
> didn't withstand the aarch64 objdump update.
> 
> Tested no difference in output for sample x86_64, power arch perf.data files.

Tested,  no difference in output on s390. Just to let you know.
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 
243294
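
For readers following the parsing problem described in the quoted commit message, here is a minimal sketch of the idea: a comma only counts as an operand separator if it appears before the objdump comment. The function toy_operand_comma_index() is a hypothetical helper, not the code from the patch, and passing '/' as the comment character for aarch64 is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns the index of the first comma in `raw` that belongs to the
 * operands, or -1 if there is none. A comma that only shows up after
 * `comment_char` is part of the disassembler comment and is ignored.
 * Hypothetical helper for illustration only.
 */
int toy_operand_comma_index(const char *raw, char comment_char)
{
	const char *comment = strchr(raw, comment_char);
	const char *comma = strchr(raw, ',');

	if (!comma)
		return -1;
	if (comment && comma > comment)
		return -1;
	return (int)(comma - raw);
}
```

With this check, an operand string such as "18c // b.hs, b.nlast" is treated as having a single operand, while "x0, 18c // b.hs, b.nlast" still splits at the first comma.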

Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem

2018-08-24 Thread Thomas Gleixner

On Fri, 24 Aug 2018, Heiner Kallweit wrote:
> On 24.08.2018 06:12, Frederic Weisbecker wrote:
> > On Thu, Aug 16, 2018 at 08:13:03AM +0200, Heiner Kallweit wrote:
> >> Recently I started to get warning "NOHZ: local_softirq_pending 202" and
> >> I think it's related to mentioned commit (didn't bisect it yet).
> >> See log from suspending.
> >>
> >> I have no reason to think the fix is wrong, it may just have revealed
> >> another issue which existed before and was hidden by the bug.
> >>
> >> Rgds, Heiner
> >>
> >> [   75.073353] random: crng init done
> >> [   75.073402] random: 7 urandom warning(s) missed due to ratelimiting
> >> [   78.619564] PM: suspend entry (deep)
> >> [   78.619675] PM: Syncing filesystems ... done.
> >> [   78.653684] Freezing user space processes ... (elapsed 0.002 seconds) 
> >> done.
> >> [   78.656094] OOM killer disabled.
> >> [   78.656113] Freezing remaining freezable tasks ... (elapsed 0.001 
> >> seconds) done.
> >> [   78.658177] Suspending console(s) (use no_console_suspend to debug)
> >> [   78.663066] nuvoton-cir 00:07: disabled
> >> [   78.671817] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> >> [   78.672210] sd 0:0:0:0: [sda] Stopping disk
> >> [   78.786651] ACPI: Preparing to enter system sleep state S3
> >> [   78.789613] PM: Saving platform NVS memory
> >> [   78.789759] Disabling non-boot CPUs ...
> >> [   78.805154] NOHZ: local_softirq_pending 202
> >> [   78.805182] NOHZ: local_softirq_pending 202
> >> [   78.807102] smpboot: CPU 1 is now offline
> > 
> > I've tried to reproduce with suspend to disk but was unsuccessful.
> > 
> > A small question as I see someone is having a similar issue with a stable
> > release only. On which kernel did you trigger that: upstream or stable?
> > 
> > I'll continue investigating.
> > 
> > Thanks.
> > 
> Affected is recent linux-next, after the commit mentioned in the subject.
> I can work around the warning (not sure whether it's a proper fix),
> see here:
> https://lkml.org/lkml/2018/8/18/272

Can you try the one I posted in this thread:

 
https://lkml.kernel.org/r/alpine.deb.2.21.1808240851420.1...@nanos.tec.linutronix.de

Also below for reference.

Thanks,

tglx

8<
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5b33e2f5c0ed..6aab9d54a331 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -888,7 +888,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched 
*ts)
if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
static int ratelimit;
 
-   if (ratelimit < 10 &&
+   if (ratelimit < 10 && !in_softirq() &&
(local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
pr_warn("NOHZ: local_softirq_pending %02x\n",
(unsigned int) local_softirq_pending());
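
A minimal user-space sketch of the warning condition after this change, for illustration only. It assumes the softirq numbering from include/linux/interrupt.h and that SOFTIRQ_STOP_IDLE_MASK excludes only RCU_SOFTIRQ, which is my understanding of the kernel at the time; toy_should_warn() and TOY_STOP_IDLE_MASK are hypothetical names. Under those assumptions the reported value 0x202 decodes to TIMER_SOFTIRQ plus RCU_SOFTIRQ.

```c
#include <stdbool.h>

/* Softirq numbers, assumed to match include/linux/interrupt.h. */
enum {
	HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ, RCU_SOFTIRQ
};

/* Assumption: every softirq except RCU counts against stopping the tick. */
#define TOY_STOP_IDLE_MASK (~(1u << RCU_SOFTIRQ))

/*
 * Mirrors the patched condition: warn only while under the rate limit,
 * outside of a softirq (disabled) region, and with a pending softirq
 * that matters for idle. Hypothetical helper, not kernel code.
 */
bool toy_should_warn(int ratelimit, unsigned int pending, bool in_softirq)
{
	return ratelimit < 10 && !in_softirq &&
	       (pending & TOY_STOP_IDLE_MASK) != 0;
}
```

So a pending mask of 0x202 still warns from a regular context, but is silenced when the check runs inside a softirq-disabled region, which is the case described in the report.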

Re: [RFC PATCH 0/2] minor mmu_gather patches

2018-08-24 Thread Peter Zijlstra

On Thu, Aug 23, 2018 at 12:15:37PM -0700, Linus Torvalds wrote:
> PeterZ - your "mm/tlb, x86/mm: Support invalidating TLB caches for
> RCU_TABLE_FREE" patch looks exactly the same, but it now no longer has
> the split of tlb_flush_mmu_tlbonly(), since with Nick's patch to move
> the call to tlb_table_flush(tlb) into tlb_flush_mmu_free, there's no
> need for the separate double-underscore version.
> 
> I hope nothing I did screwed things up. It all looks sane to me.
> Famous last words.

Sorry; I got distracted by building a bunk bed for the kids -- and
somehow these things always take way more time than expected.

Anyway, the code looks good to me. Thanks all!

Re: [PATCH v1] KVM: s390: store DXC/VXC in fpc on DATA/Vector-processing exceptions

2018-08-24 Thread David Hildenbrand

On 24.08.2018 08:34, Christian Borntraeger wrote:
> 
> 
> On 08/23/2018 07:44 PM, David Hildenbrand wrote:
>> On 23.08.2018 17:43, Christian Borntraeger wrote:
>>>
>>>
>>> On 08/22/2018 11:53 AM, David Hildenbrand wrote:
>>>> When DATA exceptions and vector-processing exceptions (program interrupts)
>>>> are injected, the DXC/VXC is also to be stored in the fpc, if AFP is
>>>> enabled in CR0.
>>>>
>>>> This can happen inside KVM when reinjecting an interrupt during program
>>>> interrupt intercepts. These are triggered for example when debugging the
>>>> guest (concurrent PER events result in an intercept instead of an
>>>> injection of such interrupts).
>>>>
>>>> Signed-off-by: David Hildenbrand 
>>>> ---
>>>>
>>>> Only compile-tested.
>>>
>>> I checked the Linux code (arch/s390/kernel/traps.c) and Linux uses the FPC
>>> (and not the lowcore field) to decide about the signal (SIGFPE) and si_code.
>>> So we want to have the correct DXC/VXC value.
>>>
>>> Now, I wrote a short test program that does
>>> feenableexcept(FE_DIVBYZERO);
>>> and a division by zero,
>>> and attached gdb to that guest together with a breakpoint on the divide
>>> (and the instruction after).
>>> I get the pint exit for the instruction after (as it is suppressing) and at
>>> this point in time the guest fpc already contains the correct DXC value. So
>>> your patch will certainly not hurt, but it seems not necessary.
>>
>> Thanks for trying. Wonder if that is documented behavior or just works
>> by pure luck.
> 
> 
> My guess is that this works as designed. There is the interruption
> parameter block that is used instead of the guest lowcore for program
> interrupt exits. To me it looks like everything is "prepared" except
> for the psw swap itself and the data in the lowcore. The data is written
> to the interruption parameter block instead, so that the hypervisor then
> just has to move the data and do the psw swap.
> 
>>
>> E.g. it would be interesting to see what other instructions do that
>> usually don't touch the DXC, except when injecting an exception. E.g. CRT.
>>
>> But if you believe this is not needed, we can also drop it. (if ever
>> somebody would want to inject from QEMU, he could also just set the fpc
>> directly)
> 
> The (unlikely to ever happen) inject from QEMU is indeed a thing where this
> patch would simplify things.
> 
> I will talk to some hardware folks to verify my assumption but for the time
> being, let's drop this patch.

Just tested CRTG, and it also seems to work fine when single-stepping
over it, landing in the PGM handler.

-- 

Thanks,

David / dhildenb

Re: linux-next: build warnings from the build of Linus' tree

2018-08-24 Thread Masami Hiramatsu

On Fri, 24 Aug 2018 13:32:06 +1000
Stephen Rothwell  wrote:

> Hi all,
> 
> After merging the origin tree, today's linux-next build (powerpc
> allyesconfig) produced these warnings:
> 
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_selftest_dynamic.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_kprobe_selftest.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_clock.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/ftrace.o' being 
> placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/ring_buffer.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/ring_buffer_benchmark.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace.o' being 
> placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_output.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_seq.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_stat.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_printk.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/tracing_map.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_sched_switch.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_functions.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_preemptirq.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_irqsoff.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_sched_wakeup.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_hwlat.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_nop.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_stack.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_functions_graph.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/blktrace.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_events.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_export.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_syscalls.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_event_perf.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_events_filter.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_events_trigger.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_events_hist.o' being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/bpf_trace.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_kprobe.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/power-traces.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/rpm-traces.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_kdb.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_probe.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from `kernel/trace/trace_uprobe.o' 
> being placed in section `.data..LPBX1'
> ld: warning: orphan section `.data..LPBX1' from 
> `kernel/trace/trace_benchmark.o' being placed in section `.data..LPBX1'
> 
> Maybe introduced by commit
> 
>   6b7dca401cb1 ("tracing: Allow gcov profiling on only ftrace subsystem")
> 
> I am guessing, but that is the only new thing that affects all of
> kernel/trace ...

Yes, I agree. But I just followed Documentation/dev

Re: [PATCH RESEND v1 2/5] drivers: pinctrl: msm: enable PDC interrupt only during suspend

2018-08-24 Thread Stephen Boyd

Quoting Lina Iyer (2018-08-17 12:10:23)
> During suspend the system may power down some of the system rails. As a
> result, the TLMM hw block may not be operational anymore and wakeup
> capable GPIOs will not be detected. The PDC however will be operational
> and the GPIOs that are routed to the PDC as IRQs can wake the system up.
> 
> To avoid being interrupted twice (once for TLMM and once for PDC IRQ) when a
> GPIO trips, use TLMM for active and switch to PDC for suspend. When
> entering suspend, disable the TLMM wakeup interrupt and instead enable
> the PDC IRQ and revert upon resume.

What about idle paths? Don't we want to disable the TLMM interrupt and
enable the PDC interrupt when the whole cluster goes idle so we get
wakeup interrupts? It's really unfortunate that the hardware can't
replay the interrupt from PDC to TLMM when it knows TLMM didn't get the
interrupt (because the whole chip was off) or the GIC didn't get the
summary irq (because the GIC was powered off). A little more hardware
effort would make this completely transparent to software and make TLMM
work across all low power modes.

Because of this complicated dance, it may make sense to always get the
interrupt at the PDC and then replay it into the TLMM chip "manually"
with the irq_set_irqchip_state() APIs. This way the duplicate interrupt
can't happen. The only way for the interrupt handler to run would be by
PDC poking the TLMM hardware to inject the irq into the status register.
I think with the TLMM that's possible if we configure the pin to have
the raw status bit disabled (so that edges on the physical line don't
latch into the GPIO interrupt status register) and the normal status bit
enabled (so that if the status register changes we'll interrupt the
CPU). It needs some testing to make sure that actually works though. If
it does work, then we have a way to inject interrupts on TLMM without
worry that the TLMM hardware will also see the interrupt.

Is there a good way to test an interrupt to see if it's edge or level
type configured? And is it really a problem to make PDC the hierarchical
parent of TLMM here so that PDC can intercept the type and wake state of
the GPIO irq? Plus there's the part where a GIC SPI interrupt runs for
some GPIO irq, and that needs to be decoded to figure out which GPIO it
is for and if it should be replayed or not. Maybe all types of GPIO irqs
can be replayed and if it's a level type interrupt we waste some time
handling the PDC interrupt just to do nothing besides forward what would
presumably already work without PDC intervention.

Re: [PATCH v2 2/2] PCI: meson: add the Amlogic Meson PCIe controller driver

2018-08-24 Thread Jerome Brunet

On Fri, 2018-08-24 at 15:36 +0800, Hanjie Lin wrote:
> From: Yue Wang 
> 
> The Amlogic Meson PCIe host controller is based on the Synopsys DesignWare
> PCI core. This patch adds the driver support for Meson PCIe controller.
> 
> Signed-off-by: Yue Wang 
> Signed-off-by: Hanjie Lin 
> ---
>  drivers/pci/controller/dwc/Kconfig |  12 +
>  drivers/pci/controller/dwc/Makefile|   1 +
>  drivers/pci/controller/dwc/pci-meson.c | 613 
> +
>  3 files changed, 626 insertions(+)
>  create mode 100644 drivers/pci/controller/dwc/pci-meson.c
> 
> diff --git a/drivers/pci/controller/dwc/Kconfig 
> b/drivers/pci/controller/dwc/Kconfig
> index 91b0194..6cb36f6 100644
> --- a/drivers/pci/controller/dwc/Kconfig
> +++ b/drivers/pci/controller/dwc/Kconfig
> @@ -193,4 +193,16 @@ config PCIE_HISI_STB
>   help
>Say Y here if you want PCIe controller support on HiSilicon STB 
> SoCs
>  
> +config PCI_MESON
> + bool "MESON PCIe controller"
> + depends on PCI
> + depends on PCI_MSI_IRQ_DOMAIN
> + select PCIEPORTBUS
> + select PCIE_DW_HOST
> + help
> +   Say Y here if you want to enable PCI controller support on Amlogic
> +   SoCs. The PCI controller on Amlogic is based on DesignWare hardware
> +   and therefore the driver re-uses the DesignWare core functions to
> +   implement the driver.
> +
>  endmenu
> diff --git a/drivers/pci/controller/dwc/Makefile 
> b/drivers/pci/controller/dwc/Makefile
> index 5d2ce72..cf676bd 100644
> --- a/drivers/pci/controller/dwc/Makefile
> +++ b/drivers/pci/controller/dwc/Makefile
> @@ -14,6 +14,7 @@ obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
>  obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
>  obj-$(CONFIG_PCIE_KIRIN) += pcie-kirin.o
>  obj-$(CONFIG_PCIE_HISI_STB) += pcie-histb.o
> +obj-$(CONFIG_PCI_MESON) += pci-meson.o
>  
>  # The following drivers are for devices that use the generic ACPI
>  # pci_root.c driver but don't support standard ECAM config access.
> diff --git a/drivers/pci/controller/dwc/pci-meson.c 
> b/drivers/pci/controller/dwc/pci-meson.c
> new file mode 100644
> index 000..a9edf20
> --- /dev/null
> +++ b/drivers/pci/controller/dwc/pci-meson.c
> @@ -0,0 +1,613 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe host controller driver for Amlogic MESON SoCs
> + *
> + * Copyright (c) 2018 Amlogic, inc.
> + * Author: Yue Wang 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "pcie-designware.h"
> +
> +#define to_meson_pcie(x) dev_get_drvdata((x)->dev)
> +
> +/* External local bus interface registers */
> +#define PLR_OFFSET                     0x700
> +#define PCIE_PORT_LINK_CTRL_OFF        (PLR_OFFSET + 0x10)
> +#define FAST_LINK_MODE                 BIT(7)
> +#define LINK_CAPABLE_MASK              GENMASK(21, 16)
> +#define LINK_CAPABLE_X1                BIT(16)
> +
> +#define PCIE_GEN2_CTRL_OFF             (PLR_OFFSET + 0x10c)
> +#define NUM_OF_LANES_MASK              GENMASK(12, 8)
> +#define NUM_OF_LANES_X1                BIT(8)
> +#define DIRECT_SPEED_CHANGE            BIT(17)
> +
> +#define TYPE1_HDR_OFFSET               0x0
> +#define PCIE_STATUS_COMMAND            (TYPE1_HDR_OFFSET + 0x04)
> +#define PCI_IO_EN                      BIT(0)
> +#define PCI_MEM_SPACE_EN               BIT(1)
> +#define PCI_BUS_MASTER_EN              BIT(2)
> +
> +#define PCIE_BASE_ADDR0                (TYPE1_HDR_OFFSET + 0x10)
> +#define PCIE_BASE_ADDR1                (TYPE1_HDR_OFFSET + 0x14)
> +
> +#define PCIE_CAP_OFFSET                0x70
> +#define PCIE_DEV_CTRL_DEV_STUS         (PCIE_CAP_OFFSET + 0x08)
> +#define PCIE_CAP_MAX_PAYLOAD_MASK      GENMASK(7, 5)
> +#define PCIE_CAP_MAX_PAYLOAD_SIZE(x)   ((x) << 5)
> +#define PCIE_CAP_MAX_READ_REQ_MASK     GENMASK(14, 12)
> +#define PCIE_CAP_MAX_READ_REQ_SIZE(x)  ((x) << 12)
> +
> +#define PCI_CLASS_REVISION_MASK        GENMASK(7, 0)
> +
> +/* PCIe specific config registers */
> +#define PCIE_CFG0                      0x0
> +#define APP_LTSSM_ENABLE               BIT(7)
> +
> +#define PCIE_CFG_STATUS12              0x30
> +#define IS_SMLH_LINK_UP(x)             ((x) & (1 << 6))
> +#define IS_RDLH_LINK_UP(x)             ((x) & (1 << 16))
> +#define IS_LTSSM_UP(x)                 ((((x) >> 10) & 0x1f) == 0x11)
> +
> +#define PCIE_CFG_STATUS17              0x44
> +#define PM_CURRENT_STATE(x)            (((x) >> 7) & 0x1)
> +
> +#define WAIT_LINKUP_TIMEOUT            2000
> +#define PORT_CLK_RATE                  1UL
> +#define MAX_PAYLOAD_SIZE               256
> +#define MAX_READ_REQ_SIZE              256
> +
> +enum pcie_data_rate {
> + PCIE_GEN1,
> + PCIE_GEN2,
> + PCIE_GEN3,
> + PCIE_GEN4
> +};
> +
> +struct meson_pcie_mem_res {
> + void __iomem *elbi_base; /* DT 0th resource */
> + void __iomem *cfg_base; /* DT 2nd resource */
> +};
> +
> +struct meson_pcie_clk_res {
> +

Re: [PATCH v2 1/2] dt-bindings: PCI: meson: add DT bindings for Amlogic Meson PCIe controller

2018-08-24 Thread Jerome Brunet

On Fri, 2018-08-24 at 15:36 +0800, Hanjie Lin wrote:
> From: Yue Wang 
> 
> The Amlogic Meson PCIe host controller is based on the Synopsys DesignWare
> PCI core. This patch adds documentation for the DT bindings in Meson PCIe
> controller.
> 
> Signed-off-by: Yue Wang 
> Signed-off-by: Hanjie Lin 
> ---
>  .../devicetree/bindings/pci/amlogic,meson-pcie.txt | 63 
> ++
>  1 file changed, 63 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
> 
> diff --git a/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt 
> b/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
> new file mode 100644
> index 000..8a831d1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pci/amlogic,meson-pcie.txt
> @@ -0,0 +1,63 @@
> +Amlogic Meson AXG DWC PCIE SoC controller
> +
> +Amlogic Meson PCIe host controller is based on the Synopsys DesignWare PCI 
> core.
> +It shares common functions with the PCIe DesignWare core driver and
> +inherits common properties defined in
> +Documentation/devicetree/bindings/pci/designware-pci.txt.
> +
> +Additional properties are described here:
> +
> +Required properties:
> +- compatible:
> + should contain "amlogic,axg-pcie" to identify the core.
> +- reg:
> + Should contain the configuration address space.
> +- reg-names: Must be
> + - "elbi"    External local bus interface registers
> + - "cfg"     Meson specific registers
> + - "config"  PCIe configuration space
> +- reset-gpios: The GPIO to generate PCIe PERST# assert and deassert signal.
> +- clocks: Must contain an entry for each entry in clock-names.
> +- clock-names: Must include the following entries:
> + - "pclk"   PCIe GEN 100M PLL clock
> + - "port"   PCIe_x(A or B) RC clock gate
> + - "general"PCIe Phy clock
> + - "mipi"   PCIe_x(A or B) 100M ref clock gate
> +- resets: phandle to the reset lines.
> +- reset-names: must contain "phy" and "peripheral"
> +   - "port" Port A or B reset
> +   - "apb" APB reset

The above description is not coherent (phy <=> port)

> +
> +Example configuration:
> +
> + pcie: pcie@f980 {
> + compatible = "amlogic,axg-pcie", "snps,dw-pcie";
> + reg = <0x0 0xf980 0x0 0x40
> + 0x0 0xff646000 0x0 0x2000
> + 0x0 0xf9f0 0x0 0x10>;
> + reg-names = "elbi", "cfg", "config";
> + reset-gpios = <&gpio GPIOX_19 GPIO_ACTIVE_HIGH>;
> + interrupts = ;
> + #interrupt-cells = <1>;
> + interrupt-map-mask = <0 0 0 0>;
> + interrupt-map = <0 0 0 0 &gic GIC_SPI 179 
> IRQ_TYPE_EDGE_RISING>;
> + bus-range = <0x0 0xff>;
> + #address-cells = <3>;
> + #size-cells = <2>;
> + device_type = "pci";

Not described above - is it even used ?

> + phys = <&pcie_phy>;

Not documented and not necessary. Please remove this.

> + ranges = <0x8200 0 0 0x0 0xf9c0 0 0x0030>;
> +
> + clocks = <&clkc CLKID_USB
> + &clkc CLKID_MIPI_ENABLE
> + &clkc CLKID_PCIE_A
> + &clkc CLKID_PCIE_CML_EN0>;
> + clock-names = "general",
> + "mipi",
> + "pclk",
> + "port";
> + resets = <&reset RESET_PCIE_A>,
> + <&reset RESET_PCIE_APB>;
> + reset-names = "port",
> + "apb";
> + };

Re: [PATCH 1/2] Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"

2018-08-24 Thread Michal Hocko

On Fri 24-08-18 00:03:25, Naoya Horiguchi wrote:
> (CCed related people)

Fixup Pavel email.

> 
> Hi Mizuma-san,
> 
> Thank you for the report.
> The mentioned patch was created based on feedback from reviewers/maintainers,
> so I'd like to hear from them about how we should handle the issue.
> 
> And one note is that there is a follow-up patch for "x86/e820: put
> !E820_TYPE_RAM regions into memblock.reserved" which might be affected by
> your changes.
> 
> > commit e181ae0c5db9544de9c53239eb22bc012ce75033
> > Author: Pavel Tatashin 
> > Date:   Sat Jul 14 09:15:07 2018 -0400
> > 
> > mm: zero unavailable pages before memmap init
> 
> Thanks,
> Naoya Horiguchi
> 
> On Thu, Aug 23, 2018 at 02:25:12PM -0400, Masayoshi Mizuma wrote:
> > From: Masayoshi Mizuma 
> > 
> > commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into
> > memblock.reserved") breaks the movable_node kernel option because it
> > changed the memory gap range to reserved memblock. So, the node
> > is marked as Normal zone even if the SRAT has Hot pluggable affinity.
> > 
> > =
> > kernel: BIOS-e820: [mem 0x1800-0x180f] usable
> > kernel: BIOS-e820: [mem 0x1c00-0x1c0f] usable
> > ...
> > kernel: reserved[0x12]#011[0x1810-0x1bff], 
> > 0x03f0 bytes flags: 0x0
> > ...
> > kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x1800-0x1bff] 
> > hotplug
> > kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c00-0x1fff] 
> > hotplug
> > ...
> > kernel: Movable zone start for each node
> > kernel:  Node 3: 0x1c00
> > kernel: Early memory node ranges
> > ...
> > =
> > 
> > Naoya's v1 patch [*] fixes the original issue and this movable_node
> > issue doesn't occur.
> > Let's revert commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM
> > regions into memblock.reserved") and apply the v1 patch.
> > 
> > [*] https://lkml.org/lkml/2018/6/13/27
> > 
> > Signed-off-by: Masayoshi Mizuma 
> > ---
> >  arch/x86/kernel/e820.c | 15 +++
> >  1 file changed, 3 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > index c88c23c658c1..d1f25c831447 100644
> > --- a/arch/x86/kernel/e820.c
> > +++ b/arch/x86/kernel/e820.c
> > @@ -1248,7 +1248,6 @@ void __init e820__memblock_setup(void)
> >  {
> > int i;
> > u64 end;
> > -   u64 addr = 0;
> >  
> > /*
> >  * The bootstrap memblock region count maximum is 128 entries
> > @@ -1265,21 +1264,13 @@ void __init e820__memblock_setup(void)
> > struct e820_entry *entry = &e820_table->entries[i];
> >  
> > end = entry->addr + entry->size;
> > -   if (addr < entry->addr)
> > -   memblock_reserve(addr, entry->addr - addr);
> > -   addr = end;
> > if (end != (resource_size_t)end)
> > continue;
> >  
> > -   /*
> > -* all !E820_TYPE_RAM ranges (including gap ranges) are put
> > -* into memblock.reserved to make sure that struct pages in
> > -* such regions are not left uninitialized after bootup.
> > -*/
> > if (entry->type != E820_TYPE_RAM && entry->type != 
> > E820_TYPE_RESERVED_KERN)
> > -   memblock_reserve(entry->addr, entry->size);
> > -   else
> > -   memblock_add(entry->addr, entry->size);
> > +   continue;
> > +
> > +   memblock_add(entry->addr, entry->size);
> > }
> >  
> > /* Throw away partial pages: */
> > -- 
> > 2.18.0
> > 
> > 

-- 
Michal Hocko
SUSE Labs

Re: [PATCH v8 07/26] PM / Domains: Add genpd governor for CPUs

2018-08-24 Thread Ulf Hansson

On 6 August 2018 at 11:20, Rafael J. Wysocki  wrote:
> On Fri, Aug 3, 2018 at 4:28 PM, Ulf Hansson  wrote:
>> On 26 July 2018 at 11:14, Rafael J. Wysocki  wrote:
>>> On Thursday, July 19, 2018 12:32:52 PM CEST Rafael J. Wysocki wrote:
 On Wednesday, June 20, 2018 7:22:07 PM CEST Ulf Hansson wrote:
 > As it's now perfectly possible that a PM domain managed by genpd contains
 > devices belonging to CPUs, we should start to take into account the
 > residency values for the idle states during the state selection process.
 > The residency value specifies the minimum duration of time, the CPU or a
 > group of CPUs, needs to spend in an idle state to not waste energy 
 > entering
 > it.
 >
 > To deal with this, let's add a new genpd governor, pm_domain_cpu_gov, 
 > that
 > may be used for a PM domain that have CPU devices attached or if the CPUs
 > are attached through subdomains.
 >
 > The new governor computes the minimum expected idle duration time for the
 > online CPUs being attached to the PM domain and its subdomains. Then in 
 > the
 > state selection process, trying the deepest state first, it verifies that
 > the idle duration time satisfies the state's residency value.
 >
 > It should be noted that, when computing the minimum expected idle 
 > duration
 > time, we use the information from tick_nohz_get_next_wakeup(), to find 
 > the
 > next wakeup for the related CPUs. Future wise, this may deserve to be
 > improved, as there are more reasons to why a CPU may be woken up from 
 > idle.
 >
 > Cc: Thomas Gleixner 
 > Cc: Daniel Lezcano 
 > Cc: Lina Iyer 
 > Cc: Frederic Weisbecker 
 > Cc: Ingo Molnar 
 > Co-developed-by: Lina Iyer 
 > Signed-off-by: Ulf Hansson 
 > ---
 >  drivers/base/power/domain_governor.c | 58 
 >  include/linux/pm_domain.h|  2 +
 >  2 files changed, 60 insertions(+)
 >
 > diff --git a/drivers/base/power/domain_governor.c 
 > b/drivers/base/power/domain_governor.c
 > index 99896fbf18e4..1aad55719537 100644
 > --- a/drivers/base/power/domain_governor.c
 > +++ b/drivers/base/power/domain_governor.c
 > @@ -10,6 +10,9 @@
 >  #include 
 >  #include 
 >  #include 
 > +#include 
 > +#include 
 > +#include 
 >
 >  static int dev_update_qos_constraint(struct device *dev, void *data)
 >  {
 > @@ -245,6 +248,56 @@ static bool always_on_power_down_ok(struct 
 > dev_pm_domain *domain)
 > return false;
 >  }
 >
 > +static bool cpu_power_down_ok(struct dev_pm_domain *pd)
 > +{
 > +   struct generic_pm_domain *genpd = pd_to_genpd(pd);
 > +   ktime_t domain_wakeup, cpu_wakeup;
 > +   s64 idle_duration_ns;
 > +   int cpu, i;
 > +
 > +   if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
 > +   return true;
 > +
 > +   /*
 > +* Find the next wakeup for any of the online CPUs within the PM 
 > domain
 > +* and its subdomains. Note, we only need the genpd->cpus, as it 
 > already
 > +* contains a mask of all CPUs from subdomains.
 > +*/
 > +   domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
 > +   for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
 > +   cpu_wakeup = tick_nohz_get_next_wakeup(cpu);
 > +   if (ktime_before(cpu_wakeup, domain_wakeup))
 > +   domain_wakeup = cpu_wakeup;
 > +   }
>>>
>>> Here's a concern I have missed before. :-/
>>>
>>> Say, one of the CPUs you're walking here is woken up in the meantime.
>>
>> Yes, that can happen - when we mispredicted "next wakeup".
>>
>>>
>>> I don't think it is valid to evaluate tick_nohz_get_next_wakeup() for it 
>>> then
>>> to update domain_wakeup.  We really should just avoid the domain power off 
>>> in
>>> that case at all IMO.
>>
>> Correct.
>>
>> However, we also want to avoid lock contention in the idle path,
>> which is what this boils down to.
>
> This already is done under genpd_lock() AFAICS, so I'm not quite sure
> what exactly you mean.
>
> Besides, this is not just about increased latency, which is a concern
> by itself but maybe not so much in all environments, but also about
> possibility of missing a CPU wakeup, which is a major issue.
>
> If one of the CPUs sharing the domain with the current one is woken up
> during cpu_power_down_ok() and the wakeup is an edge-triggered
> interrupt and the domain is turned off regardless, the wakeup may be
> missed entirely if I'm not mistaken.
>
> It looks like there needs to be a way for the hardware to prevent a
> domain poweroff when there's a pending interrupt or I don't quite see
> how this can be handled correctly.

Well, the job of genpd and its new cpu governor is not directly to
power off the PM domain, but rather to try to select/promote an idle
state for it. Alo
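
A rough sketch of the selection described in the patch description: take the earliest next wakeup across the domain's online CPUs, then walk the states from deepest to shallowest and take the first one whose residency fits. The names here (`struct idle_state`, `pick_domain_state()`) are made up for illustration and are not the genpd API. It assumes non-negative nanosecond timestamps and does nothing about the wakeup race discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a domain idle state; not the kernel's struct. */
struct idle_state {
	int64_t residency_ns;	/* minimum time worth spending in the state */
};

/* Earliest next wakeup among the CPUs; INT64_MAX if there are none. */
static int64_t earliest_wakeup_ns(const int64_t *cpu_wakeup_ns, size_t ncpus)
{
	int64_t earliest = INT64_MAX;

	for (size_t i = 0; i < ncpus; i++)
		if (cpu_wakeup_ns[i] < earliest)
			earliest = cpu_wakeup_ns[i];
	return earliest;
}

/*
 * states[] is ordered from shallowest to deepest. Returns the index of the
 * deepest state whose residency fits into the expected idle window, or -1
 * if no state fits.
 */
static int pick_domain_state(const struct idle_state *states, size_t nstates,
			     const int64_t *cpu_wakeup_ns, size_t ncpus,
			     int64_t now_ns)
{
	int64_t idle_ns = earliest_wakeup_ns(cpu_wakeup_ns, ncpus) - now_ns;

	if (idle_ns <= 0)
		return -1;

	/* Try the deepest state first, as the patch description says. */
	for (size_t i = nstates; i-- > 0; )
		if (idle_ns >= states[i].residency_ns)
			return (int)i;
	return -1;
}
```

With residencies of 100, 1000 and 10000 ns and the nearest wakeup 1500 ns away, this picks the middle state.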

Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE

2018-08-24 Thread Peter Zijlstra

On Thu, Aug 23, 2018 at 02:54:20PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2018-08-22 at 20:59 -0700, Linus Torvalds wrote:

> > The problem is that x86 _used_ to do this all correctly long long ago.
> > 
> > And then we switched over to the "generic" table flushing (which
> > harkens back to the powerpc code).
> 
> Yes, we wrote the RCU stuff to solve the races with SW walking,
> which is completely orthogonal with HW walking & TLB content. We didn't
> do the move to generic code though ;-)
> 
> > Which actually turned out to be not generic at all, and did not flush
> > the internal pages like x86 used to (back when x86 just used
> > tlb_remove_page for everything).
> 
> Well, having RCU do the flushing is rather generic, it makes sense
> whenever there's somebody doing a SW walk *and* you don't have IPIs to
> synchronize your flushes (ie, anybody with HW TLB invalidation
> broadcast basically, so ARM and us).

Right, so (many many years ago) I moved it over to generic code because
Sparc-hash wanted fast_gup and I figured having multiple copies of this
stuff wasn't ideal.

Then ARM came along and used it because it does the invalidate
broadcast.

And then when we switched x86 over last year or so; because paravirt; I
had long since forgotten all details and completely overlooked this.

Worse; somewhere along the line we tried to get s390 on this and they
ran into the exact problem being fixed now. That _should_ have been a
big clue, but somehow I never got around to thinking about it properly
and they went back to a private copy of all this.

So double fail on me I suppose :-/

Anyway, it's sorted now; although I'd like to write me a fairly big
comment in asm-generic/tlb.h about things, before I forget again.

[PATCH] riscv: move GCC version check for ARCH_SUPPORTS_INT128 to Kconfig

2018-08-24 Thread Masahiro Yamada

This becomes much neater in Kconfig.

Signed-off-by: Masahiro Yamada 
---

 arch/riscv/Kconfig  | 1 +
 arch/riscv/Makefile | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index a344980..ed81df4 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -112,6 +112,7 @@ config ARCH_RV32I
 config ARCH_RV64I
bool "RV64I"
select 64BIT
+   select ARCH_SUPPORTS_INT128 if GCC_VERSION >= 5
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 61ec424..33700e4 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -25,8 +25,6 @@ ifeq ($(CONFIG_ARCH_RV64I),y)
 
KBUILD_CFLAGS += -mabi=lp64
KBUILD_AFLAGS += -mabi=lp64
-   
-   KBUILD_CFLAGS   += $(call cc-ifversion, -ge, 0500, 
-DCONFIG_ARCH_SUPPORTS_INT128)
 
KBUILD_MARCH = rv64im
KBUILD_LDFLAGS += -melf64lriscv
-- 
2.7.4
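
For context, a userspace sketch of the kind of code this option gates: a 64x32-bit multiply-and-shift that uses `__int128` where the compiler provides it, and a 32-bit split otherwise. The function names are illustrative, and the compiler's `__SIZEOF_INT128__` macro stands in for the Kconfig symbol here.

```c
#include <stdint.h>

/* Portable path: split a into 32-bit halves. Valid for shift <= 32. */
static uint64_t mul_u64_u32_shr_portable(uint64_t a, uint32_t mul,
					 unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret = ((uint64_t)al * mul) >> shift;

	/* The high half contributes (ah * mul) << 32 before the shift. */
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);
	return ret;
}

#ifdef __SIZEOF_INT128__
/* 128-bit path: one widening multiply, then shift and truncate. */
static uint64_t mul_u64_u32_shr_int128(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	return (uint64_t)(((unsigned __int128)a * mul) >> shift);
}
#endif
```

Both paths give the same result modulo 2^64; the 128-bit one is simply shorter, and the compiler can usually turn it into a single widening multiply.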

Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages

2018-08-24 Thread Michal Hocko

On Thu 23-08-18 13:59:16, Mike Kravetz wrote:
> The page migration code employs try_to_unmap() to try and unmap the
> source page.  This is accomplished by using rmap_walk to find all
> vmas where the page is mapped.  This search stops when page mapcount
> is zero.  For shared PMD huge pages, the page map count is always 1
> no matter the number of mappings.  Shared mappings are tracked via
> the reference count of the PMD page.  Therefore, try_to_unmap stops
> prematurely and does not completely unmap all mappings of the source
> page.
> 
> This problem can result in data corruption as writes to the original
> source page can happen after contents of the page are copied to the
> target page.  Hence, data is lost.
> 
> This problem was originally seen as DB corruption of shared global
> areas after a huge page was soft offlined due to ECC memory errors.
> DB developers noticed they could reproduce the issue by (hotplug)
> offlining memory used to back huge pages.  A simple testcase can
> reproduce the problem by creating a shared PMD mapping (note that
> this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on
> x86)), and using migrate_pages() to migrate process pages between
> nodes while continually writing to the huge pages being migrated.
> 
> To fix, have the try_to_unmap_one routine check for huge PMD sharing
> by calling huge_pmd_unshare for hugetlbfs huge pages.  If it is a
> shared mapping it will be 'unshared' which removes the page table
> entry and drops the reference on the PMD page.  After this, flush
> caches and TLB.
> 
> mmu notifiers are called before locking page tables, but we can not
> be sure of PMD sharing until page tables are locked.  Therefore,
> check for the possibility of PMD sharing before locking so that
> notifiers can prepare for the worst possible case.
> 
> Fixes: 39dde65c9940 ("shared page table for hugetlb page")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Mike Kravetz 

Acked-by: Michal Hocko 

One nit below.

[...]
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3103099f64fd..a73c5728e961 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4548,6 +4548,9 @@ static unsigned long page_table_shareable(struct 
> vm_area_struct *svma,
>   return saddr;
>  }
>  
> +#define _range_in_vma(vma, start, end) \
> + ((vma)->vm_start <= (start) && (end) <= (vma)->vm_end)
> +

static inline please. Macros and potential side effects on given
arguments are just not worth the risk. I also think this is something
for more general use. We have that pattern at many places. So I would
stick that to linux/mm.h
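
As a sketch of that suggestion (the struct below is a cut-down stand-in for vm_area_struct, not the kernel definition), a static inline evaluates each argument exactly once and gets type checking, which the macro form cannot promise:

```c
#include <stdbool.h>

/* Stand-in with only the two fields the check needs. */
struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* True if [start, end) lies entirely within the vma. */
static inline bool range_in_vma(const struct vm_area_struct *vma,
				unsigned long start, unsigned long end)
{
	return vma->vm_start <= start && end <= vma->vm_end;
}
```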

Thanks!
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 2/4] mm/tlb: Remove tlb_remove_table() non-concurrent condition

2018-08-24 Thread Peter Zijlstra

On Wed, Aug 22, 2018 at 09:54:48PM -0700, Linus Torvalds wrote:

> It honored it for the *normal* case, which is why it took so long to
> notice that the TLB shootdown had been broken on x86 when it moved to
> the "generic" code. The *normal* case does this all right, and batches
> things up, and then when the batch fills up it does a
> tlb_table_flush() which does the TLB flush and schedules the actual
> freeing.
> 
> But there were two cases that *didn't* do that. The special "I'm the
> only thread" fast case, and the "oops I ran out of memory, so now I'll
> fake it, and just synchronize with page walkers manually, and then do
> that special direct remove without flushing the tlb".

The actual RCU batching case was also busted; there was no guarantee
that by the time we run the RCU callbacks the invalidate would've
happened. Exceedingly unlikely, but no guarantee.

So really, all 3 cases in tlb_remove_table() were busted in this
respect.

Re: [PATCH 3/4] mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE

2018-08-24 Thread Peter Zijlstra

On Thu, Aug 23, 2018 at 02:39:59PM +0100, Will Deacon wrote:
> The only problem with this approach is that we've lost track of the granule
> size by the point we get to the tlb_flush(), so we can't adjust the stride of
> the TLB invalidations for huge mappings, which actually works nicely in the
> synchronous case (e.g. we perform a single invalidation for a 2MB mapping,
> rather than iterating over it at a 4k granule).
> 
> One thing we could do is switch to synchronous mode if we detect a change in
> granule (i.e. treat it like a batch failure).

We could use tlb_start_vma() to track that, I think. Shouldn't be too
hard.

Re: [PATCH] mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.

2018-08-24 Thread Naoya Horiguchi

On Fri, Aug 24, 2018 at 09:58:15AM +0200, Michal Hocko wrote:
> On Fri 24-08-18 12:03:14, Aneesh Kumar K.V wrote:
> > When scanning for movable pages, filter out Hugetlb pages if hugepage
> > migration is not supported. Without this we hit an infinite loop in
> > __offline_pages where we do
> > pfn = scan_movable_pages(start_pfn, end_pfn);
> > if (pfn) { /* We have movable pages */
> > ret = do_migrate_range(pfn, end_pfn);
> > goto repeat;
> > }
> > 
> > We do support hugetlb migration only if the hugetlb pages are at pmd level.
> > Here we just check for Kernel config. The gigantic page size check is done in
> > page_huge_active.
> 
> Well, this is a bit misleading. I would say that
> 
> Fix this by checking hugepage_migration_supported both in has_unmovable_pages
> which is the primary backoff mechanism for page offlining and for
> consistency reasons also into scan_movable_pages because it doesn't make
> any sense to return a pfn to non-migrateable huge page.
> 
> > Acked-by: Michal Hocko 
> > Reported-by: Haren Myneni 
> > CC: Naoya Horiguchi 
> > Signed-off-by: Aneesh Kumar K.V 
> 
> I would add
> Fixes: 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early")
> 
> Not because the bug has been introduced by that commit but rather
> because the issue would be latent before that commit.
> 
> My Acked-by still holds.

Looks good to me (with Michal's update on description).

Reviewed-by: Naoya Horiguchi 

> 
> > ---
> >  mm/memory_hotplug.c | 3 ++-
> >  mm/page_alloc.c | 4 
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 9eea6e809a4e..38d94b703e9d 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1333,7 +1333,8 @@ static unsigned long scan_movable_pages(unsigned long 
> > start, unsigned long end)
> > if (__PageMovable(page))
> > return pfn;
> > if (PageHuge(page)) {
> > -   if (page_huge_active(page))
> > +   if 
> > (hugepage_migration_supported(page_hstate(page)) &&
> > +   page_huge_active(page))
> > return pfn;
> > else
> > pfn = round_up(pfn + 1,
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index c677c1506d73..b8d91f59b836 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -7709,6 +7709,10 @@ bool has_unmovable_pages(struct zone *zone, struct 
> > page *page, int count,
> >  * handle each tail page individually in migration.
> >  */
> > if (PageHuge(page)) {
> > +
> > +   if (!hugepage_migration_supported(page_hstate(page)))
> > +   goto unmovable;
> > +
> > iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
> > continue;
> > }
> > -- 
> > 2.17.1
> 
> -- 
> Michal Hocko
> SUSE Labs
>

Re: [perf] perf_event.h ABI visibility question

2018-08-24 Thread Peter Zijlstra

On Thu, Aug 23, 2018 at 02:25:06PM -0400, Vince Weaver wrote:
> 
> I notice that Linux 4.18 has the following changeset which changes the
> user visible perf_event.h file
> 
>   commit 6cbc304f2f360f25cc8607817239d6f4a2fd3dc5
>   Author: Peter Zijlstra 
>   Date:   Thu May 10 15:48:41 2018 +0200
> 
> perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)
> 
> which contains
> 
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -143,6 +143,8 @@ enum perf_event_sample_format {
> PERF_SAMPLE_PHYS_ADDR   = 1U << 19,
>  
> PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */
> +
> +   __PERF_SAMPLE_CALLCHAIN_EARLY   = 1ULL << 63,
>  };
> 
> 
> Is this supposed to be a user-visible interface?
> 
> I realize that if the user tries to set anything above PERF_SAMPLE_MAX
> it will be caught and flagged as EINVAL.
> 
> However even with the double-underscore hint in 
> __PERF_SAMPLE_CALLCHAIN_EARLY the value is still in the user-visible 
> header so it's now part of the ABI and I guess the manpage has to document it.

Hurphm.. visible yes, but as you say, also quite useless. Does it really
make sense to document that?
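
For reference, a sketch of why the bit is unreachable from userspace: any sample_type bit at or above PERF_SAMPLE_MAX is rejected with EINVAL, as Vince notes. The constants mirror the values in the quoted header under shortened local names, and `check_sample_type()` is a made-up helper, not the kernel function.

```c
#include <errno.h>
#include <stdint.h>

/* Local copies of the values from the quoted header (names shortened). */
#define SAMPLE_PHYS_ADDR	(1ULL << 19)
#define SAMPLE_MAX		(1ULL << 20)	/* non-ABI */
#define SAMPLE_CALLCHAIN_EARLY	(1ULL << 63)	/* kernel-internal */

/* Hypothetical validation: 0 if every requested bit is below SAMPLE_MAX. */
static int check_sample_type(uint64_t sample_type)
{
	if (sample_type & ~(SAMPLE_MAX - 1))
		return -EINVAL;
	return 0;
}
```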

Re: [PATCH] perf: Force USER_DS when recording user stack data.

2018-08-24 Thread Peter Zijlstra

On Thu, Aug 23, 2018 at 03:59:35PM -0700, Yabin Cui wrote:
> Perf can record user stack data in response to a synchronous request, such
> as a tracepoint firing. If this happens under set_fs(KERNEL_DS), then we
> end up reading user stack data using __copy_from_user_inatomic() under
> set_fs(KERNEL_DS). I think this conflicts with the intention of using
> set_fs(KERNEL_DS). And it is explicitly forbidden by hardware on ARM64
> when both CONFIG_ARM64_UAO and CONFIG_ARM64_PAN are used.
> 
> So fix this by forcing USER_DS when recording user stack data.
> 
> Signed-off-by: Yabin Cui 

Ingo, I think this wants a stable tag too; seems to be a corollary of:

  88b0193d9418 ("perf/callchain: Force USER_DS when invoking 
perf_callchain_user()")

Acked-by: Peter Zijlstra (Intel) 

> ---
>  kernel/events/core.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 2a62b96600ad..9bc047421e75 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5948,6 +5948,7 @@ perf_output_sample_ustack(struct perf_output_handle 
> *handle, u64 dump_size,
>   unsigned long sp;
>   unsigned int rem;
>   u64 dyn_size;
> + mm_segment_t fs;
>  
>   /*
>* We dump:
> @@ -5965,7 +5966,10 @@ perf_output_sample_ustack(struct perf_output_handle 
> *handle, u64 dump_size,
>  
>   /* Data. */
>   sp = perf_user_stack_pointer(regs);
> + fs = get_fs();
> + set_fs(USER_DS);
>   rem = __output_copy_user(handle, (void *) sp, dump_size);
> + set_fs(fs);
>   dyn_size = dump_size - rem;
>  
>   perf_output_skip(handle, rem);
> -- 
> 2.19.0.rc0.228.g281dcd1b4d0-goog
>

Re: [PATCH] x86/speculation/l1tf: fix off-by-one error when warning that system has too much RAM

2018-08-24 Thread George Anchev

On Thu, 23 Aug 2018 08:44:37 -0700 Andi Kleen wrote:

> Ah I see it's a client part with very large DIMMs
> and someone being very brave and using that much
> memory without ECC.

It is not about being "brave" but about being
informed. As of 2018 you can probably call "brave"
everyone who uses any modern computer. However this
machine was purchased in 2012. Consider what was known
then and what is known now. The motherboard ASUS
P8Z77-V does not support ECC memory. Still we needed
and paid for 32GB of RAM, not for 31.5.

--
George

Re: [PATCH v2 0/8] Tegra SDHCI support HS400 on Tegra210 and Tegra186

2018-08-24 Thread Thierry Reding

On Fri, Aug 10, 2018 at 09:13:57PM +0300, Aapo Vienamo wrote:
> Hi all,
> This series implements support for HS400 signaling on Tegra210 and
> Tegra186. This includes programming the DQS trimmer values, implementing
> enhanced strobe and HS400 delay line calibration.
> 
> This series depends on the "Tegra SDHCI add support for HS200 and UHS
> signaling" series.
> 
> Changelog:
> v2:
>   - Document in dt-bindings which controllers support HS400
>   - Use val instead of reg in tegra_sdhci_set_dqs_trim()
>   - Change "dt" to "DT" in "mmc: tegra: Parse and program DQS trim
> value" commit message
>   - Add spaces around << in tegra_sdhci_set_dqs_trim()
>   - Make the "mmc: tegra: Implement HS400 enhanced strobe" commit
> message more detailed
>   - Remove a debug print from tegra_sdhci_hs400_enhanced_strobe()
>   - Add blank lines around if-else-block in
> tegra_sdhci_hs400_enhanced_strobe()
>   - Use val instead of reg in tegra_sdhci_hs400_enhanced_strobe()
>   - Make commit message of "mmc: tegra: Implement HS400 delay line
> calibration" more detailed
> 
> Aapo Vienamo (8):
>   dt-bindings: mmc: Add DQS trim value to Tegra SDHCI
>   mmc: tegra: Parse and program DQS trim value
>   mmc: tegra: Implement HS400 enhanced strobe
>   mmc: tegra: Implement HS400 delay line calibration
>   arm64: dts: tegra186: Add SDMMC4 DQS trim value
>   arm64: dts: tegra210: Add SDMMC4 DQS trim value
>   arm64: dts: tegra186: Enable HS400
>   arm64: dts: tegra210: Enable HS400
> 
>  .../bindings/mmc/nvidia,tegra20-sdhci.txt  |  4 ++
>  arch/arm64/boot/dts/nvidia/tegra186.dtsi   |  2 +
>  arch/arm64/boot/dts/nvidia/tegra210.dtsi   |  2 +
>  drivers/mmc/host/sdhci-tegra.c | 84 
> +-
>  4 files changed, 89 insertions(+), 3 deletions(-)

Ulf, Adrian,

Aapo just reminded me of this small series that also has a dependency on
the UHS signalling series posted earlier. I think it's easiest if I just
stash this on top of the existing branch that I have and send this along
with the rest as part of a pull request early after v4.19-rc1.

Thierry



[PATCH] Bluetooth: bt3c_cs: Fix obsolete function

2018-08-24 Thread Ding Xiang

simple_strtol and simple_strtoul are obsolete; use kstrtoul
in both places instead.

Signed-off-by: Ding Xiang 
---
 drivers/bluetooth/bt3c_cs.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/bt3c_cs.c b/drivers/bluetooth/bt3c_cs.c
index 25b0cf9..5e4800d 100644
--- a/drivers/bluetooth/bt3c_cs.c
+++ b/drivers/bluetooth/bt3c_cs.c
@@ -449,7 +449,7 @@ static int bt3c_load_firmware(struct bt3c_info *info,
char *ptr = (char *) firmware;
char b[9];
unsigned int iobase, tmp;
-   unsigned long size, addr, fcs;
+   unsigned long size, addr, fcs, tn;
int i, err = 0;
 
iobase = info->p_dev->resource[0]->start;
@@ -490,7 +490,9 @@ static int bt3c_load_firmware(struct bt3c_info *info,
memset(b, 0, sizeof(b));
for (tmp = 0, i = 0; i < size; i++) {
memcpy(b, ptr + (i * 2) + 2, 2);
-   tmp += simple_strtol(b, NULL, 16);
+   if (kstrtoul(b, 16, &tn))
+   return -EINVAL;
+   tmp += tn;
}
 
if (((tmp + fcs) & 0xff) != 0xff) {
@@ -505,7 +507,9 @@ static int bt3c_load_firmware(struct bt3c_info *info,
memset(b, 0, sizeof(b));
for (i = 0; i < (size - 4) / 2; i++) {
memcpy(b, ptr + (i * 4) + 12, 4);
-   tmp = simple_strtoul(b, NULL, 16);
+   if (kstrtoul(b, 16, &tn))
+   return -EINVAL;
+   tmp = tn;
bt3c_put(iobase, tmp);
}
}
-- 
1.8.3.1
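
The practical difference is error reporting: simple_strtol() stops at the first bad character and returns whatever it parsed so far, while kstrtoul() fails the whole conversion. Below is a userspace sketch of the same idea, using an isxdigit() check plus strtoul() as a stand-in for kstrtoul(). `parse_hex_field()` and `sum_hex_bytes()` are illustrative names, not driver code, and the input is assumed to be a NUL-terminated string.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse exactly ndigits hex characters from s; 0 on success, else -EINVAL. */
static int parse_hex_field(const char *s, size_t ndigits, unsigned long *out)
{
	char buf[9] = { 0 };

	if (ndigits == 0 || ndigits >= sizeof(buf))
		return -EINVAL;

	/* Reject anything that is not a hex digit, including an early NUL. */
	for (size_t i = 0; i < ndigits; i++)
		if (!isxdigit((unsigned char)s[i]))
			return -EINVAL;

	memcpy(buf, s, ndigits);
	*out = strtoul(buf, NULL, 16);
	return 0;
}

/* Sum nbytes two-digit hex bytes starting at s; fails on the first bad byte. */
static int sum_hex_bytes(const char *s, size_t nbytes, unsigned int *sum)
{
	unsigned long val;
	unsigned int acc = 0;

	for (size_t i = 0; i < nbytes; i++) {
		if (parse_hex_field(s + i * 2, 2, &val))
			return -EINVAL;
		acc += (unsigned int)val;
	}
	*sum = acc;
	return 0;
}
```

A record with a corrupted digit is then rejected outright, rather than contributing a partial value to the checksum.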

Re: [PATCH 1/4] mfd: sec-core: Add SPDX license identifiers

2018-08-24 Thread Lee Jones

On Tue, 07 Aug 2018, Krzysztof Kozlowski wrote:

> Replace GPL v2.0+ license statements with SPDX license identifiers.
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/mfd/sec-core.c  | 16 
>  drivers/mfd/sec-irq.c   | 16 
>  include/linux/mfd/samsung/core.h| 11 ++-
>  include/linux/mfd/samsung/irq.h | 10 ++
>  include/linux/mfd/samsung/rtc.h | 15 ++-
>  include/linux/mfd/samsung/s2mpa01.h |  7 +--
>  include/linux/mfd/samsung/s2mps11.h |  9 +
>  include/linux/mfd/samsung/s2mps13.h | 14 +-
>  include/linux/mfd/samsung/s2mps14.h | 14 +-
>  include/linux/mfd/samsung/s2mps15.h | 11 +--
>  include/linux/mfd/samsung/s2mpu02.h | 14 +-
>  include/linux/mfd/samsung/s5m8763.h | 10 ++
>  include/linux/mfd/samsung/s5m8767.h | 10 ++
>  13 files changed, 24 insertions(+), 133 deletions(-)

Applied, thanks.

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 2/4] mfd: maxim: Add SPDX license identifiers

2018-08-24 Thread Lee Jones

On Tue, 07 Aug 2018, Krzysztof Kozlowski wrote:

> Replace GPL v2.0+ license statements with SPDX license identifiers.
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/mfd/max14577.c   | 28 +---
>  drivers/mfd/max77686.c   | 32 +---
>  drivers/mfd/max77693.c   | 34 ++
>  drivers/mfd/max77843.c   | 19 +++
>  drivers/mfd/max8997-irq.c| 30 --
>  drivers/mfd/max8997.c| 30 --
>  drivers/mfd/max8998-irq.c| 18 ++
>  drivers/mfd/max8998.c| 28 +++-
>  include/linux/mfd/max14577-private.h | 11 +--
>  include/linux/mfd/max14577.h | 11 +--
>  include/linux/mfd/max77686-private.h | 15 +--
>  include/linux/mfd/max77686.h | 15 +--
>  include/linux/mfd/max77693-common.h  |  6 +-
>  include/linux/mfd/max77693-private.h | 15 +--
>  include/linux/mfd/max77693.h | 15 +--
>  include/linux/mfd/max77843-private.h |  6 +-
>  include/linux/mfd/max8997-private.h  | 15 +--
>  include/linux/mfd/max8997.h  | 15 +--
>  include/linux/mfd/max8998-private.h  | 15 +--
>  include/linux/mfd/max8998.h  | 15 +--
>  20 files changed, 76 insertions(+), 297 deletions(-)

Applied, thanks.

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

[PATCH V6 0/9] mmc: add support for sdhci 4.0

2018-08-24 Thread Chunyan Zhang

From SD host controller version 4.0 on, an SDHCI implementation runs
in either version 3 compatible mode or version 4 mode. This patch-set
covers the changes that are common to SDHCI 4.0, regardless of whether
they are used with SD or eMMC storage devices.

This patchset also added a new sdhci driver for Spreadtrum's controller
which supports v4.0 mode.

This patchset has been tested on Spreadtrum's mobile phone, emmc can be
initialized, mounted, read and written, with these changes for common
sdhci framework and sdhci-sprd driver.

This patchset is based on the unmerged patch: 
https://lkml.org/lkml/2018/8/20/140

Changes from V5:
- Added SDHCI_QUIRK2_BROKEN_32BIT_BLK_CNT;
- Removed sdhci_enable_cmd23();
- Added setting SDHCI_ARGUMENT2 for CMD23 in v4_mode;
- Clear SDHCI_CMD23_ENABLE when not using CMD23;
- Moved disabling auto-CMD23 to sprd sdhci driver;
- Added sdhci_sprd_request() function to hook to mmc_host_ops.request.


Previous patch series:
v5: https://lkml.org/lkml/2018/8/16/122
v4: https://lkml.org/lkml/2018/7/23/269
v3: https://lkml.org/lkml/2018/7/8/239
v2: https://lkml.org/lkml/2018/6/14/936
v1: https://lkml.org/lkml/2018/6/8/108

Chunyan Zhang (9):
  mmc: sdhci: Add version V4 definition
  mmc: sdhci: Add sd host v4 mode
  mmc: sdhci: Change SDMA address register for v4 mode
  mmc: sdhci: Add ADMA2 64-bit addressing support for V4 mode
  mmc: sdhci: Add 32-bit block count support for v4 mode
  mmc: sdhci: Add Auto CMD Auto Select support
  mmc: sdhci: SDMA may use Auto-CMD23 in v4 mode
  mmc: sdhci-sprd: Add Spreadtrum's initial host controller
  dt-bindings: sdhci-sprd: Add bindings for the sdhci-sprd controller

 .../devicetree/bindings/mmc/sdhci-sprd.txt |  41 ++
 drivers/mmc/host/Kconfig   |  13 +
 drivers/mmc/host/Makefile  |   1 +
 drivers/mmc/host/sdhci-sprd.c  | 485 +
 drivers/mmc/host/sdhci.c   | 225 --
 drivers/mmc/host/sdhci.h   |  23 +-
 6 files changed, 738 insertions(+), 50 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/mmc/sdhci-sprd.txt
 create mode 100644 drivers/mmc/host/sdhci-sprd.c

-- 
2.7.4

Re: [PATCH 3/4] mfd: sec-core: Fix indentation of Kconfig description

2018-08-24 Thread Lee Jones

On Tue, 07 Aug 2018, Krzysztof Kozlowski wrote:

> The indentation should be a tab followed by two spaces.
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/mfd/Kconfig | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Applied, thanks.

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

[PATCH V6 1/9] mmc: sdhci: Add version V4 definition

2018-08-24 Thread Chunyan Zhang

Added definitions for v400, v410, v420.

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/sdhci.c | 2 +-
 drivers/mmc/host/sdhci.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 8793340..f70135c 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3501,7 +3501,7 @@ int sdhci_setup_host(struct sdhci_host *host)
 
override_timeout_clk = host->timeout_clk;
 
-   if (host->version > SDHCI_SPEC_300) {
+   if (host->version > SDHCI_SPEC_420) {
pr_err("%s: Unknown controller version (%d). You may experience 
problems.\n",
   mmc_hostname(mmc), host->version);
}
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 5db81de..7ae95f8 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -270,6 +270,9 @@
 #define   SDHCI_SPEC_100   0
 #define   SDHCI_SPEC_200   1
 #define   SDHCI_SPEC_300   2
+#define   SDHCI_SPEC_400   3
+#define   SDHCI_SPEC_410   4
+#define   SDHCI_SPEC_420   5
 
 /*
  * End of controller registers.
-- 
2.7.4

Re: [PATCH 4/4] mfd: sec-core: Allow building as module

2018-08-24 Thread Lee Jones

On Tue, 07 Aug 2018, Krzysztof Kozlowski wrote:

> The main MFD driver for Samsung PMICs (S2MPSXX, S5M876X) used with
> Exynos SoCs can be compiled and used as a module.  The dependent clock,
> regulator and RTC drivers already can be built as a module.
> 
> Building entire set of drivers as modules might require using initial
> ramdisk and can make booting process longer (due to probe deferrals).
> However adding such option is useful for testing and for multi-platform
> configurations.
> 
> This also adds required module authors to sec-irq.c file based on recent
> main contributors.
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/mfd/Kconfig   | 11 +--
>  drivers/mfd/sec-irq.c |  8 
>  2 files changed, 17 insertions(+), 2 deletions(-)

Applied, thanks.

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

[PATCH V6 2/9] mmc: sdhci: Add sd host v4 mode

2018-08-24 Thread Chunyan Zhang

For SD host controller version 4.00 or later, there are two
modes of implementation: Version 3.00 compatible mode or
Version 4 mode.  This patch introduces an interface to enable
v4 mode.

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/sdhci.c | 29 +
 drivers/mmc/host/sdhci.h |  3 +++
 2 files changed, 32 insertions(+)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index f70135c..a50842c 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -123,6 +123,29 @@ EXPORT_SYMBOL_GPL(sdhci_dumpregs);
  *   *
 \*/
 
+static void sdhci_do_enable_v4_mode(struct sdhci_host *host)
+{
+   u16 ctrl2;
+
+   ctrl2 = sdhci_readw(host, SDHCI_HOST_CONTROL2);
+   if (ctrl2 & SDHCI_CTRL_V4_MODE)
+   return;
+
+   ctrl2 |= SDHCI_CTRL_V4_MODE;
+   sdhci_writew(host, ctrl2, SDHCI_HOST_CONTROL2);
+}
+
+/*
+ * This can be called before sdhci_add_host() by Vendor's host controller
+ * driver to enable v4 mode if supported.
+ */
+void sdhci_enable_v4_mode(struct sdhci_host *host)
+{
+   host->v4_mode = true;
+   sdhci_do_enable_v4_mode(host);
+}
+EXPORT_SYMBOL_GPL(sdhci_enable_v4_mode);
+
 static inline bool sdhci_data_line_cmd(struct mmc_command *cmd)
 {
return cmd->data || cmd->flags & MMC_RSP_BUSY;
@@ -252,6 +275,9 @@ static void sdhci_init(struct sdhci_host *host, int soft)
else
sdhci_do_reset(host, SDHCI_RESET_ALL);
 
+   if (host->v4_mode)
+   sdhci_do_enable_v4_mode(host);
+
sdhci_set_default_irqs(host);
 
host->cqe_on = false;
@@ -3371,6 +3397,9 @@ void __sdhci_read_caps(struct sdhci_host *host, u16 *ver, 
u32 *caps, u32 *caps1)
 
sdhci_do_reset(host, SDHCI_RESET_ALL);
 
+   if (host->v4_mode)
+   sdhci_do_enable_v4_mode(host);
+
of_property_read_u64(mmc_dev(host->mmc)->of_node,
 "sdhci-caps-mask", &dt_caps_mask);
of_property_read_u64(mmc_dev(host->mmc)->of_node,
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 7ae95f8..131d869 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -184,6 +184,7 @@
 #define   SDHCI_CTRL_DRV_TYPE_D0x0030
 #define  SDHCI_CTRL_EXEC_TUNING0x0040
 #define  SDHCI_CTRL_TUNED_CLK  0x0080
+#define  SDHCI_CTRL_V4_MODE0x1000
 #define  SDHCI_CTRL_PRESET_VAL_ENABLE  0x8000
 
 #define SDHCI_CAPABILITIES 0x40
@@ -504,6 +505,7 @@ struct sdhci_host {
bool preset_enabled;/* Preset is enabled */
bool pending_reset; /* Cmd/data reset is pending */
bool irq_wake_enabled;  /* IRQ wakeup is enabled */
+   bool v4_mode;   /* Host Version 4 Enable */
 
struct mmc_request *mrqs_done[SDHCI_MAX_MRQS];  /* Requests done */
struct mmc_command *cmd;/* Current command */
@@ -751,5 +753,6 @@ bool sdhci_cqe_irq(struct sdhci_host *host, u32 intmask, int *cmd_error,
   int *data_error);
 
 void sdhci_dumpregs(struct sdhci_host *host);
+void sdhci_enable_v4_mode(struct sdhci_host *host);
 
 #endif /* __SDHCI_HW_H */
-- 
2.7.4

[PATCH V6 4/9] mmc: sdhci: Add ADMA2 64-bit addressing support for V4 mode

2018-08-24 Thread Chunyan Zhang

ADMA2 64-bit addressing support is divided into V3 mode and V4 mode,
so there are two kinds of descriptors for ADMA2 64-bit addressing:
the 96-bit Descriptor for V3 mode and the 128-bit Descriptor for V4
mode. The 128-bit Descriptor is aligned to 8 bytes.

For V4 mode, ADMA2 64-bit addressing is enabled via Host Control 2
register.
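
The two descriptor sizes described above can be illustrated with a small
userspace sketch. The struct and function names below are hypothetical and
are not taken from the driver; the field layout (attribute, length, low and
high address words, plus a reserved word in V4 mode) is an assumption that
follows the 96-bit versus 128-bit split described in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 96-bit ADMA2 descriptor used for 64-bit addressing in V3 mode. */
struct adma2_desc_v3 {
	uint16_t cmd;      /* attribute bits */
	uint16_t len;      /* data length in bytes */
	uint32_t addr_lo;  /* address bits 31:0 */
	uint32_t addr_hi;  /* address bits 63:32 */
};

/* Hypothetical 128-bit descriptor for V4 mode: same fields plus 32 reserved bits. */
struct adma2_desc_v4 {
	uint16_t cmd;
	uint16_t len;
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t reserved; /* pads the entry to 16 bytes */
};

/* Size in bytes of one descriptor table entry for the given mode. */
static size_t adma2_desc_size(int v4_mode)
{
	return v4_mode ? sizeof(struct adma2_desc_v4)
		       : sizeof(struct adma2_desc_v3);
}
```

On common ABIs this yields 12 bytes per entry in V3 mode and 16 bytes in V4
mode, so the stride of a descriptor table depends on the mode.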

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/sdhci.c | 92 +++-
 drivers/mmc/host/sdhci.h | 12 +--
 2 files changed, 78 insertions(+), 26 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index df283ca..38d083c 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -266,6 +266,52 @@ static void sdhci_set_default_irqs(struct sdhci_host *host)
sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
 }
 
+static void sdhci_config_dma(struct sdhci_host *host)
+{
+   u8 ctrl;
+   u16 ctrl2;
+
+   if (host->version < SDHCI_SPEC_200)
+   return;
+
+   ctrl = sdhci_readb(host, SDHCI_HOST_CONTROL);
+
+   /*
+* Always adjust the DMA selection as some controllers
+* (e.g. JMicron) can't do PIO properly when the selection
+* is ADMA.
+*/
+   ctrl &= ~SDHCI_CTRL_DMA_MASK;
+   if (!(host->flags & SDHCI_REQ_USE_DMA))
+   goto out;
+
+   /* Note if DMA Select is zero then SDMA is selected */
+   if (host->flags & SDHCI_USE_ADMA)
+   ctrl |= SDHCI_CTRL_ADMA32;
+
+   if (host->flags & SDHCI_USE_64_BIT_DMA) {
+   /*
+* If v4 mode, all supported DMA can be 64-bit addressing if
+* controller supports 64-bit system address, otherwise only
+* ADMA can support 64-bit addressing.
+*/
+   if (host->v4_mode) {
+   ctrl2 = sdhci_readw(host, SDHCI_HOST_CONTROL2);
+   ctrl2 |= SDHCI_CTRL_64BIT_ADDR;
+   sdhci_writew(host, ctrl2, SDHCI_HOST_CONTROL2);
+   } else if (host->flags & SDHCI_USE_ADMA) {
+   /*
+* Don't need to undo SDHCI_CTRL_ADMA32 in order to
+* set SDHCI_CTRL_ADMA64.
+*/
+   ctrl |= SDHCI_CTRL_ADMA64;
+   }
+   }
+
+out:
+   sdhci_writeb(host, ctrl, SDHCI_HOST_CONTROL);
+}
+
 static void sdhci_init(struct sdhci_host *host, int soft)
 {
struct mmc_host *mmc = host->mmc;
@@ -913,7 +959,6 @@ static void sdhci_set_timeout(struct sdhci_host *host, struct mmc_command *cmd)
 
 static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
 {
-   u8 ctrl;
struct mmc_data *data = cmd->data;
 
host->data_timeout = 0;
@@ -1009,25 +1054,7 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
}
}
 
-   /*
-* Always adjust the DMA selection as some controllers
-* (e.g. JMicron) can't do PIO properly when the selection
-* is ADMA.
-*/
-   if (host->version >= SDHCI_SPEC_200) {
-   ctrl = sdhci_readb(host, SDHCI_HOST_CONTROL);
-   ctrl &= ~SDHCI_CTRL_DMA_MASK;
-   if ((host->flags & SDHCI_REQ_USE_DMA) &&
-   (host->flags & SDHCI_USE_ADMA)) {
-   if (host->flags & SDHCI_USE_64_BIT_DMA)
-   ctrl |= SDHCI_CTRL_ADMA64;
-   else
-   ctrl |= SDHCI_CTRL_ADMA32;
-   } else {
-   ctrl |= SDHCI_CTRL_SDMA;
-   }
-   sdhci_writeb(host, ctrl, SDHCI_HOST_CONTROL);
-   }
+   sdhci_config_dma(host);
 
if (!(host->flags & SDHCI_REQ_USE_DMA)) {
int flags;
@@ -3504,6 +3531,19 @@ static int sdhci_allocate_bounce_buffer(struct sdhci_host *host)
return 0;
 }
 
+static inline bool sdhci_can_64bit_dma(struct sdhci_host *host)
+{
+   /*
+* According to SD Host Controller spec v4.10, bit[27] added from
+* version 4.10 in Capabilities Register is used as 64-bit System
+* Address support for V4 mode.
+*/
+   if (host->version >= SDHCI_SPEC_410 && host->v4_mode)
+   return host->caps & SDHCI_CAN_64BIT_V4;
+
+   return host->caps & SDHCI_CAN_64BIT;
+}
+
 int sdhci_setup_host(struct sdhci_host *host)
 {
struct mmc_host *mmc;
@@ -3575,7 +3615,7 @@ int sdhci_setup_host(struct sdhci_host *host)
 * SDHCI_QUIRK2_BROKEN_64_BIT_DMA must be left to the drivers to
 * implement.
 */
-   if (host->caps & SDHCI_CAN_64BIT)
+   if (sdhci_can_64bit_dma(host))
host->flags |= SDHCI_USE_64_BIT_DMA;
 
if (host->flags & (SDHCI_USE_SDMA | SDHCI_USE_ADMA)) {
@@ -3609,8 +3649,8 @@ int sdhci_setup_host(struct sdhci_host *host)
 */
if (host->fla

[PATCH V6 6/9] mmc: sdhci: Add Auto CMD Auto Select support

2018-08-24 Thread Chunyan Zhang

As SD Host Controller Specification v4.10 documents:
Host Controller Version 4.10 defines this "Auto CMD Auto Select" mode.
Selection of Auto CMD depends on setting of CMD23 Enable in the Host
Control 2 register which indicates whether card supports CMD23. If CMD23
Enable =1, Auto CMD23 is used and if CMD23 Enable =0, Auto CMD12 is
used. In case of Version 4.10 or later, use of Auto CMD Auto Select is
recommended rather than use of Auto CMD12 Enable or Auto CMD23
Enable.

This patch adds support for this new mode.
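
The selection rule quoted from the specification can be sketched as a pure
function. This is a userspace model, not the driver code: struct
auto_cmd_result and pick_auto_cmd() are hypothetical names, and the bit
values are copied from the defines added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the defines in this patch. */
#define TRNS_AUTO_CMD12  0x04
#define TRNS_AUTO_CMD23  0x08
#define TRNS_AUTO_SEL    0x0C
#define CMD23_ENABLE     0x0800

/* Hypothetical container for the two values the helper would program. */
struct auto_cmd_result {
	uint16_t mode;   /* bits to OR into the Transfer Mode register */
	uint16_t ctrl2;  /* resulting Host Control 2 value */
};

static struct auto_cmd_result pick_auto_cmd(bool is_v410, bool use_cmd12,
					    bool use_cmd23, uint16_t ctrl2)
{
	struct auto_cmd_result r = { 0, ctrl2 };

	/* v4.10 or later: let the controller choose, steered by CMD23 Enable. */
	if (is_v410 && (use_cmd12 || use_cmd23)) {
		r.mode |= TRNS_AUTO_SEL;
		if (use_cmd23)
			r.ctrl2 |= CMD23_ENABLE;
		else
			r.ctrl2 &= (uint16_t)~CMD23_ENABLE;
		return r;
	}

	/* Older controllers: pick Auto CMD12 or Auto CMD23 explicitly. */
	if (use_cmd12)
		r.mode |= TRNS_AUTO_CMD12;
	else if (use_cmd23)
		r.mode |= TRNS_AUTO_CMD23;

	return r;
}
```

For example, pick_auto_cmd(true, false, true, 0) gives mode 0x0C with CMD23
Enable set, while pick_auto_cmd(false, true, false, 0) falls back to plain
Auto CMD12.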

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/sdhci.c | 49 ++--
 drivers/mmc/host/sdhci.h |  2 ++
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 05f9fff..7e01601 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1096,6 +1096,43 @@ static inline bool sdhci_auto_cmd12(struct sdhci_host *host,
   !mrq->cap_cmd_during_tfr;
 }
 
+static inline void sdhci_auto_cmd_select(struct sdhci_host *host,
+struct mmc_command *cmd,
+u16 *mode)
+{
+   bool use_cmd12 = sdhci_auto_cmd12(host, cmd->mrq) &&
+(cmd->opcode != SD_IO_RW_EXTENDED);
+   bool use_cmd23 = cmd->mrq->sbc && (host->flags & SDHCI_AUTO_CMD23);
+   u16 ctrl2;
+
+   /*
+* In case of Version 4.10 or later, use of 'Auto CMD Auto
+* Select' is recommended rather than use of 'Auto CMD12
+* Enable' or 'Auto CMD23 Enable'.
+*/
+   if (host->version >= SDHCI_SPEC_410 && (use_cmd12 || use_cmd23)) {
+   *mode |= SDHCI_TRNS_AUTO_SEL;
+
+   ctrl2 = sdhci_readw(host, SDHCI_HOST_CONTROL2);
+   if (use_cmd23)
+   ctrl2 |= SDHCI_CMD23_ENABLE;
+   else
+   ctrl2 &= ~SDHCI_CMD23_ENABLE;
+   sdhci_writew(host, ctrl2, SDHCI_HOST_CONTROL2);
+
+   return;
+   }
+
+   /*
+* If we are sending CMD23, CMD12 never gets sent
+* on successful completion (so no Auto-CMD12).
+*/
+   if (use_cmd12)
+   *mode |= SDHCI_TRNS_AUTO_CMD12;
+   else if (use_cmd23)
+   *mode |= SDHCI_TRNS_AUTO_CMD23;
+}
+
 static void sdhci_set_transfer_mode(struct sdhci_host *host,
struct mmc_command *cmd)
 {
@@ -1122,17 +1159,9 @@ static void sdhci_set_transfer_mode(struct sdhci_host *host,
 
if (mmc_op_multi(cmd->opcode) || data->blocks > 1) {
mode = SDHCI_TRNS_BLK_CNT_EN | SDHCI_TRNS_MULTI;
-   /*
-* If we are sending CMD23, CMD12 never gets sent
-* on successful completion (so no Auto-CMD12).
-*/
-   if (sdhci_auto_cmd12(host, cmd->mrq) &&
-   (cmd->opcode != SD_IO_RW_EXTENDED))
-   mode |= SDHCI_TRNS_AUTO_CMD12;
-   else if (cmd->mrq->sbc && (host->flags & SDHCI_AUTO_CMD23)) {
-   mode |= SDHCI_TRNS_AUTO_CMD23;
+   sdhci_auto_cmd_select(host, cmd, &mode);
+   if (cmd->mrq->sbc && (host->flags & SDHCI_AUTO_CMD23))
sdhci_writel(host, cmd->mrq->sbc->arg, SDHCI_ARGUMENT2);
-   }
}
 
if (data->flags & MMC_DATA_READ)
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 0a1e25f..4913d75 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -42,6 +42,7 @@
 #define  SDHCI_TRNS_BLK_CNT_EN 0x02
 #define  SDHCI_TRNS_AUTO_CMD12 0x04
 #define  SDHCI_TRNS_AUTO_CMD23 0x08
+#define  SDHCI_TRNS_AUTO_SEL   0x0C
 #define  SDHCI_TRNS_READ   0x10
 #define  SDHCI_TRNS_MULTI  0x20
 
@@ -185,6 +186,7 @@
 #define   SDHCI_CTRL_DRV_TYPE_D0x0030
 #define  SDHCI_CTRL_EXEC_TUNING0x0040
 #define  SDHCI_CTRL_TUNED_CLK  0x0080
+#define  SDHCI_CMD23_ENABLE0x0800
 #define  SDHCI_CTRL_V4_MODE0x1000
 #define  SDHCI_CTRL_64BIT_ADDR 0x2000
 #define  SDHCI_CTRL_PRESET_VAL_ENABLE  0x8000
-- 
2.7.4

[PATCH V6 5/9] mmc: sdhci: Add 32-bit block count support for v4 mode

2018-08-24 Thread Chunyan Zhang

Host Controller Version 4.10 re-defines the SDMA System Address register
as the 32-bit Block Count for v4 mode, and SDMA uses the ADMA System
Address register (05Fh-058h) instead if v4 mode is enabled. Also,
when using the 32-bit block count, the 16-bit block count register needs
to be set to zero.

Since using the 32-bit Block Count could cause problems for auto-cmd23,
it can be turned off via host->quirks2 (SDHCI_QUIRK2_BROKEN_32BIT_BLK_CNT).
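
The register choice described above can be modelled in userspace as follows.
This is only a sketch: struct blk_regs and set_block_count() are hypothetical
stand-ins for the controller registers and the driver logic, and the boolean
parameters are assumed to summarise the version, v4 mode and quirk checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two block count registers. */
struct blk_regs {
	uint32_t blk_cnt32;  /* 32-bit Block Count (v4 mode meaning of offset 0x00) */
	uint16_t blk_cnt16;  /* legacy 16-bit Block Count */
};

static void set_block_count(struct blk_regs *regs, bool v410_v4_mode,
			    bool broken_32bit, uint32_t blocks)
{
	if (v410_v4_mode && !broken_32bit) {
		/* The 16-bit register is zeroed and the 32-bit one carries the count. */
		regs->blk_cnt16 = 0;
		regs->blk_cnt32 = blocks;
	} else {
		/* Legacy path: only the 16-bit register is used. */
		regs->blk_cnt16 = (uint16_t)blocks;
	}
}
```

With the quirk set, the model behaves like a pre-4.10 controller even when v4
mode is enabled.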

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/sdhci.c | 15 ++-
 drivers/mmc/host/sdhci.h |  3 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 38d083c..05f9fff 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1073,7 +1073,20 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
/* Set the DMA boundary value and block size */
sdhci_writew(host, SDHCI_MAKE_BLKSZ(host->sdma_boundary, data->blksz),
 SDHCI_BLOCK_SIZE);
-   sdhci_writew(host, data->blocks, SDHCI_BLOCK_COUNT);
+
+   /*
+* For Version 4.10 onwards, if v4 mode is enabled, the 16-bit Block
+* Count register needs to be set to zero so that the 32-bit Block
+* Count register is selected.
+*/
+   if (host->version >= SDHCI_SPEC_410 && host->v4_mode &&
+   !(host->quirks2 & SDHCI_QUIRK2_BROKEN_32BIT_BLK_CNT)) {
+   if (sdhci_readw(host, SDHCI_BLOCK_COUNT))
+   sdhci_writew(host, 0, SDHCI_BLOCK_COUNT);
+   sdhci_writel(host, data->blocks, SDHCI_32BIT_BLK_CNT);
+   } else {
+   sdhci_writew(host, data->blocks, SDHCI_BLOCK_COUNT);
+   }
 }
 
 static inline bool sdhci_auto_cmd12(struct sdhci_host *host,
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index f3b9ebc..0a1e25f 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -28,6 +28,7 @@
 
 #define SDHCI_DMA_ADDRESS  0x00
 #define SDHCI_ARGUMENT2SDHCI_DMA_ADDRESS
+#define SDHCI_32BIT_BLK_CNTSDHCI_DMA_ADDRESS
 
 #define SDHCI_BLOCK_SIZE   0x04
 #define  SDHCI_MAKE_BLKSZ(dma, blksz) (((dma & 0x7) << 12) | (blksz & 0xFFF))
@@ -462,6 +463,8 @@ struct sdhci_host {
  * obtainable timeout.
  */
 #define SDHCI_QUIRK2_DISABLE_HW_TIMEOUT(1<<17)
+/* Controller broken with using 32-bit block count in v4_mode */
+#define SDHCI_QUIRK2_BROKEN_32BIT_BLK_CNT  (1<<18)
 
int irq;/* Device IRQ */
void __iomem *ioaddr;   /* Mapped address */
-- 
2.7.4

[PATCH V6 3/9] mmc: sdhci: Change SDMA address register for v4 mode

2018-08-24 Thread Chunyan Zhang

According to the SD host controller specification version 4.10, when
Host Version 4 is enabled, SDMA uses ADMA System Address register
(05Fh-058h) instead of using SDMA System Address register to
support both 32-bit and 64-bit addressing.
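
A userspace sketch of the register selection described above, assuming a
64-bit DMA address is split into two 32-bit writes. struct sdma_regs and
set_sdma_addr() are hypothetical names used for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the address registers involved. */
struct sdma_regs {
	uint32_t sdma_addr;     /* legacy SDMA System Address */
	uint32_t adma_addr_lo;  /* ADMA System Address, low 32 bits */
	uint32_t adma_addr_hi;  /* ADMA System Address, high 32 bits */
};

static void set_sdma_addr(struct sdma_regs *regs, bool v4_mode,
			  bool use_64bit, uint64_t addr)
{
	if (v4_mode) {
		/* In v4 mode SDMA takes its address from the ADMA register pair. */
		regs->adma_addr_lo = (uint32_t)addr;
		if (use_64bit)
			regs->adma_addr_hi = (uint32_t)(addr >> 32);
	} else {
		/* Otherwise the 32-bit SDMA System Address register is used. */
		regs->sdma_addr = (uint32_t)addr;
	}
}
```

The high word is written only when 64-bit DMA is in use, so a 32-bit v4
configuration touches just the low word.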

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/sdhci.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index a50842c..df283ca 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -727,7 +727,7 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
}
 }
 
-static u32 sdhci_sdma_address(struct sdhci_host *host)
+static dma_addr_t sdhci_sdma_address(struct sdhci_host *host)
 {
if (host->bounce_buffer)
return host->bounce_addr;
@@ -735,6 +735,17 @@ static u32 sdhci_sdma_address(struct sdhci_host *host)
return sg_dma_address(host->data->sg);
 }
 
+static void sdhci_set_sdma_addr(struct sdhci_host *host, dma_addr_t addr)
+{
+   if (host->v4_mode) {
+   sdhci_writel(host, addr, SDHCI_ADMA_ADDRESS);
+   if (host->flags & SDHCI_USE_64_BIT_DMA)
+   sdhci_writel(host, (u64)addr >> 32, SDHCI_ADMA_ADDRESS_HI);
+   } else {
+   sdhci_writel(host, addr, SDHCI_DMA_ADDRESS);
+   }
+}
+
 static unsigned int sdhci_target_timeout(struct sdhci_host *host,
 struct mmc_command *cmd,
 struct mmc_data *data)
@@ -994,8 +1005,7 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
 SDHCI_ADMA_ADDRESS_HI);
} else {
WARN_ON(sg_cnt != 1);
-   sdhci_writel(host, sdhci_sdma_address(host),
-SDHCI_DMA_ADDRESS);
+   sdhci_set_sdma_addr(host, sdhci_sdma_address(host));
}
}
 
@@ -2823,7 +2833,7 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
 * some controllers are faulty, don't trust them.
 */
if (intmask & SDHCI_INT_DMA_END) {
-   u32 dmastart, dmanow;
+   dma_addr_t dmastart, dmanow;
 
dmastart = sdhci_sdma_address(host);
dmanow = dmastart + host->data->bytes_xfered;
@@ -2831,12 +2841,12 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
 * Force update to the next DMA block boundary.
 */
dmanow = (dmanow &
-   ~(SDHCI_DEFAULT_BOUNDARY_SIZE - 1)) +
+   ~((dma_addr_t)SDHCI_DEFAULT_BOUNDARY_SIZE - 1)) +
SDHCI_DEFAULT_BOUNDARY_SIZE;
host->data->bytes_xfered = dmanow - dmastart;
-   DBG("DMA base 0x%08x, transferred 0x%06x bytes, next 0x%08x\n",
-   dmastart, host->data->bytes_xfered, dmanow);
-   sdhci_writel(host, dmanow, SDHCI_DMA_ADDRESS);
+   DBG("DMA base %pad, transferred 0x%06x bytes, next %pad\n",
+   &dmastart, host->data->bytes_xfered, &dmanow);
+   sdhci_set_sdma_addr(host, dmanow);
}
 
if (intmask & SDHCI_INT_DATA_END) {
@@ -3583,8 +3593,8 @@ int sdhci_setup_host(struct sdhci_host *host)
}
}
 
-   /* SDMA does not support 64-bit DMA */
-   if (host->flags & SDHCI_USE_64_BIT_DMA)
+   /* SDMA does not support 64-bit DMA if v4 mode not set */
+   if ((host->flags & SDHCI_USE_64_BIT_DMA) && !host->v4_mode)
host->flags &= ~SDHCI_USE_SDMA;
 
if (host->flags & SDHCI_USE_ADMA) {
-- 
2.7.4

[PATCH V6 9/9] dt-bindings: sdhci-sprd: Add bindings for the sdhci-sprd controller

2018-08-24 Thread Chunyan Zhang

From: Chunyan Zhang 

This patch adds the device-tree binding documentation for Spreadtrum
SDHCI driver.

Signed-off-by: Chunyan Zhang 
---
 .../devicetree/bindings/mmc/sdhci-sprd.txt | 41 ++
 1 file changed, 41 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mmc/sdhci-sprd.txt

diff --git a/Documentation/devicetree/bindings/mmc/sdhci-sprd.txt b/Documentation/devicetree/bindings/mmc/sdhci-sprd.txt
new file mode 100644
index 000..45c9978
--- /dev/null
+++ b/Documentation/devicetree/bindings/mmc/sdhci-sprd.txt
@@ -0,0 +1,41 @@
+* Spreadtrum SDHCI controller (sdhci-sprd)
+
+The Secure Digital (SD) Host controller on Spreadtrum SoCs provides an interface
+for MMC, SD and SDIO types of cards.
+
+This file documents differences between the core properties in mmc.txt
+and the properties used by the sdhci-sprd driver.
+
+Required properties:
+- compatible: Should contain "sprd,sdhci-r11".
+- reg: physical base address of the controller and length.
+- interrupts: Interrupts used by the SDHCI controller.
+- clocks: Should contain phandle for the clock feeding the SDHCI controller
+- clock-names: Should contain the following:
+   "sdio" - SDIO source clock (required)
+   "enable" - gate clock which is used for enabling/disabling the device (required)
+
+Optional properties:
+- assigned-clocks: the same as the "sdio" clock
+- assigned-clock-parents: the default parent of "sdio" clock
+
+Examples:
+
+sdio0: sdio@2060 {
+   compatible  = "sprd,sdhci-r11";
+   reg = <0 0x2060 0 0x1000>;
+   interrupts = ;
+
+   clock-names = "sdio", "enable";
+   clocks = <&ap_clk CLK_EMMC_2X>,
+<&apahb_gate CLK_EMMC_EB>;
+   assigned-clocks = <&ap_clk CLK_EMMC_2X>;
+   assigned-clock-parents = <&rpll CLK_RPLL_390M>;
+
+   bus-width = <8>;
+   non-removable;
+   no-sdio;
+   no-sd;
+   cap-mmc-hw-reset;
+   status = "okay";
+};
-- 
2.7.4

[PATCH V6 7/9] mmc: sdhci: SDMA may use Auto-CMD23 in v4 mode

2018-08-24 Thread Chunyan Zhang

When Host Version 4 Enable is set to 1, SDMA uses ADMA System Address
register (05Fh-058h) instead of using register (000h-004h) to indicate
its system address of data location. The register (000h-004h) is
re-assigned to 32-bit Block Count and Auto CMD23 argument, so then SDMA
may use Auto CMD23.
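
The availability condition that results from this change can be written as a
small predicate. This is a sketch mirroring the condition in the diff below;
auto_cmd23_available() and its boolean parameters are hypothetical names that
stand in for the version check and the host flags.

```c
#include <stdbool.h>

/* Returns true when Auto-CMD23 would be enabled under the patched condition. */
static bool auto_cmd23_available(bool spec_300_or_later, bool use_adma,
				 bool use_sdma, bool v4_mode,
				 bool acmd23_broken)
{
	bool transfer_ok = use_adma ||                 /* ADMA always qualifies */
			   (!v4_mode && !use_sdma) ||  /* v3 mode: PIO only */
			   (v4_mode && use_sdma);      /* v4 mode: SDMA qualifies */

	return spec_300_or_later && transfer_ok && !acmd23_broken;
}
```

SDMA without ADMA therefore qualifies only in v4 mode. As written, the
condition does not cover PIO in v4 mode, which may or may not be intended.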

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/sdhci.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 7e01601..1cb55f2 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3828,10 +3828,14 @@ int sdhci_setup_host(struct sdhci_host *host)
if (host->quirks & SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12)
host->flags |= SDHCI_AUTO_CMD12;
 
-   /* Auto-CMD23 stuff only works in ADMA or PIO. */
+   /*
+* For v3 mode, Auto-CMD23 stuff only works in ADMA or PIO;
+* For v4 mode, SDMA may use Auto-CMD23 as well.
+*/
if ((host->version >= SDHCI_SPEC_300) &&
((host->flags & SDHCI_USE_ADMA) ||
-!(host->flags & SDHCI_USE_SDMA)) &&
+(!host->v4_mode && !(host->flags & SDHCI_USE_SDMA)) ||
+(host->v4_mode && (host->flags & SDHCI_USE_SDMA))) &&
 !(host->quirks2 & SDHCI_QUIRK2_ACMD23_BROKEN)) {
host->flags |= SDHCI_AUTO_CMD23;
DBG("Auto-CMD23 available\n");
-- 
2.7.4

[PATCH V6 8/9] mmc: sdhci-sprd: Add Spreadtrum's initial host controller

2018-08-24 Thread Chunyan Zhang

From: Chunyan Zhang 

This patch adds initial support for the Secure Digital Host Controller
Interface compliant controller found in some of the latest Spreadtrum
chipsets. It has been tested on the SPRD-R11 version of the controller.

R11 is a variant based on SD v4.0 specification.

With this driver, an R11 mmc device can be initialized, mounted, read
and written.

Original-by: Billows Wu 
Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/Kconfig  |  13 ++
 drivers/mmc/host/Makefile |   1 +
 drivers/mmc/host/sdhci-sprd.c | 485 ++
 3 files changed, 499 insertions(+)
 create mode 100644 drivers/mmc/host/sdhci-sprd.c

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 0581c19..c5424dc 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -581,6 +581,19 @@ config MMC_SDRICOH_CS
  To compile this driver as a module, choose M here: the
  module will be called sdricoh_cs.
 
+config MMC_SDHCI_SPRD
+   tristate "Spreadtrum SDIO host Controller"
+   depends on ARCH_SPRD
+   depends on MMC_SDHCI_PLTFM
+   select MMC_SDHCI_IO_ACCESSORS
+   help
+ This selects the SDIO Host Controller in Spreadtrum
+ SoCs, this driver supports R11(IP version: R11P0).
+
+ If you have a controller with this interface, say Y or M here.
+
+ If unsure, say N.
+
 config MMC_TMIO_CORE
tristate
 
diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
index 85dc132..b0b6802 100644
--- a/drivers/mmc/host/Makefile
+++ b/drivers/mmc/host/Makefile
@@ -89,6 +89,7 @@ obj-$(CONFIG_MMC_SDHCI_ST)+= sdhci-st.o
 obj-$(CONFIG_MMC_SDHCI_MICROCHIP_PIC32)+= sdhci-pic32.o
 obj-$(CONFIG_MMC_SDHCI_BRCMSTB)+= sdhci-brcmstb.o
 obj-$(CONFIG_MMC_SDHCI_OMAP)   += sdhci-omap.o
+obj-$(CONFIG_MMC_SDHCI_SPRD)   += sdhci-sprd.o
 obj-$(CONFIG_MMC_CQHCI)+= cqhci.o
 
 ifeq ($(CONFIG_CB710_DEBUG),y)
diff --git a/drivers/mmc/host/sdhci-sprd.c b/drivers/mmc/host/sdhci-sprd.c
new file mode 100644
index 000..2551e10
--- /dev/null
+++ b/drivers/mmc/host/sdhci-sprd.c
@@ -0,0 +1,485 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Secure Digital Host Controller
+//
+// Copyright (C) 2018 Spreadtrum, Inc.
+// Author: Chunyan Zhang 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sdhci-pltfm.h"
+
+/* SDHCI_ARGUMENT2 register high 16bit */
+#define SDHCI_SPRD_ARG2_STUFF  GENMASK(31, 16)
+
+#define SDHCI_SPRD_REG_32_DLL_DLY_OFFSET   0x208
+#define  SDHCI_SPRD_BIT_WR_DLY_INV  BIT(5)
+#define  SDHCI_SPRD_BIT_CMD_DLY_INVBIT(13)
+#define  SDHCI_SPRD_BIT_POSRD_DLY_INV  BIT(21)
+#define  SDHCI_SPRD_BIT_NEGRD_DLY_INV  BIT(29)
+
+#define SDHCI_SPRD_REG_32_BUSY_POSI0x250
+#define  SDHCI_SPRD_BIT_OUTR_CLK_AUTO_EN   BIT(25)
+#define  SDHCI_SPRD_BIT_INNR_CLK_AUTO_EN   BIT(24)
+
+#define SDHCI_SPRD_REG_DEBOUNCE0x28C
+#define  SDHCI_SPRD_BIT_DLL_BAKBIT(0)
+#define  SDHCI_SPRD_BIT_DLL_VALBIT(1)
+
+#define  SDHCI_SPRD_INT_SIGNAL_MASK0x1B7F410B
+
+/* SDHCI_HOST_CONTROL2 */
+#define  SDHCI_SPRD_CTRL_HS200 0x0005
+#define  SDHCI_SPRD_CTRL_HS400 0x0006
+
+/*
+ * According to the standard specification, BIT(3) of SDHCI_SOFTWARE_RESET is
+ * reserved, and only used on Spreadtrum's design, the hardware cannot work
+ * if this bit is cleared.
+ * 1 : normal work
+ * 0 : hardware reset
+ */
+#define  SDHCI_HW_RESET_CARD   BIT(3)
+
+#define SDHCI_SPRD_MAX_CUR 0xFF
+#define SDHCI_SPRD_CLK_MAX_DIV 1023
+
+#define SDHCI_SPRD_CLK_DEF_RATE2600
+
+struct sdhci_sprd_host {
+   u32 version;
+   struct clk *clk_sdio;
+   struct clk *clk_enable;
+   u32 base_rate;
+};
+
+#define TO_SPRD_HOST(host) sdhci_pltfm_priv(sdhci_priv(host))
+
+static void sdhci_sprd_init_config(struct sdhci_host *host)
+{
+   u32 val;
+
+   /* set dll backup mode */
+   val = sdhci_readl(host, SDHCI_SPRD_REG_DEBOUNCE);
+   val |= SDHCI_SPRD_BIT_DLL_BAK | SDHCI_SPRD_BIT_DLL_VAL;
+   sdhci_writel(host, val, SDHCI_SPRD_REG_DEBOUNCE);
+}
+
+static inline u32 sdhci_sprd_readl(struct sdhci_host *host, int reg)
+{
+   if (unlikely(reg == SDHCI_MAX_CURRENT))
+   return SDHCI_SPRD_MAX_CUR;
+
+   return readl_relaxed(host->ioaddr + reg);
+}
+
+static inline void sdhci_sprd_writel(struct sdhci_host *host, u32 val, int reg)
+{
+   /* SDHCI_MAX_CURRENT is reserved on Spreadtrum's platform */
+   if (unlikely(reg == SDHCI_MAX_CURRENT))
+   return;
+
+   if (unlikely(reg == SDHCI_SIGNAL_ENABLE || reg == SDHCI_INT_ENABLE))
+   val = val & SDHCI_SPRD_INT_SIGNAL_MASK;
+
+   writel_relaxed(val, host->ioaddr + reg);
+}
+
+static inline void sdhci_sprd_wr

[PATCH v1] tools/vm/page-types.c: fix "defined but not used" warning

2018-08-24 Thread Naoya Horiguchi

debugfs_known_mountpoints[] is not used any more, so let's remove it.

Signed-off-by: Naoya Horiguchi 
---
 tools/vm/page-types.c | 6 --
 1 file changed, 6 deletions(-)

diff --git v4.18-mmotm-2018-08-17-15-48/tools/vm/page-types.c v4.18-mmotm-2018-08-17-15-48_patched/tools/vm/page-types.c
index 30cb0a0..37908a8 100644
--- v4.18-mmotm-2018-08-17-15-48/tools/vm/page-types.c
+++ v4.18-mmotm-2018-08-17-15-48_patched/tools/vm/page-types.c
@@ -159,12 +159,6 @@ static const char * const page_flag_names[] = {
 };
 
 
-static const char * const debugfs_known_mountpoints[] = {
-   "/sys/kernel/debug",
-   "/debug",
-   0,
-};
-
 /*
  * data structures
  */
-- 
2.7.0

Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages

2018-08-24 Thread Michal Hocko

On Thu 23-08-18 13:59:16, Mike Kravetz wrote:
[...]
> @@ -1409,6 +1419,32 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>   subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
>   address = pvmw.address;
>  
> + if (PageHuge(page)) {
> + if (huge_pmd_unshare(mm, &address, pvmw.pte)) {
> + /*
> +  * huge_pmd_unshare unmapped an entire PMD
> +  * page.  There is no way of knowing exactly
> +  * which PMDs may be cached for this mm, so
> +  * we must flush them all.  start/end were
> +  * already adjusted above to cover this range.
> +  */
> + flush_cache_range(vma, start, end);
> + flush_tlb_range(vma, start, end);
> + mmu_notifier_invalidate_range(mm, start, end);
> +
> + /*
> +  * The ref count of the PMD page was dropped
> +  * which is part of the way map counting
> +  * is done for shared PMDs.  Return 'true'
> +  * here.  When there is no other sharing,
> +  * huge_pmd_unshare returns false and we will
> +  * unmap the actual page and drop map count
> +  * to zero.
> +  */
> + page_vma_mapped_walk_done(&pvmw);
> + break;
> + }
> + }

Wait a second. This is not correct, right? You have to call the
notifiers after page_vma_mapped_walk_done because they might be
sleepable and we are still holding the pte lock. This is btw. a problem
for other users of mmu_notifier_invalidate_range in try_to_unmap_one,
unless I am terribly confused. This would suggest 369ea8242c0fb is
incorrect.
-- 
Michal Hocko
SUSE Labs

[PATCH v2 0/2] drm/atmel-hlcdc: revise selection of pixel-clock frequency divider

2018-08-24 Thread Peter Rosin

Hi!

Some background can be found here:
https://lists.freedesktop.org/archives/dri-devel/2018-August/187182.html

The "10 times" discriminator in patch 2/2 can certainly be discussed...

Cheers,
Peter

Changes since v1 https://lkml.org/lkml/2018/8/24/187

- added {} to an if body for symmetry
- reformatted comments a little bit
- spelling/grammar fixes

Peter Rosin (2):
  drm/atmel-hlcdc: prefer a higher rate clock as pixel-clock base
  drm/atmel-hlcdc: allow selecting a higher pixel-clock than requested

 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c | 30 --
 1 file changed, 23 insertions(+), 7 deletions(-)

-- 
2.11.0

[PATCH v2 1/2] drm/atmel-hlcdc: prefer a higher rate clock as pixel-clock base

2018-08-24 Thread Peter Rosin

If the divider used to get the pixel-clock is small, the granularity
of the frequencies possible for the pixel-clock is quite coarse. E.g.
requesting a pixel-clock of 65MHz with a sys_clk of 132MHz results
in the divider being set to 3 ending up with 44MHz.

By preferring the doubled sys_clk as base, the divider instead ends
up as 5 yielding a pixel-clock of 52.8MHz, which is a definite
improvement.
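
The arithmetic in the example above can be checked with a small userspace
sketch. The function names are hypothetical, the divider overflow clamp from
the patch is left out for brevity, and only the base-rate selection and the
rounded-up division are modelled.

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Before the patch: sys_clk is doubled only if sys_clk / 2 is below the mode rate. */
static uint64_t pixel_clock_before(uint64_t sys_clk, uint64_t mode_rate)
{
	uint64_t prate = sys_clk;
	uint64_t div;

	if (prate / 2 < mode_rate)
		prate *= 2;

	div = DIV_ROUND_UP(prate, mode_rate);
	if (div < 2)
		div = 2;

	return prate / div;
}

/* After the patch: the doubled sys_clk is preferred as the base rate. */
static uint64_t pixel_clock_after(uint64_t sys_clk, uint64_t mode_rate)
{
	uint64_t prate = 2 * sys_clk;
	uint64_t div = DIV_ROUND_UP(prate, mode_rate);

	if (div < 2)
		div = 2;

	return prate / div;
}
```

With sys_clk = 132000000 and a requested rate of 65000000, the first function
returns 44000000 (divider 3) and the second returns 52800000 (divider 5),
matching the figures quoted above.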

While at it, clamp the divider so that it does not overflow in case
it gets big.

Signed-off-by: Peter Rosin 
---
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
index c38a479ada98..0d9d1042752a 100644
--- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
+++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
@@ -101,18 +101,22 @@ static void atmel_hlcdc_crtc_mode_set_nofb(struct drm_crtc *c)
 (adj->crtc_hdisplay - 1) |
 ((adj->crtc_vdisplay - 1) << 16));
 
-   cfg = 0;
+   cfg = ATMEL_HLCDC_CLKSEL;
 
-   prate = clk_get_rate(crtc->dc->hlcdc->sys_clk);
+   prate = 2 * clk_get_rate(crtc->dc->hlcdc->sys_clk);
mode_rate = adj->crtc_clock * 1000;
-   if ((prate / 2) < mode_rate) {
-   prate *= 2;
-   cfg |= ATMEL_HLCDC_CLKSEL;
-   }
 
div = DIV_ROUND_UP(prate, mode_rate);
-   if (div < 2)
+   if (div < 2) {
div = 2;
+   } else if (ATMEL_HLCDC_CLKDIV(div) & ~ATMEL_HLCDC_CLKDIV_MASK) {
+   /* The divider ended up too big, try a lower base rate. */
+   cfg &= ~ATMEL_HLCDC_CLKSEL;
+   prate /= 2;
+   div = DIV_ROUND_UP(prate, mode_rate);
+   if (ATMEL_HLCDC_CLKDIV(div) & ~ATMEL_HLCDC_CLKDIV_MASK)
+   div = ATMEL_HLCDC_CLKDIV_MASK;
+   }
 
cfg |= ATMEL_HLCDC_CLKDIV(div);
 
-- 
2.11.0

Re: [PATCH v8 07/26] PM / Domains: Add genpd governor for CPUs

2018-08-24 Thread Ulf Hansson

On 9 August 2018 at 17:39, Lorenzo Pieralisi  wrote:
> On Mon, Aug 06, 2018 at 11:20:59AM +0200, Rafael J. Wysocki wrote:
>
> [...]
>
>> >>> > @@ -245,6 +248,56 @@ static bool always_on_power_down_ok(struct dev_pm_domain *domain)
>> >>> > return false;
>> >>> >  }
>> >>> >
>> >>> > +static bool cpu_power_down_ok(struct dev_pm_domain *pd)
>> >>> > +{
>> >>> > +   struct generic_pm_domain *genpd = pd_to_genpd(pd);
>> >>> > +   ktime_t domain_wakeup, cpu_wakeup;
>> >>> > +   s64 idle_duration_ns;
>> >>> > +   int cpu, i;
>> >>> > +
>> >>> > +   if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
>> >>> > +   return true;
>> >>> > +
>> >>> > +   /*
>> >>> > +* Find the next wakeup for any of the online CPUs within the PM domain
>> >>> > +* and its subdomains. Note, we only need the genpd->cpus, as it already
>> >>> > +* contains a mask of all CPUs from subdomains.
>> >>> > +*/
>> >>> > +   domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
>> >>> > +   for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
>> >>> > +   cpu_wakeup = tick_nohz_get_next_wakeup(cpu);
>> >>> > +   if (ktime_before(cpu_wakeup, domain_wakeup))
>> >>> > +   domain_wakeup = cpu_wakeup;
>> >>> > +   }
>> >>
>> >> Here's a concern I have missed before. :-/
>> >>
>> >> Say, one of the CPUs you're walking here is woken up in the meantime.
>> >
>> > Yes, that can happen - when we mispredicted the "next wakeup".
>> >
>> >>
>> >> I don't think it is valid to evaluate tick_nohz_get_next_wakeup() for it
>> >> then to update domain_wakeup.  We really should just avoid the domain
>> >> power off in that case at all IMO.
>> >
>> > Correct.
>> >
>> > However, we also want to avoid locking contentions in the idle path,
>> > which is what this boils down to.
>>
>> This already is done under genpd_lock() AFAICS, so I'm not quite sure
>> what exactly you mean.
>>
>> Besides, this is not just about increased latency, which is a concern
>> by itself but maybe not so much in all environments, but also about
>> possibility of missing a CPU wakeup, which is a major issue.
>>
>> If one of the CPUs sharing the domain with the current one is woken up
>> during cpu_power_down_ok() and the wakeup is an edge-triggered
>> interrupt and the domain is turned off regardless, the wakeup may be
>> missed entirely if I'm not mistaken.
>>
>> It looks like there needs to be a way for the hardware to prevent a
>> domain poweroff when there's a pending interrupt or I don't quite see
>> how this can be handled correctly.
>>
>> >> Sure enough, if the domain power off is already started and one of the
>> >> CPUs in the domain is woken up then, too bad, it will suffer the latency
>> >> (but in that case the hardware should be able to help somewhat), but
>> >> otherwise CPU wakeup should prevent domain power off from being carried
>> >> out.
>> >
>> > The CPU is not prevented from waking up, as we rely on the FW to deal
>> > with that.
>> >
>> > Even if the above computation turns out to wrongly suggest that the
>> > cluster can be powered off, the FW shall together with the genpd
>> > backend driver prevent it.
>>
>> Fine, but then the solution depends on specific FW/HW behavior, so I'm
>> not sure how generic it really is.  At least, that expectation should
>> be clearly documented somewhere, preferably in code comments.
>>
>> > To cover this case for PSCI, we also use a per cpu variable for the
>> > CPU's power off state, as can be seen later in the series.
>>
>> Oh great, but the generic part should be independent of the underlying
>> implementation of the driver.  If it isn't, then it also is not
>> generic.
>>
>> > Hope this clarifies your concern, else tell me and I will elaborate a
>> > bit more.
>>
>> Not really.
>>
>> There also is one more problem and that is the interaction between
>> this code and the idle governor.
>>
>> Namely, the idle governor may select a shallower state for some
>> reason, for example due to an additional latency limit derived from
>> CPU utilization (like in the menu governor), and how does the code in
>> cpu_power_down_ok() know what state has been selected and how does it
>> honor the selection made by the idle governor?
>
> That's a good question and it maybe gives a path towards a solution.
>
> AFAICS the genPD governor only selects the idle state parameter that
> determines the idle state at, say, GenPD cpumask level; it does not touch
> the CPUidle decision, which works on a subset of idle states (at cpu
> level).
>
> That's my understanding, which can be wrong so please correct me
> if that's the case because that's a bit confusing.
>
> Let's imagine that we flattened out the list of idle states and feed
> CPUidle with it (all of them - cpu, cluster, package, system - as it is
> in the mainline _now_). Then the GenPD governor can run-through the
> CPUidle selection and _demote_ the idle state if necessary since it
> understands that some CPUs

[GIT PULL] namespace fixes for v4.19-rc1

2018-08-24 Thread Eric W. Biederman

Linus,

Please pull the userns-linus branch from the git tree:

   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git userns-linus

   HEAD: 82c9a927bc5df6e06b72d206d24a9d10cced4eb5 getxattr: use correct xattr length

This is a set of 4 fairly obvious bug fixes.  A switch from d_find_alias
to d_find_any_alias because the xattr code perversely takes a dentry.
Two mutex vs copy_to_user fixes from Jann Horn and a fix to use a
sanitized size not the size userspace passed in from Christian Brauner.

This is coming late because I fell behind this last development cycle,
and because I have been travelling.  I do not intend to make a habit of
sending pull requests late.

The last fix by Christian Brauner has a very recent commit date.  I was
doing a final review of this pull request and I noticed it was missing a
fixes tag.  I added the fixes tag so that the bug that is being fixed
can be put into perspective.

Resent because I somehow missed the [GIT PULL] tag when I sent this out
the first time.  I think travelling from GMT-0500 to GMT+0200 to visit
family is affecting me more than I thought.

Christian Brauner (1):
  getxattr: use correct xattr length

Eddie.Horng (1):
  cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()

Jann Horn (2):
  userns: move user access out of the mutex
  sys: don't hold uts_sem while accessing userspace memory

 arch/alpha/kernel/osf_sys.c  | 51 ++---
 arch/sparc/kernel/sys_sparc_32.c | 22 ++
 arch/sparc/kernel/sys_sparc_64.c | 20 +
 fs/xattr.c   |  2 +-
 kernel/sys.c | 95 +++-
 kernel/user_namespace.c  | 24 +-
 kernel/utsname_sysctl.c  | 41 ++---
 security/commoncap.c |  2 +-
 8 files changed, 131 insertions(+), 126 deletions(-)

Eric
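
The "mutex vs copy_to_user" fixes in the pull request above follow a common
pattern: snapshot the shared data into a local buffer while holding the lock,
drop the lock, and only then touch user memory. Below is a minimal userspace
sketch of that pattern, not the actual kernel change. The names
sketch_get_name(), sketch_copy_out(), name_lock and shared_name are all
hypothetical, and a pthread mutex stands in for the kernel lock.

```c
#include <pthread.h>
#include <string.h>

#define SKETCH_NAME_LEN 65

static pthread_mutex_t name_lock = PTHREAD_MUTEX_INITIALIZER;
static char shared_name[SKETCH_NAME_LEN] = "example-host";

/* Hypothetical stand-in for copy_to_user(), which may fault and sleep. */
static int sketch_copy_out(char *dst, const char *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* Snapshot the shared data under the lock, then copy out with the lock dropped. */
int sketch_get_name(char *dst, size_t dst_len)
{
	char tmp[SKETCH_NAME_LEN];

	/* Refuse destinations that cannot hold the whole snapshot. */
	if (dst_len < sizeof(tmp))
		return -1;

	pthread_mutex_lock(&name_lock);
	memcpy(tmp, shared_name, sizeof(tmp));
	pthread_mutex_unlock(&name_lock);

	/* The potentially slow copy happens without the lock held. */
	return sketch_copy_out(dst, tmp, sizeof(tmp));
}
```

The cost is one extra copy through a stack buffer; in exchange the lock hold
time no longer depends on how long the destination access takes.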

[PATCH v1] tools/vm/slabinfo.c: fix sign-compare warning

2018-08-24 Thread Naoya Horiguchi

Currently we get the following compiler warning:

slabinfo.c:854:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (s->object_size < min_objsize)
  ^

due to the mismatch of signed/unsigned comparison. ->object_size and
->slab_size are never expected to be negative, so let's define them
as unsigned int.

Signed-off-by: Naoya Horiguchi 
---
 tools/vm/slabinfo.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git v4.18-mmotm-2018-08-17-15-48/tools/vm/slabinfo.c v4.18-mmotm-2018-08-17-15-48_patched/tools/vm/slabinfo.c
index f82c2ea..eebeeb1 100644
--- v4.18-mmotm-2018-08-17-15-48/tools/vm/slabinfo.c
+++ v4.18-mmotm-2018-08-17-15-48_patched/tools/vm/slabinfo.c
@@ -30,9 +30,10 @@ struct slabinfo {
int alias;
int refs;
int aliases, align, cache_dma, cpu_slabs, destroy_by_rcu;
-   int hwcache_align, object_size, objs_per_slab;
-   int sanity_checks, slab_size, store_user, trace;
+   int hwcache_align, objs_per_slab;
+   int sanity_checks, store_user, trace;
int order, poison, reclaim_account, red_zone;
+   unsigned int object_size, slab_size;
unsigned long partial, objects, slabs, objects_partial, objects_total;
unsigned long alloc_fastpath, alloc_slowpath;
unsigned long free_fastpath, free_slowpath;
-- 
2.7.0
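
To illustrate why -Wsign-compare flags the comparison in the patch above:
when an int is compared with an unsigned value of the same or greater width,
the int is converted to unsigned first, so a negative value compares as a
very large one. The sketch below is a standalone illustration; the function
names and the parameter types are assumptions, not code from slabinfo.c.

```c
#include <stdbool.h>

/*
 * Signed field compared against an unsigned bound. The explicit cast
 * spells out the conversion the compiler otherwise applies silently.
 */
bool sketch_below_min_signed(int object_size, unsigned int min_objsize)
{
	return (unsigned int)object_size < min_objsize;
}

/* With both operands unsigned there is no hidden conversion and no warning. */
bool sketch_below_min_unsigned(unsigned int object_size, unsigned int min_objsize)
{
	return object_size < min_objsize;
}
```

With object_size == -1 the first helper reports "not below the minimum",
because -1 converts to UINT_MAX. Declaring the field unsigned, as the patch
does, removes the conversion since the value is never expected to be negative.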

[PATCH V2] Bluetooth: bt3c_cs: Fix obsolete function

2018-08-24 Thread Ding Xiang

simple_strtol and simple_strtoul are obsolete; use kstrtoul in both
places instead.

V2: fix error tmp += tn

Signed-off-by: Ding Xiang 
---
 drivers/bluetooth/bt3c_cs.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/bt3c_cs.c b/drivers/bluetooth/bt3c_cs.c
index 25b0cf9..8f03774 100644
--- a/drivers/bluetooth/bt3c_cs.c
+++ b/drivers/bluetooth/bt3c_cs.c
@@ -449,7 +449,7 @@ static int bt3c_load_firmware(struct bt3c_info *info,
char *ptr = (char *) firmware;
char b[9];
unsigned int iobase, tmp;
-   unsigned long size, addr, fcs;
+   unsigned long size, addr, fcs, tn;
int i, err = 0;
 
iobase = info->p_dev->resource[0]->start;
@@ -490,7 +490,9 @@ static int bt3c_load_firmware(struct bt3c_info *info,
memset(b, 0, sizeof(b));
for (tmp = 0, i = 0; i < size; i++) {
memcpy(b, ptr + (i * 2) + 2, 2);
-   tmp += simple_strtol(b, NULL, 16);
+   if (kstrtoul(b, 16, &tn))
+   return -EINVAL;
+   tmp += tn;
}
 
if (((tmp + fcs) & 0xff) != 0xff) {
@@ -505,7 +507,8 @@ static int bt3c_load_firmware(struct bt3c_info *info,
memset(b, 0, sizeof(b));
for (i = 0; i < (size - 4) / 2; i++) {
memcpy(b, ptr + (i * 4) + 12, 4);
-   tmp = simple_strtoul(b, NULL, 16);
+   if (kstrtoul(b, 16, &tmp))
+   return -EINVAL;
bt3c_put(iobase, tmp);
}
}
-- 
1.8.3.1
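
The practical difference with the strict parsers is that a malformed digit is
reported as an error instead of silently ending the parse. The following is a
userspace sketch of the checksum loop from the first hunk, under stated
assumptions: sketch_parse_ulong() is a hypothetical helper that only
approximates kstrtoul() using strtoul(), and sketch_sum_hex_bytes() is a
hypothetical name for the summing loop.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace approximation of kstrtoul(): the whole string
 * must parse, otherwise an error is returned and *res is left untouched.
 */
static int sketch_parse_ulong(const char *s, int base, unsigned long *res)
{
	char *end;
	unsigned long val;

	/* Reject empty input, leading whitespace and signs. */
	if (!isxdigit((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s || *end != '\0')
		return -EINVAL;

	*res = val;
	return 0;
}

/* Sum nbytes two-digit hex values, failing on the first malformed pair. */
int sketch_sum_hex_bytes(const char *hex, size_t nbytes, unsigned int *sum)
{
	char b[3] = { 0 };
	unsigned long tn;	/* same type as the pointer the parser writes through */
	unsigned int tmp = 0;
	size_t i;

	for (i = 0; i < nbytes; i++) {
		memcpy(b, hex + i * 2, 2);
		if (sketch_parse_ulong(b, 16, &tn))
			return -EINVAL;
		tmp += tn;
	}

	*sum = tmp;
	return 0;
}
```

Note that the result is parsed into a separate unsigned long and then added
to the unsigned int accumulator, so the parser always writes through a
pointer of the type it expects.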

[GIT PULL] Urgent ACPI Kconfig fix for v4.19-rc1

2018-08-24 Thread Rafael J. Wysocki

Hi Linus,

Please pull from the tag

 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
 acpi-4.19-rc1-3

with top-most commit f5d707ede37a962bc3cb9b3f8531a870dae29e46

 ACPI: fix menuconfig presentation of ACPI submenu

on top of commit df2def49c57b4146520a1f4ca37bc3f494e2cd67

 Merge tag 'acpi-4.19-rc1-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

to receive an urgent ACPI Kconfig fix for 4.19-rc1.

This fixes recent menuconfig breakage causing it to present
ACPI-specific options incorrectly (Arnd Bergmann).

Thanks!


---

Arnd Bergmann (1):
  ACPI: fix menuconfig presentation of ACPI submenu

---

 drivers/acpi/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

2018-08-24 Thread Peter Zijlstra

On Mon, Aug 20, 2018 at 04:54:25PM -0700, Miguel de Dios wrote:
> On 08/17/2018 11:27 AM, Steve Muckle wrote:
> > From: John Dias 
> > 
> > When rt_mutex_setprio changes a task's scheduling class to RT,
> > we're seeing cases where the task's vruntime is not updated
> > correctly upon return to the fair class.
> > Specifically, the following is being observed:
> > - task is deactivated while still in the fair class
> > - task is boosted to RT via rt_mutex_setprio, which changes
> >the task to RT and calls check_class_changed.
> > - check_class_changed leads to detach_task_cfs_rq, at which point
> >the vruntime_normalized check sees that the task's state is TASK_WAKING,
> >which results in skipping the subtraction of the rq's min_vruntime
> >from the task's vruntime
> > - later, when the prio is deboosted and the task is moved back
> >to the fair class, the fair rq's min_vruntime is added to
> >the task's vruntime, even though it wasn't subtracted earlier.
> > The immediate result is inflation of the task's vruntime, giving
> > it lower priority (starving it if there's enough available work).
> > The longer-term effect is inflation of all vruntimes because the
> > task's vruntime becomes the rq's min_vruntime when the higher
> > priority tasks go idle. That leads to a vicious cycle, where
> > > the vruntime inflation is repeatedly doubled.
> > 
> > The change here is to detect when vruntime_normalized is being
> > called when the task is waking but is waking in another class,
> > and to conclude that this is a case where vruntime has not
> > been normalized.
> > 
> > Signed-off-by: John Dias 
> > Signed-off-by: Steve Muckle 
> > ---
> >   kernel/sched/fair.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index b39fb596f6c1..14011d7929d8 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9638,7 +9638,8 @@ static inline bool vruntime_normalized(struct 
> > task_struct *p)
> >  * - A task which has been woken up by try_to_wake_up() and
> >  *   waiting for actually being woken up by sched_ttwu_pending().
> >  */
> > -   if (!se->sum_exec_runtime || p->state == TASK_WAKING)
> > +   if (!se->sum_exec_runtime ||
> > +   (p->state == TASK_WAKING && p->sched_class == &fair_sched_class))
> > return true;
> > return false;

> The normalization of vruntime used to exist in task_waking but it was
> removed and the normalization was moved into migrate_task_rq_fair. The
> reasoning being that task_waking_fair was only hit when a task is queued
> onto a different core and migrate_task_rq_fair should do the same work.
> 
> However, we're finding that there's one case which migrate_task_rq_fair
> doesn't hit: that being the case where rt_mutex_setprio changes a task's
> scheduling class to RT when it's scheduled out. The task never hits
> migrate_task_rq_fair because it is switched to RT and migrates as an RT
> task. Because of this we're getting an unbounded addition of min_vruntime
> when the task is re-attached to the CFS runqueue when it loses the inherited
> priority. The patch above works because now the kernel specifically checks
> for this case and normalizes accordingly.
> 
> Here's the patch I was talking about:
> https://lore.kernel.org/patchwork/patch/677689/. In our testing we were
> seeing vruntimes nearly double every time after rt_mutex_setprio boosts the
> task to RT.

Bah, patchwork is such shit... how do you get to the previous patch from
there? Because I think 2/3 is the actual commit that changed things, 3/3
just cleans up a bit.

That would be commit:

  b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")

But I'm still somewhat confused; how would task_waking_fair() have
helped if we've already changed to a different class?
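
The inflation described in the quoted report can be reproduced with a toy
model of the detach/attach arithmetic. This is only a sketch: the struct and
both helpers are hypothetical, and assume_normalized stands in for whatever
vruntime_normalized() returns at detach time.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_task {
	uint64_t vruntime;
};

/*
 * Leaving the fair class: make vruntime relative to the runqueue by
 * subtracting min_vruntime, unless it is believed to be relative already.
 */
void toy_detach(struct toy_task *t, uint64_t min_vruntime, bool assume_normalized)
{
	if (!assume_normalized)
		t->vruntime -= min_vruntime;
}

/* Returning to the fair class: min_vruntime is always added back. */
void toy_attach(struct toy_task *t, uint64_t min_vruntime)
{
	t->vruntime += min_vruntime;
}
```

Starting from vruntime 1000 with min_vruntime 900, a detach that wrongly
assumes normalization followed by an attach yields 1900, close to double,
whereas the balanced pair returns 1000.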

Re: [PATCH] cpuidle: menu: Retain tick when shallow state is selected

2018-08-24 Thread Rafael J. Wysocki

On Tuesday, August 21, 2018 10:44:10 AM CEST Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> The case addressed by commit 5ef499cd571c (cpuidle: menu: Handle
> stopped tick more aggressively) in the stopped tick case is present
> when the tick has not been stopped yet too.  Namely, if only two CPU
> idle states, shallow state A with target residency significantly
> below the tick boundary and deep state B with target residency
> significantly above it, are available and the predicted idle
> duration is above the tick boundary, but below the target residency
> of state B, state A will be selected and the CPU may spend an indefinite
> amount of time in it, which is not quite energy-efficient.
> 
> However, if the tick has not been stopped yet and the governor is
> about to select a shallow idle state for the CPU even though the idle
> duration predicted by it is above the tick boundary, it should be
> fine to wake up the CPU early, so the tick can be retained then and
> the governor will have a chance to select a deeper state when it runs
> next time.
> 
> [Note that when this really happens, it will make the idle duration
>  predictor believe that the CPU might be idle longer than predicted,
>  which will make it more likely to predict longer idle durations going
>  forward, but that will also cause deeper idle states to be selected
>  going forward, on average, which is what's needed here.]
> 
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan 
> Signed-off-by: Rafael J. Wysocki 
> ---
> 
> Commit 5ef499cd571c (cpuidle: menu: Handle stopped tick more aggressively) is
> in linux-next only at this point.
> 
> ---
>  drivers/cpuidle/governors/menu.c |   13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -379,9 +379,20 @@ static int menu_select(struct cpuidle_dr
>   if (idx == -1)
>   idx = i; /* first enabled state */
>   if (s->target_residency > data->predicted_us) {
> - if (!tick_nohz_tick_stopped())
> + if (data->predicted_us < TICK_USEC)
>   break;
>  
> + if (!tick_nohz_tick_stopped()) {
> + /*
> +  * If the state selected so far is shallow,
> +  * waking up early won't hurt, so retain the
> +  * tick in that case and let the governor run
> +  * again in the next iteration of the loop.
> +  */
> + expected_interval = drv->states[idx].target_residency;
> + break;
> + }
> +
>   /*
>* If the state selected so far is shallow and this
>* state's target residency matches the time till the
> 
> 

Due to the lack of objections, I'm inclined to queue this up.
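
For readers following the hunk above, here is a heavily simplified userspace
sketch of the selection loop after the change. It is not the governor code:
latency limits, disabled states and the stopped-tick refinements are left
out, SKETCH_TICK_USEC assumes HZ=250, the initial expected interval is taken
to be the predicted idle duration, and all sketch_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical tick period for the sketch: 4000 us, i.e. HZ=250. */
#define SKETCH_TICK_USEC 4000U

struct sketch_choice {
	int idx;			/* index of the selected idle state */
	unsigned int expected_interval;	/* us until the CPU is expected to wake */
};

struct sketch_choice sketch_menu_select(const unsigned int *target_residency,
					int nstates, unsigned int predicted_us,
					bool tick_stopped)
{
	struct sketch_choice c = { .idx = -1, .expected_interval = predicted_us };
	int i;

	for (i = 0; i < nstates; i++) {
		if (c.idx == -1)
			c.idx = i;	/* first enabled state */
		if (target_residency[i] > predicted_us) {
			/* State i is too deep for the predicted idle duration. */
			if (predicted_us < SKETCH_TICK_USEC)
				break;
			if (!tick_stopped) {
				/*
				 * Shallow state chosen for a long predicted idle:
				 * shrink the expected interval so the tick is kept
				 * and the governor gets to run again.
				 */
				c.expected_interval = target_residency[c.idx];
				break;
			}
			/* Stopped-tick handling of the real governor is omitted. */
			break;
		}
		c.idx = i;
	}
	return c;
}

/* The tick is retained when the expected interval is shorter than a tick. */
bool sketch_retain_tick(struct sketch_choice c, bool tick_stopped)
{
	return !tick_stopped && c.expected_interval < SKETCH_TICK_USEC;
}
```

With states {1, 10000} and a predicted idle duration of 6000 us, the sketch
picks state 0 and keeps the tick, which is the A/B scenario from the
changelog.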

Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

2018-08-24 Thread James Morse

Hi Tyler,

On 23/08/18 16:46, Tyler Baicar wrote:
> On Thu, Aug 23, 2018 at 5:29 AM James Morse  wrote:
>> On 19/07/18 19:36, Tyler Baicar wrote:
>>> On 7/19/2018 10:46 AM, James Morse wrote:
>>>> On 19/07/18 15:01, Borislav Petkov wrote:
>>>>> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote:
>>>>>> Enable per-layer error reporting for ARM systems so that the error
>>>>>> counters are incremented per-DIMM.
>>
>> This 'layer' term seems to be EDAC's artificial view of memory.
>>
> 
> Yes, it's just the terminology that EDAC uses for locating a DIMM.
> 
> "Layer" can mean several things here:
> 
> https://elixir.bootlin.com/linux/latest/source/include/linux/edac.h#L318

Aha, it's an enum. I thought it was an upper/middle/lower mapping at the whim of
the edac driver.

[...]

>> [re-ordered hunk:]
>>> This seems pretty hacky to me, so if anyone has other suggestions please
>>> share them.
>>
>> CPER's "Memory Error Record 2" thinks that "NODE, CARD and MODULE should
>> provide the information necessary to identify the failing FRU". As EDAC has
>> three 'levels', these are what they should correspond to for ghes-edac.
>>
>> I assume NODE means rack/chassis in some distributed system. Let's ignore it
>> as it doesn't seem to map to anything in the SMBIOS table.
> 
> I believe NODE should map to socket number for multi-socket systems.

Isn't the Memory Array Structure still unique in a multi-socket system? If so
the node isn't telling us anything new.

Do sockets show up in the SMBIOS table? We would need to know how many there are
in advance. For arm systems the cpu topology from PPTT is the best bet for this
information, but what do we do if that table is missing? (also, does firmware
count from 1 or 0?) I suspect we can't use this field unless we know what the
range of values is going to be in advance.

I assumed this node must be a level of information above Card/Memory-Array's
address-space. Somehow the Card handle is no longer unique, we need the node
number too. If the CPER records were all being pumped at a single agent, (shared
BMC in a blade/chassis thing) then this might matter. I suspect we can ignore it
in linux.

>> The CPER record's card and module numbers are useless to us, as we need to
>> know how many there will be in advance. (does this version of firmware count
>> from 0 or 1?)
>>
>> ... but CPER also gives us a 'Card Handle' and 'Module Handle'.
>> 'Module Handle' maps to SMBIOS:17 Memory Device (aka, a DIMM). The Handle is
>> a word-value in the structure, so it doesn't depend on the layout/parse-order
>> of the SMBIOS tables. When we count the DIMMs in edac-ghes we can give them
>> some level-idx, then use the handle to find which level-idx to use for this
>> DIMM.
>>
>> ghes_edac_report_mem_error() already picks up the module-handle, but only
>> uses it to print the bank/device.
>>
>> 'Card' doesn't mean much to me, but it maps to SMBIOS:16 "Memory Array
>> Structure", which the Memory Device structure also points to.
>> Card then must mean "a collection of memory devices (DIMMs) that operate
>> together to form an address space".
>>
>> This might be what I think of as a memory-controller, or it might be
>> something more complicated. Regardless, the CPER records think it's relevant.
>>
>> For the edac:layers, we could walk the DMI table to find these structures,
>> and build the layers from them. If the Memory-array-structures are missing, we
>> can use the existing 1:NUM_DIMMS approach.

> I think the proper way to get this working would be to use these handles. We
> can avoid populating this layer information and instead have a mapping of type
> 17 index number (how edac is numbering the DIMMs today) to the handle number.

Why avoid the layer stuff? Isn't counting DIMM/memory-devices what
EDAC_MC_LAYER_SLOT is for?

> Then we will need a new function to increment the counter based on the handle
> number rather than this layer information. Is that how you are envisioning it?

I'm not familiar with edac's internals, so I didn't have any particular vision!

Isn't the problem that ghes_edac_report_mem_error() does this:
|   e->top_layer = -1;
|   e->mid_layer = -1;
|   e->low_layer = -1;

so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't
read what it does with this information yet).

ghes_edac_report_mem_error() does check CPER_MEM_VALID_MODULE_HANDLE, and if it's
set, it uses the handle to find the bank/device strings and prints them out.

Naively I thought we could generate some index during ghes_edac_count_dimms(),
and use this as e->${whichever}_layer. I hoped there would be something we could
already use as the index, but I can't spot it, so this will be more than the
one-liner I was hoping for!

Thanks,

James
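
The idea discussed above, recording each DIMM's SMBIOS handle while counting
DIMMs and later turning the handle from a CPER record back into a layer
index, could look roughly like the sketch below. This is an illustration
only, not ghes_edac code: struct sketch_dimm_map, both helpers and
SKETCH_MAX_DIMMS are hypothetical.

```c
#include <stdint.h>

#define SKETCH_MAX_DIMMS 32

/* Handles in the order the Memory Device entries were seen. */
struct sketch_dimm_map {
	uint16_t handle[SKETCH_MAX_DIMMS];
	int count;
};

/*
 * Called once per Memory Device entry while counting DIMMs.
 * Returns the index assigned to this handle, or -1 if the map is full.
 */
int sketch_dimm_map_add(struct sketch_dimm_map *m, uint16_t handle)
{
	if (m->count >= SKETCH_MAX_DIMMS)
		return -1;
	m->handle[m->count] = handle;
	return m->count++;
}

/*
 * Called when an error record carries a valid module handle.
 * Returns the index to use as the layer position, or -1 if unknown.
 */
int sketch_dimm_map_find(const struct sketch_dimm_map *m, uint16_t handle)
{
	int i;

	for (i = 0; i < m->count; i++)
		if (m->handle[i] == handle)
			return i;
	return -1;
}
```

Because the lookup is keyed on the handle value, the resulting index does not
depend on whether firmware numbers cards and modules from 0 or 1. A result of
-1 matches the existing behaviour of leaving the layer unset.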

[PATCH 1/2] iio: adc: sc27xx: Add raw data support

2018-08-24 Thread Baolin Wang

The headset device will use channel 20 of ADC controller to detect events,
but it needs the raw ADC data to do conversion according to its own formula.

Thus we should configure the channel mask separately and configure channel
20 as IIO_CHAN_INFO_RAW, as well as adding raw data read support.

Signed-off-by: Baolin Wang 
---
 drivers/iio/adc/sc27xx_adc.c |   80 --
 1 file changed, 45 insertions(+), 35 deletions(-)

diff --git a/drivers/iio/adc/sc27xx_adc.c b/drivers/iio/adc/sc27xx_adc.c
index 2b60efe..153c311 100644
--- a/drivers/iio/adc/sc27xx_adc.c
+++ b/drivers/iio/adc/sc27xx_adc.c
@@ -273,6 +273,17 @@ static int sc27xx_adc_read_raw(struct iio_dev *indio_dev,
int ret, tmp;
 
switch (mask) {
+   case IIO_CHAN_INFO_RAW:
+   mutex_lock(&indio_dev->mlock);
+   ret = sc27xx_adc_read(data, chan->channel, scale, &tmp);
+   mutex_unlock(&indio_dev->mlock);
+
+   if (ret)
+   return ret;
+
+   *val = tmp;
+   return IIO_VAL_INT;
+
case IIO_CHAN_INFO_PROCESSED:
mutex_lock(&indio_dev->mlock);
ret = sc27xx_adc_read_processed(data, chan->channel, scale,
@@ -315,48 +326,47 @@ static int sc27xx_adc_write_raw(struct iio_dev *indio_dev,
.write_raw = &sc27xx_adc_write_raw,
 };
 
-#define SC27XX_ADC_CHANNEL(index) {\
+#define SC27XX_ADC_CHANNEL(index, mask) {  \
.type = IIO_VOLTAGE,\
.channel = index,   \
-   .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED) |\
- BIT(IIO_CHAN_INFO_SCALE), \
+   .info_mask_separate = mask | BIT(IIO_CHAN_INFO_SCALE),  \
.datasheet_name = "CH##index",  \
.indexed = 1,   \
 }
 
 static const struct iio_chan_spec sc27xx_channels[] = {
-   SC27XX_ADC_CHANNEL(0),
-   SC27XX_ADC_CHANNEL(1),
-   SC27XX_ADC_CHANNEL(2),
-   SC27XX_ADC_CHANNEL(3),
-   SC27XX_ADC_CHANNEL(4),
-   SC27XX_ADC_CHANNEL(5),
-   SC27XX_ADC_CHANNEL(6),
-   SC27XX_ADC_CHANNEL(7),
-   SC27XX_ADC_CHANNEL(8),
-   SC27XX_ADC_CHANNEL(9),
-   SC27XX_ADC_CHANNEL(10),
-   SC27XX_ADC_CHANNEL(11),
-   SC27XX_ADC_CHANNEL(12),
-   SC27XX_ADC_CHANNEL(13),
-   SC27XX_ADC_CHANNEL(14),
-   SC27XX_ADC_CHANNEL(15),
-   SC27XX_ADC_CHANNEL(16),
-   SC27XX_ADC_CHANNEL(17),
-   SC27XX_ADC_CHANNEL(18),
-   SC27XX_ADC_CHANNEL(19),
-   SC27XX_ADC_CHANNEL(20),
-   SC27XX_ADC_CHANNEL(21),
-   SC27XX_ADC_CHANNEL(22),
-   SC27XX_ADC_CHANNEL(23),
-   SC27XX_ADC_CHANNEL(24),
-   SC27XX_ADC_CHANNEL(25),
-   SC27XX_ADC_CHANNEL(26),
-   SC27XX_ADC_CHANNEL(27),
-   SC27XX_ADC_CHANNEL(28),
-   SC27XX_ADC_CHANNEL(29),
-   SC27XX_ADC_CHANNEL(30),
-   SC27XX_ADC_CHANNEL(31),
+   SC27XX_ADC_CHANNEL(0, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(1, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(2, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(3, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(4, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(5, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(6, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(7, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(8, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(9, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(10, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(11, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(12, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(13, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(14, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(15, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(16, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(17, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(18, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(19, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(20, BIT(IIO_CHAN_INFO_RAW)),
+   SC27XX_ADC_CHANNEL(21, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(22, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(23, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(24, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(25, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(26, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(27, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(28, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(29, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(30, BIT(IIO_CHAN_INFO_PROCESSED)),
+   SC27XX_ADC_CHANNEL(31, BIT(IIO_C

[PATCH 2/2] iio: adc: sc27xx: Add ADC scale calibration

2018-08-24 Thread Baolin Wang

This patch adds support to read calibration values from the eFuse
controller to calibrate the ADC channel scales, which can make ADC
sample data more accurate.

Signed-off-by: Baolin Wang 
---
 .../bindings/iio/adc/sprd,sc27xx-adc.txt   |4 ++
 drivers/iio/adc/sc27xx_adc.c   |   52 ++--
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/iio/adc/sprd,sc27xx-adc.txt 
b/Documentation/devicetree/bindings/iio/adc/sprd,sc27xx-adc.txt
index 8aad960..b4daa15 100644
--- a/Documentation/devicetree/bindings/iio/adc/sprd,sc27xx-adc.txt
+++ b/Documentation/devicetree/bindings/iio/adc/sprd,sc27xx-adc.txt
@@ -12,6 +12,8 @@ Required properties:
 - interrupts: The interrupt number for the ADC device.
 - #io-channel-cells: Number of cells in an IIO specifier.
 - hwlocks: Reference to a phandle of a hwlock provider node.
+- nvmem-cells: A phandle to the calibration cells provided by eFuse device.
+- nvmem-cell-names: Should be "big_scale_calib", "small_scale_calib".
 
 Example:
 
@@ -32,5 +34,7 @@ Example:
interrupts = <0 IRQ_TYPE_LEVEL_HIGH>;
#io-channel-cells = <1>;
hwlocks = <&hwlock 4>;
+   nvmem-cells = <&adc_big_scale>, <&adc_small_scale>;
+   nvmem-cell-names = "big_scale_calib", "small_scale_calib";
};
};
diff --git a/drivers/iio/adc/sc27xx_adc.c b/drivers/iio/adc/sc27xx_adc.c
index 153c311..7ac78eda 100644
--- a/drivers/iio/adc/sc27xx_adc.c
+++ b/drivers/iio/adc/sc27xx_adc.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -87,16 +88,48 @@ struct sc27xx_adc_linear_graph {
  * should use the small-scale graph, and if more than 1.2v, we should use the
  * big-scale graph.
  */
-static const struct sc27xx_adc_linear_graph big_scale_graph = {
+static struct sc27xx_adc_linear_graph big_scale_graph = {
4200, 3310,
3600, 2832,
 };
 
-static const struct sc27xx_adc_linear_graph small_scale_graph = {
+static struct sc27xx_adc_linear_graph small_scale_graph = {
1000, 3413,
100, 341,
 };
 
+static const struct sc27xx_adc_linear_graph big_scale_graph_calib = {
+   4200, 856,
+   3600, 733,
+};
+
+static const struct sc27xx_adc_linear_graph small_scale_graph_calib = {
+   1000, 833,
+   100, 80,
+};
+
+static int sc27xx_adc_get_calib_data(u32 calib_data, int calib_adc)
+{
+   return ((calib_data & 0xff) + calib_adc - 128) * 4;
+}
+
+static void
+sc27xx_adc_scale_calibration(const struct sc27xx_adc_linear_graph *calib_graph,
+u32 calib_data, bool big_scale)
+{
+   struct sc27xx_adc_linear_graph *graph;
+
+   if (big_scale)
+   graph = &big_scale_graph;
+   else
+   graph = &small_scale_graph;
+
+   /* Only need to calibrate the adc values in the linear graph. */
+   graph->adc0 = sc27xx_adc_get_calib_data(calib_data, calib_graph->adc0);
+   graph->adc1 = sc27xx_adc_get_calib_data(calib_data >> 8,
+   calib_graph->adc1);
+}
+
 static int sc27xx_adc_get_ratio(int channel, int scale)
 {
switch (channel) {
@@ -209,7 +242,7 @@ static void sc27xx_adc_volt_ratio(struct sc27xx_adc_data *data,
*div_denominator = ratio & SC27XX_RATIO_DENOMINATOR_MASK;
 }
 
-static int sc27xx_adc_to_volt(const struct sc27xx_adc_linear_graph *graph,
+static int sc27xx_adc_to_volt(struct sc27xx_adc_linear_graph *graph,
  int raw_adc)
 {
int tmp;
@@ -371,6 +404,7 @@ static int sc27xx_adc_write_raw(struct iio_dev *indio_dev,
 
 static int sc27xx_adc_enable(struct sc27xx_adc_data *data)
 {
+   u32 val;
int ret;
 
ret = regmap_update_bits(data->regmap, SC27XX_MODULE_EN,
@@ -390,6 +424,18 @@ static int sc27xx_adc_enable(struct sc27xx_adc_data *data)
if (ret)
goto disable_clk;
 
+   /* ADC channel scales' calibration from nvmem device */
+   ret = nvmem_cell_read_u32(data->dev, "big_scale_calib", &val);
+   if (ret)
+   goto disable_clk;
+
+   sc27xx_adc_scale_calibration(&big_scale_graph_calib, val, true);
+
+   ret = nvmem_cell_read_u32(data->dev, "small_scale_calib", &val);
+   if (ret)
+   goto disable_clk;
+
+   sc27xx_adc_scale_calibration(&small_scale_graph_calib, val, false);
return 0;
 
 disable_clk:
-- 
1.7.9.5
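
As a worked example of the calibration arithmetic in the patch above, the
sketch below applies the same formula to a made-up eFuse word. The struct
layout (volt0, adc0, volt1, adc1) is inferred from the initializers and the
adc0/adc1 accesses, the sketch_* names are hypothetical, and the value
0x8280 is invented purely for illustration.

```c
#include <stdint.h>

/* Two-point linear graph; field order is an assumption based on the patch. */
struct sketch_linear_graph {
	int volt0;
	int adc0;
	int volt1;
	int adc1;
};

/* Same formula as sc27xx_adc_get_calib_data() in the patch. */
static int sketch_get_calib_data(uint32_t calib_data, int calib_adc)
{
	return ((int)(calib_data & 0xff) + calib_adc - 128) * 4;
}

/* Low byte calibrates adc0, the next byte calibrates adc1; volts are untouched. */
void sketch_scale_calibration(struct sketch_linear_graph *graph,
			      const struct sketch_linear_graph *calib_graph,
			      uint32_t calib_data)
{
	graph->adc0 = sketch_get_calib_data(calib_data, calib_graph->adc0);
	graph->adc1 = sketch_get_calib_data(calib_data >> 8, calib_graph->adc1);
}
```

For the big-scale reference points (856 and 733) and a word of 0x8280, the
low byte 0x80 gives (128 + 856 - 128) * 4 = 3424 and the next byte 0x82 gives
(130 + 733 - 128) * 4 = 2940. Each byte therefore acts as an offset around
128 in units of four ADC counts.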

Re: [PATCH v5 1/2] dt-bindings: leds: Add bindings for lm3697 driver

2018-08-24 Thread Pavel Machek

On Fri 2018-08-17 10:15:27, Dan Murphy wrote:
> Add the device tree bindings for the lm3697
> LED driver for backlighting and display.
> 
> Signed-off-by: Dan Murphy 

Acked-by: Pavel Machek 

Some nits are below.

> +The LM3697 11-bit LED driver provides high-
> +performance backlight dimming for 1, 2, or 3 series
> +LED strings while delivering up to 90% efficiency.

LED core is 8-bit only... so full dynamic range can not be currently
used in linux -- right? Is there any plan to change/fix that?

> +This device is suitable for Display and Keypad Lighting

"display and keypad lighting."

> +Optional properties:
> + - enable-gpios : gpio pin to enable/disable the device.

Remove "." at end of sentence, for consistency. "GPIO"?

> +All HVLED strings controlled by control bank A

":"?

> +led-controller@36 {
> + compatible = "ti,lm3697";
> + reg = <0x36>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + enable-gpios = <&gpio1 28 GPIO_ACTIVE_HIGH>;
> + vled-supply = <&vbatt>;
> +
> + led@0 {
> + reg = <0>;
> + led-sources = <1 1 1>;
> + label = "white:backlight_cluster";
> + linux,default-trigger = "backlight";
> + };
> +}
> +
> +For more product information please see the link below:
> +http://www.ti.com/lit/ds/symlink/lm3697.pdf

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

2018-08-24 Thread Peter Zijlstra

> > On 08/17/2018 11:27 AM, Steve Muckle wrote:

> > > When rt_mutex_setprio changes a task's scheduling class to RT,
> > > we're seeing cases where the task's vruntime is not updated
> > > correctly upon return to the fair class.

> > > Specifically, the following is being observed:
> > > - task is deactivated while still in the fair class
> > > - task is boosted to RT via rt_mutex_setprio, which changes
> > >the task to RT and calls check_class_changed.
> > > - check_class_changed leads to detach_task_cfs_rq, at which point
> > > >the vruntime_normalized check sees that the task's state is TASK_WAKING,
> > >which results in skipping the subtraction of the rq's min_vruntime
> > >from the task's vruntime
> > > - later, when the prio is deboosted and the task is moved back
> > >to the fair class, the fair rq's min_vruntime is added to
> > >the task's vruntime, even though it wasn't subtracted earlier.

I'm thinking that is an incomplete scenario; where do we get to
TASK_WAKING?

[PATCH] arm64: dts: rockchip: Add idle-states to device tree for rk3399

2018-08-24 Thread Tony Xie

Tony Xie (1):
  arm64: dts: rockchip: Add idle-states to device tree for rk3399

 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 28 
 1 file changed, 28 insertions(+)

-- 
1.9.1

[PATCH] arm64: dts: rockchip: Add idle-states to device tree for rk3399

2018-08-24 Thread Tony Xie

Signed-off-by: Tony Xie 
---
 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 28 
 1 file changed, 28 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index e0040b6..49fb57f 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -111,6 +111,7 @@
#cooling-cells = <2>; /* min followed by max */
clocks = <&cru ARMCLKL>;
dynamic-power-coefficient = <100>;
+   cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
};
 
cpu_l1: cpu@1 {
@@ -120,6 +121,7 @@
enable-method = "psci";
clocks = <&cru ARMCLKL>;
dynamic-power-coefficient = <100>;
+   cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
};
 
cpu_l2: cpu@2 {
@@ -129,6 +131,7 @@
enable-method = "psci";
clocks = <&cru ARMCLKL>;
dynamic-power-coefficient = <100>;
+   cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
};
 
cpu_l3: cpu@3 {
@@ -138,6 +141,7 @@
enable-method = "psci";
clocks = <&cru ARMCLKL>;
dynamic-power-coefficient = <100>;
+   cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
};
 
cpu_b0: cpu@100 {
@@ -148,6 +152,7 @@
#cooling-cells = <2>; /* min followed by max */
clocks = <&cru ARMCLKB>;
dynamic-power-coefficient = <436>;
+   cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
};
 
cpu_b1: cpu@101 {
@@ -157,6 +162,29 @@
enable-method = "psci";
clocks = <&cru ARMCLKB>;
dynamic-power-coefficient = <436>;
+   cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+   };
+
+   idle-states {
+   entry-method = "psci";
+
+   CPU_SLEEP: cpu-sleep {
+   compatible = "arm,idle-state";
+   local-timer-stop;
+   arm,psci-suspend-param = <0x001>;
+   entry-latency-us = <120>;
+   exit-latency-us = <250>;
+   min-residency-us = <900>;
+   };
+
+   CLUSTER_SLEEP: cluster-sleep {
+   compatible = "arm,idle-state";
+   local-timer-stop;
+   arm,psci-suspend-param = <0x101>;
+   entry-latency-us = <400>;
+   exit-latency-us = <500>;
+   min-residency-us = <2000>;
+   };
};
};
 
-- 
1.9.1

Re: [PATCH v5 2/2] leds: lm3697: Introduce the lm3697 driver

2018-08-24 Thread Pavel Machek

Hi!

> +/**
> + * struct lm3697 -
> + * @enable_gpio - Hardware enable gpio
> + * @regulator - LED supply regulator pointer
> + * @client - Pointer to the I2C client
> + * @regmap - Devices register map
> + * @dev - Pointer to the devices device struct
> + * @lock - Lock for reading/writing the device
> + * @leds - Array of LED strings.
> + */

extra .

> + ret = regmap_write(led->priv->regmap, brt_msb_reg, brt_val);
> + if (ret) {
> + dev_err(&led->priv->client->dev, "Cannot write MSB\n");
> + goto out;
> + }
> +out:

I'd avoid this goto.

> +static int lm3697_set_control_bank(struct lm3697 *priv)
> +{
> + u8 control_bank_config = 0;
> + struct lm3697_led *led;
> + int ret, i;
> +
> + led = &priv->leds[0];
> + if (led->control_bank == LM3697_CONTROL_A)
> + led = &priv->leds[1];

I'd expect CONTROL_A to correspond to leds[0]...?

> + for (i = 0; i < LM3697_MAX_LED_STRINGS; i++) {
> + if (led->hvled_strings[i] == LM3697_HVLED_ASSIGNMENT)
> + control_bank_config |= 1 << i;
> + }

Extra {}s.

> + priv->regulator = devm_regulator_get(&priv->client->dev, "vled");
> + if (IS_ERR(priv->regulator))
> + priv->regulator = NULL;

If vled regulator is specified in dt and _get fails, is it worth a
warning?

Pavel
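
For reference, the control-bank loop quoted above builds a bitmask with one
bit per HVLED string. A standalone sketch without the extra braces is shown
below; the constant values and the function name are hypothetical and only
mirror the shape of the quoted driver code.

```c
#include <stdint.h>

/* Hypothetical values; the driver's real constants are not shown in the review. */
#define SKETCH_MAX_LED_STRINGS	3
#define SKETCH_HVLED_ASSIGNMENT	1

/* Set bit i for every string that is assigned to the other control bank. */
uint8_t sketch_control_bank_config(const uint32_t hvled_strings[SKETCH_MAX_LED_STRINGS])
{
	uint8_t config = 0;
	int i;

	for (i = 0; i < SKETCH_MAX_LED_STRINGS; i++)
		if (hvled_strings[i] == SKETCH_HVLED_ASSIGNMENT)
			config |= 1 << i;

	return config;
}
```

With strings 0 and 2 assigned, the result is 0x5; with none assigned it is 0.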

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] Input: elants_i2c - Fix sw reset delays

2018-08-24 Thread Andi Shyti

Hi Derek,

> > > On Thu, Aug 23, 2018 at 04:10:13PM -0700, Derek Basehore wrote:
> > > > We only need to wait 10ms instead of 30ms before starting fastboot or
> > > > sending IAP on the touchscreen. Also, instead of delaying everytime
> > > > sw_reset is called, this delays 10ms in the function that starts
> > > > fastboot. There's also an explicit 20ms delay before sending IAP when
> > > > updating the firmware, so no additional delay is needed there. This
> > > > change also has the benefit of not delaying when wakeup is enabled
> > > > during suspend. This is because sw_reset is called, yet fastboot
> > > > isn't.

...

> > > > -   /*
> > > > -* We should wait at least 10 msec (but no more than 40) before
> > > > -* sending fastboot or IAP command to the device.
> > > > -*/
> > > > -   msleep(30);
> > > > -

moving from 30 to 0 is a bit alarming... what does the datasheet
say?

Sometimes delays are implicit in the system where you are testing
the driver, so that without any msleep it might work in your
system but it might not on others.

> + /*
> +  * We should wait at least 10 msec (but no more than 40) before
> +  * sending IAP command to the device.
> +  */
>   msleep(20);

I agree though that it's not nice to wait twice here (even though
as Dmitry says it doesn't hurt so much). Wouldn't it make more
sense to remove this msleep instead?
This way...

> + /*
> +  * We should wait at least 10 msec (but no more than 40) before sending
> +  * fastboot command to the device.
> +  */
> + usleep_range(10 * 1000, 11 * 1000);
> +
>   error = elants_i2c_send(client, boot_cmd, sizeof(boot_cmd));

... you do not need to add an extra sleep here.

Andi

Re: [PATCH] staging: greybus: Fix null pointer dereference

2018-08-24 Thread Ding Xiang


Hi, Johan

    sorry, it's my fault.


On 8/24/2018 2:29 PM, Johan Hovold wrote:
> On Fri, Aug 24, 2018 at 12:07:11AM -0400, Ding Xiang wrote:
> > If fw is null then fw->size will trigger null pointer dereference
> > 
> > Signed-off-by: Ding Xiang 
> > ---
> >  drivers/staging/greybus/bootrom.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/staging/greybus/bootrom.c 
> > b/drivers/staging/greybus/bootrom.c
> > index e85ffae..3af28a0 100644
> > --- a/drivers/staging/greybus/bootrom.c
> > +++ b/drivers/staging/greybus/bootrom.c
> > @@ -297,7 +297,7 @@ static int gb_bootrom_get_firmware(struct gb_operation 
> > *op)
> >  
> >  queue_work:
> >   /* Refresh timeout */
> > - if (!ret && (offset + size == fw->size))
> > + if (!ret && fw && (offset + size == fw->size))
> >   next_request = NEXT_REQ_READY_TO_BOOT;
> >   else
> >   next_request = NEXT_REQ_GET_FIRMWARE;
> 
> How could fw be NULL when ret is 0 here?
> 
> It may not be as obvious as one might have wished, but the current code
> looks correct to me.
> 
> Johan

Re: [PATCH v5 1/2] leds: core: Introduce LED pattern trigger

2018-08-24 Thread Pavel Machek

Hi!

> I think that it would be more flexible if software pattern fallback
> was applied in case of pattern_set failure. Otherwise, it would
> lead to the situation where LED class devices that support hardware
> blinking couldn't be applied the same set of patterns as LED class
> devices that don't implement pattern_set. The latter will always have to
> resort to using software pattern engine which will accept far greater
> amount of pattern combinations.
> 
> In this case we need to discuss on what basis the decision will be
> made on whether hardware or software engine will be used.
> 
> Possible options coming to mind:
> - an interface will be provided to determine max difference between
>   the settings supported by the hardware and the settings requested by
>   the user, that will result in aligning user's setting to the hardware
>   capabilities
> - the above alignment rate will be predefined instead
> - hardware engine will be used only if user requests supported settings
>   on the whole span of the requested pattern
> - in each of the above cases it would be worth to think of the
>   interface to show the scope of the settings supported by hardware

I'd recommend keeping it simple. We use hardware engine if driver
author thinks pattern is "close enough".

If human can not tell the difference, it probably is.

We may want to do something more formal later.
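
The "try hardware, else fall back to software" idea discussed above can be sketched in plain C. This is only a userspace model under assumptions: struct led_dev, struct pattern_step, start_pattern() and two_step_only() are hypothetical names, not the LED core API. The driver hook alone decides whether a pattern is "close enough" for the hardware engine.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical, simplified types; not the kernel's struct led_classdev. */
struct pattern_step {
	unsigned int brightness;
	unsigned int delta_t_ms;
};

enum pattern_engine { ENGINE_HW, ENGINE_SW };

struct led_dev {
	/* Optional hook; returns 0 if the hardware can play the pattern. */
	int (*pattern_set)(struct led_dev *led,
			   const struct pattern_step *steps, size_t len);
};

/* Try the hardware engine first, fall back to software on any failure. */
static enum pattern_engine start_pattern(struct led_dev *led,
					 const struct pattern_step *steps,
					 size_t len)
{
	if (led->pattern_set && led->pattern_set(led, steps, len) == 0)
		return ENGINE_HW;

	return ENGINE_SW;	/* a timer-driven software engine would run here */
}

/* Example driver hook: this made-up hardware only handles two-step blinks. */
static int two_step_only(struct led_dev *led,
			 const struct pattern_step *steps, size_t len)
{
	(void)led;
	(void)steps;
	return len == 2 ? 0 : -EINVAL;
}
```

With this shape, user space can write the same pattern to any LED; only the engine that ends up playing it differs.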
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [PATCH] staging: greybus: Fix null pointer dereference

2018-08-24 Thread Dan Carpenter

On Fri, Aug 24, 2018 at 12:07:11AM -0400, Ding Xiang wrote:
> If fw is null then fw->size will trigger null pointer dereference
> 
> Signed-off-by: Ding Xiang 
> ---
>  drivers/staging/greybus/bootrom.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/greybus/bootrom.c 
> b/drivers/staging/greybus/bootrom.c
> index e85ffae..3af28a0 100644
> --- a/drivers/staging/greybus/bootrom.c
> +++ b/drivers/staging/greybus/bootrom.c
> @@ -297,7 +297,7 @@ static int gb_bootrom_get_firmware(struct gb_operation 
> *op)
>  
>  queue_work:
>   /* Refresh timeout */
> - if (!ret && (offset + size == fw->size))
> + if (!ret && fw && (offset + size == fw->size))

That is impossible.  If "ret" is zero that implies "fw" is a valid
pointer.

regards,
dan carpenter
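
The invariant both reviewers rely on can be sketched in plain C. This is a minimal userspace model, not the greybus code: struct fw_blob and is_last_chunk() are hypothetical names. It only illustrates that when every path leaving the pointer NULL also records a non-zero error, "!ret &&" short-circuits before any dereference.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in for a firmware image; not struct firmware. */
struct fw_blob {
	size_t size;
};

/*
 * Model of a request handler: every path that leaves fw == NULL also sets
 * a non-zero error code, so "ret == 0" implies "fw != NULL".
 * Returns 1 if this chunk completes the image, 0 otherwise.
 */
static int is_last_chunk(const struct fw_blob *fw, size_t offset, size_t size)
{
	int ret = 0;

	if (!fw)
		ret = -ENODEV;	/* no firmware: the error is recorded here */
	else if (offset > fw->size || size > fw->size - offset)
		ret = -EINVAL;	/* request lies outside the image */

	/* && short-circuits: fw->size is only read when ret == 0 */
	return !ret && (offset + size == fw->size);
}
```

An extra "fw &&" test in the final condition would be harmless in this model, but it could never change the result.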

Re: [PATCH v9 21/22] KVM: s390: CPU model support for AP virtualization

2018-08-24 Thread Halil Pasic





On 08/23/2018 07:40 PM, David Hildenbrand wrote:

On 23.08.2018 19:35, Tony Krowiak wrote:

On 08/23/2018 10:59 AM, Pierre Morel wrote:

On 23/08/2018 15:38, David Hildenbrand wrote:

On 23.08.2018 15:22, Halil Pasic wrote:



On 08/23/2018 02:47 PM, Pierre Morel wrote:

On 23/08/2018 13:12, David Hildenbrand wrote:

[..]


I'm confused, which 128 bit?



Me too :) , I was assuming this block to be 128bit, but the qci
block
has 128 bytes

And looking at arch/s390/include/asm/ap.h, there is a lot of
information
contained that is definitely not of interest for CPU models...

I wonder if there is somewhere defined which bits are reserved for
future features/facilities, compared to ap masks and such.

This is really hard to understand/plan without access to
documentation.

You (Halil, Tony, Pier, ...) should have a look if what I described
related to PQAP(QCI) containing features that should get part of
the CPU
model makes sense or not. For now I was thinking that there is
some part
inside of QCI that is strictly reserved for facilities/features
that we
can use.


No there is no such part. The architecture documentation is quite
confusing
with some aspects (e.g. persistence) of how exactly some of these
features
work and are indicated. I'm having a hard time finding my opinion. I
may
end up asking some questions later, but for now I have to think first.

Just one hint. There is a programming note stating that if bit 2 of the
QCI block is one there is at least one AP card in the machine that
actually
has APXA installed.

I read the architecture so that the APXA has a 'cpu part' (if we are
doing APXA the cpu can't spec exception on certain bits not being zero)
and a 'card(s) part'.

Since the stuff seems quite difficult to sort out properly, I ask
myself
are there real problems we must solve?

This ultimately seems to be  about the migration, right? You say
'This helps
to catch nasty migration bugs (e.g. APXA suddenly disappearing).' at
the very
beginning of the discussion. Yes, we don't have to have a vfio_ap
device, the guest can and will start looking for AP resources if
only the cpu model features are installed. So the guest could observe
a disappearing APXA, but I don't think that would lead to problems
(with
Linux at least).

And there ain't much AP a guest can sanely do without if no AP
resources
are there.

I would really prefer not rushing a solution if we don't have to.






What is apsc, qact, rc8a in the qci blocks? are the facility bits?


Yes, facility bits concerning the AP instructions



According to the current AR document rc8a ain't a facility but bits
0-2 and 4-7 kind of are.



Easy ( :) ) answer. Everything that is the CPU part should get into the
CPU model. Everything that is AP specific not. If APXA is not a CPU
facility, fine with me to leave it out.

Ack to not rushing, but also ack to not leaving out important things.
Ack that this stuff is hard to figure out.


APXA is not a CPU part, it is a machine part (SIE) and a AP part
(QCI,TAPQ),
it has no influence on CPU instructions but on the AP instructions.
Consequently, if I understood the definition correctly, it should not
go in the CPU model.


The APXA bit returned via the PQAP(QCI) instruction indicates the APXA
facility is
installed in the CPUs of the configuration. This means that the facility is
installed in one or more adjunct processors but not necessarily all.
Given that
it indicates a CPU property, maybe it does belong in the CPU model?



Hmmm, I tend to agree - especially as it affects SIE behavior. But as
this is not a feature block (compared to what I thought), this could be
modeled as a CPU feature like AP.



There is certainly a CPU aspect to APXA: before APXA the APQN had to
have zeros in certain bits (otherwise specification exception). When
running with APXA we have a guarantee that there won't be any
specification exception flying because such a bit is set. The interesting
question is, is APXA constant let's say as long as an LPAR partition is
activated?
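
The "zeros in certain bits" rule can be illustrated with a small check. This is a sketch under loudly stated assumptions, not taken from the architecture documents: it assumes a 16-bit APQN with the adapter number in the high byte and the domain index in the low byte, and that without APXA only adapters 0-63 and domains 0-15 are valid. The macro and function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: 16-bit APQN, adapter number in the high byte and domain
 * (queue) index in the low byte; without APXA only 6 adapter bits and
 * 4 domain bits may be non-zero. Check these limits against the
 * architecture documentation before relying on them.
 */
#define APQN_ADAPTER(q)		(((q) >> 8) & 0xff)
#define APQN_DOMAIN(q)		((q) & 0xff)
#define NO_APXA_MAX_ADAPTER	63
#define NO_APXA_MAX_DOMAIN	15

/* true if the APQN would not trip the reserved-bits check */
static bool apqn_bits_ok(uint16_t apqn, bool apxa)
{
	if (apxa)
		return true;	/* all 8 + 8 bits are usable */

	return APQN_ADAPTER(apqn) <= NO_APXA_MAX_ADAPTER &&
	       APQN_DOMAIN(apqn) <= NO_APXA_MAX_DOMAIN;
}
```

Under that model, a guest that saw APXA and later loses it could hold an APQN that suddenly fails the check, which is the migration concern raised earlier in the thread.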

Regards,
Halil

[PATCH V5 3/8] backlight: qcom-wled: Add new properties for PMI8998

2018-08-24 Thread Kiran Gunda

Update the bindings with the new properties used for
PMI8998.

Signed-off-by: Kiran Gunda 
Reviewed-by: Bjorn Andersson 
Reviewed-by: Rob Herring 
Acked-by: Daniel Thompson 
---
Changes from V3:
- Removed the default values.
- Removed pmi8998 example.

Changes from V4:
- modified qcom,enabled-strings property with decimal numbers.

 .../bindings/leds/backlight/qcom-wled.txt  | 76 ++
 1 file changed, 62 insertions(+), 14 deletions(-)

diff --git a/Documentation/devicetree/bindings/leds/backlight/qcom-wled.txt 
b/Documentation/devicetree/bindings/leds/backlight/qcom-wled.txt
index 14f28f2..9d840d5 100644
--- a/Documentation/devicetree/bindings/leds/backlight/qcom-wled.txt
+++ b/Documentation/devicetree/bindings/leds/backlight/qcom-wled.txt
@@ -20,8 +20,7 @@ platforms. The PMIC is connected to the host processor via 
SPMI bus.
 - default-brightness
Usage:optional
Value type:   
-   Definition:   brightness value on boot, value from: 0-4095
- Default: 2048
+   Definition:   brightness value on boot, value from: 0-4095.
 
 - label
Usage:required
@@ -48,20 +47,24 @@ platforms. The PMIC is connected to the host processor via 
SPMI bus.
 - qcom,current-limit
Usage:optional
Value type:   
-   Definition:   mA; per-string current limit
- value: For pm8941: from 0 to 25 with 5 mA step
-Default 20 mA.
-For pmi8998: from 0 to 30 with 5 mA step
-Default 25 mA.
+   Definition:   mA; per-string current limit; value from 0 to 25 with
+ 1 mA step.
+ This property is supported only for pm8941.
+
+- qcom,current-limit-microamp
+   Usage:optional
+   Value type:   
+   Definition:   uA; per-string current limit; value from 0 to 3 with
+ 2500 uA step.
 
 - qcom,current-boost-limit
Usage:optional
Value type:   
Definition:   mA; boost current limit.
  For pm8941: one of: 105, 385, 525, 805, 980, 1260, 1400,
- 1680. Default: 805 mA
+ 1680.
  For pmi8998: one of: 105, 280, 450, 620, 970, 1150, 1300,
- 1500. Default: 970 mA
+ 1500.
 
 - qcom,switching-freq
Usage:optional
@@ -69,22 +72,66 @@ platforms. The PMIC is connected to the host processor via 
SPMI bus.
 Definition:   kHz; switching frequency; one of: 600, 640, 685, 738,
   800, 872, 960, 1066, 1200, 1371, 1600, 1920, 2400, 3200,
   4800, 9600.
-  Default: for pm8941: 1600 kHz
-   for pmi8998: 800 kHz
 
 - qcom,ovp
Usage:optional
Value type:   
Definition:   V; Over-voltage protection limit; one of:
- 27, 29, 32, 35. default: 29V
+ 27, 29, 32, 35.
  This property is supported only for PM8941.
 
+- qcom,ovp-millivolt
+   Usage:optional
+   Value type:   
+   Definition:   mV; Over-voltage protection limit;
+ For pmi8998: one of 18100, 19600, 29600, 31100
+ If this property is not specified for PM8941, it
+ falls back to "qcom,ovp" property.
+
 - qcom,num-strings
Usage:optional
Value type:   
Definition:   #; number of led strings attached;
- value from 1 to 3. default: 2
- This property is supported only for PM8941.
+ value: For PM8941 from 1 to 3.
+For PMI8998 from 1 to 4.
+
+- interrupts
+   Usage:optional
+   Value type:   
+   Definition:   Interrupts associated with WLED. This should be
+ "short" and "ovp" interrupts. Interrupts can be
+ specified as per the encoding listed under
+ Documentation/devicetree/bindings/spmi/
+ qcom,spmi-pmic-arb.txt.
+
+- interrupt-names
+   Usage:optional
+   Value type:   
+   Definition:   Interrupt names associated with the interrupts.
+ Must be "short" and "ovp". The short circuit detection
+ is not supported for PM8941.
+
+- qcom,enabled-strings
+   Usage:optional
+   Value type:   
+   Definition:   Array of the WLED strings numbered from 0 to 3. Each
+ string of leds are operated individually. Specify the
+ list of strings used by the device. Any combination of
+ led strings can be used.
+
+- qcom,external-pfet
+   Usage:optional
+   Value type:   
+   Definition:   Specify if external PFET control for short circ

[PATCH V5 2/8] backlight: qcom-wled: restructure the qcom-wled bindings

2018-08-24 Thread Kiran Gunda

Restructure the qcom-wled bindings for the better readability.

Signed-off-by: Kiran Gunda 
Reviewed-by: Bjorn Andersson 
Reviewed-by: Rob Herring 
Acked-by: Daniel Thompson 
---
Changes from V3:
Added Reviewed-by and Acked-by tags.

Changes from V4:
None

 .../bindings/leds/backlight/qcom-wled.txt  | 110 -
 1 file changed, 85 insertions(+), 25 deletions(-)

diff --git a/Documentation/devicetree/bindings/leds/backlight/qcom-wled.txt 
b/Documentation/devicetree/bindings/leds/backlight/qcom-wled.txt
index fb39e32..14f28f2 100644
--- a/Documentation/devicetree/bindings/leds/backlight/qcom-wled.txt
+++ b/Documentation/devicetree/bindings/leds/backlight/qcom-wled.txt
@@ -1,30 +1,90 @@
 Binding for Qualcomm Technologies, Inc. WLED driver
 
-Required properties:
-- compatible: should be "qcom,pm8941-wled"
-- reg: slave address
-
-Optional properties:
-- default-brightness: brightness value on boot, value from: 0-4095
-   default: 2048
-- label: The name of the backlight device
-- qcom,cs-out: bool; enable current sink output
-- qcom,cabc: bool; enable content adaptive backlight control
-- qcom,ext-gen: bool; use externally generated modulator signal to dim
-- qcom,current-limit: mA; per-string current limit; value from 0 to 25
-   default: 20mA
-- qcom,current-boost-limit: mA; boost current limit; one of:
-   105, 385, 525, 805, 980, 1260, 1400, 1680
-   default: 805mA
-- qcom,switching-freq: kHz; switching frequency; one of:
-   600, 640, 685, 738, 800, 872, 960, 1066, 1200, 1371,
-   1600, 1920, 2400, 3200, 4800, 9600,
-   default: 1600kHz
-- qcom,ovp: V; Over-voltage protection limit; one of:
-   27, 29, 32, 35
-   default: 29V
-- qcom,num-strings: #; number of led strings attached; value from 1 to 3
-   default: 2
+WLED (White Light Emitting Diode) driver is used for controlling display
+backlight that is part of PMIC on Qualcomm Technologies, Inc. reference
+platforms. The PMIC is connected to the host processor via SPMI bus.
+
+- compatible
+   Usage:required
+   Value type:   
+   Definition:   should be one of:
+   "qcom,pm8941-wled"
+   "qcom,pmi8998-wled"
+   "qcom,pm660l-wled"
+
+- reg
+   Usage:required
+   Value type:   
+   Definition:   Base address of the WLED modules.
+
+- default-brightness
+   Usage:optional
+   Value type:   
+   Definition:   brightness value on boot, value from: 0-4095
+ Default: 2048
+
+- label
+   Usage:required
+   Value type:   
+   Definition:   The name of the backlight device
+
+- qcom,cs-out
+   Usage:optional
+   Value type:   
+   Definition:   enable current sink output.
+ This property is supported only for PM8941.
+
+- qcom,cabc
+   Usage:optional
+   Value type:   
+   Definition:   enable content adaptive backlight control.
+
+- qcom,ext-gen
+   Usage:optional
+   Value type:   
+   Definition:   use externally generated modulator signal to dim.
+ This property is supported only for PM8941.
+
+- qcom,current-limit
+   Usage:optional
+   Value type:   
+   Definition:   mA; per-string current limit
+ value: For pm8941: from 0 to 25 with 5 mA step
+Default 20 mA.
+For pmi8998: from 0 to 30 with 5 mA step
+Default 25 mA.
+
+- qcom,current-boost-limit
+   Usage:optional
+   Value type:   
+   Definition:   mA; boost current limit.
+ For pm8941: one of: 105, 385, 525, 805, 980, 1260, 1400,
+ 1680. Default: 805 mA
+ For pmi8998: one of: 105, 280, 450, 620, 970, 1150, 1300,
+ 1500. Default: 970 mA
+
+- qcom,switching-freq
+   Usage:optional
+   Value type:   
+Definition:   kHz; switching frequency; one of: 600, 640, 685, 738,
+  800, 872, 960, 1066, 1200, 1371, 1600, 1920, 2400, 3200,
+  4800, 9600.
+  Default: for pm8941: 1600 kHz
+   for pmi8998: 800 kHz
+
+- qcom,ovp
+   Usage:optional
+   Value type:   
+   Definition:   V; Over-voltage protection limit; one of:
+ 27, 29, 32, 35. default: 29V
+ This property is supported only for PM8941.
+
+- qcom,num-strings
+   Usage:optional
+   Value type:   
+   Definition:   #; number of led strings attached;
+ value from 1 to 3. default: 2
+ This property is supported only for PM8941.
 
 Example:
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
 a Linux Foundation Collaborative Project

[PATCH V5 7/8] backlight: qcom-wled: add support for short circuit handling

2018-08-24 Thread Kiran Gunda

Handle the short circuit interrupt and check if the short circuit
interrupt is valid. Re-enable the module to check if it goes
away. Disable the module altogether if the short circuit event
persists.

Signed-off-by: Kiran Gunda 
Reviewed-by: Bjorn Andersson 
---
Changes from V3:
- Added Reviewed by tag.
- Addressed minor comments from Vinod

Changes from V4:
- Changed the return value from -EINVAL to -ENXIO
- Re-initializing the short_count from 0 to 1.

 drivers/video/backlight/qcom-wled.c | 132 ++--
 1 file changed, 128 insertions(+), 4 deletions(-)

diff --git a/drivers/video/backlight/qcom-wled.c 
b/drivers/video/backlight/qcom-wled.c
index 49fdd23..d891067 100644
--- a/drivers/video/backlight/qcom-wled.c
+++ b/drivers/video/backlight/qcom-wled.c
@@ -10,6 +10,9 @@
  * GNU General Public License for more details.
  */
 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -64,6 +67,16 @@
 #define WLED3_SINK_REG_STR_CABC(n) (0x66 + (n * 0x10))
 #define  WLED3_SINK_REG_STR_CABC_MASK  BIT(7)
 
+/* WLED4 specific control registers */
+#define WLED4_CTRL_REG_SHORT_PROTECT   0x5e
+#define  WLED4_CTRL_REG_SHORT_EN_MASK  BIT(7)
+
+#define WLED4_CTRL_REG_SEC_ACCESS  0xd0
+#define  WLED4_CTRL_REG_SEC_UNLOCK 0xa5
+
+#define WLED4_CTRL_REG_TEST1   0xe2
+#define  WLED4_CTRL_REG_TEST1_EXT_FET_DTEST2   0x09
+
 /* WLED4 specific sink registers */
 #define WLED4_SINK_REG_CURR_SINK   0x46
 #define  WLED4_SINK_REG_CURR_SINK_MASK GENMASK(7, 4)
@@ -113,17 +126,23 @@ struct wled_config {
bool cs_out_en;
bool ext_gen;
bool cabc;
+   bool external_pfet;
 };
 
 struct wled {
const char *name;
struct device *dev;
struct regmap *regmap;
+   struct mutex lock;  /* Lock to avoid race from thread irq handler */
+   ktime_t last_short_event;
u16 ctrl_addr;
u16 sink_addr;
u16 max_string_count;
u32 brightness;
u32 max_brightness;
+   u32 short_count;
+   bool disabled_by_short;
+   bool has_short_detect;
 
struct wled_config cfg;
int (*wled_set_brightness)(struct wled *wled, u16 brightness);
@@ -174,6 +193,9 @@ static int wled_module_enable(struct wled *wled, int val)
 {
int rc;
 
+   if (wled->disabled_by_short)
+   return -ENXIO;
+
rc = regmap_update_bits(wled->regmap, wled->ctrl_addr +
WLED_CTRL_REG_MOD_EN,
WLED_CTRL_REG_MOD_EN_MASK,
@@ -210,18 +232,19 @@ static int wled_update_status(struct backlight_device *bl)
bl->props.state & BL_CORE_FBBLANK)
brightness = 0;
 
+   mutex_lock(&wled->lock);
if (brightness) {
rc = wled->wled_set_brightness(wled, brightness);
if (rc < 0) {
dev_err(wled->dev, "wled failed to set brightness 
rc:%d\n",
rc);
-   return rc;
+   goto unlock_mutex;
}
 
rc = wled_sync_toggle(wled);
if (rc < 0) {
dev_err(wled->dev, "wled sync failed rc:%d\n", rc);
-   return rc;
+   goto unlock_mutex;
}
}
 
@@ -229,15 +252,61 @@ static int wled_update_status(struct backlight_device *bl)
rc = wled_module_enable(wled, !!brightness);
if (rc < 0) {
dev_err(wled->dev, "wled enable failed rc:%d\n", rc);
-   return rc;
+   goto unlock_mutex;
}
}
 
wled->brightness = brightness;
 
+unlock_mutex:
+   mutex_unlock(&wled->lock);
+
return rc;
 }
 
+#define WLED_SHORT_DLY_MS  20
+#define WLED_SHORT_CNT_MAX 5
+#define WLED_SHORT_RESET_CNT_DLY_US USEC_PER_SEC
+
+static irqreturn_t wled_short_irq_handler(int irq, void *_wled)
+{
+   struct wled *wled = _wled;
+   int rc;
+   s64 elapsed_time;
+
+   wled->short_count++;
+   mutex_lock(&wled->lock);
+   rc = wled_module_enable(wled, false);
+   if (rc < 0) {
+   dev_err(wled->dev, "wled disable failed rc:%d\n", rc);
+   goto unlock_mutex;
+   }
+
+   elapsed_time = ktime_us_delta(ktime_get(),
+ wled->last_short_event);
+   if (elapsed_time > WLED_SHORT_RESET_CNT_DLY_US)
+   wled->short_count = 1;
+
+   if (wled->short_count > WLED_SHORT_CNT_MAX) {
+   dev_err(wled->dev, "Short triggered %d times, disabling WLED 
forever!\n",
+   wled->short_count);
+   wled->disabled_by_short = true;
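
The counting policy from the commit message (retry after a short, give up if shorts keep arriving) can be modelled in userspace C. This is a sketch, not the driver: struct short_state and short_event() are hypothetical, and since the patch text is cut off here, the point where the last-event timestamp is refreshed is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHORT_CNT_MAX		5
#define SHORT_RESET_DLY_US	1000000	/* one second, mirrors USEC_PER_SEC */

/* Hypothetical userspace model of the driver's bookkeeping. */
struct short_state {
	int64_t last_event_us;
	unsigned int count;
	bool disabled;
};

/*
 * Record one short-circuit event at time now_us (microseconds).
 * Returns true if the module may be re-enabled to probe whether the
 * short has gone away, false once it has been disabled for good.
 */
static bool short_event(struct short_state *s, int64_t now_us)
{
	if (s->disabled)
		return false;

	s->count++;
	/* events far apart are treated as unrelated: restart the count */
	if (now_us - s->last_event_us > SHORT_RESET_DLY_US)
		s->count = 1;
	s->last_event_us = now_us;	/* ASSUMPTION: refreshed on every event */

	if (s->count > SHORT_CNT_MAX) {
		s->disabled = true;
		return false;
	}
	return true;
}
```

Restarting the count at 1 rather than 0 matches the V4 changelog note: the event that triggers the reset is itself the first of a new burst.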
+

[PATCH] tpm: factor out TPM 1.x duration calculation to tpm1-cmd.c

2018-08-24 Thread Jarkko Sakkinen

From: Tomas Winkler 

Factor out TPM 1.x commands calculation into tpm1-cmd.c file and change
the prefix from "tpm_" to "tpm1_". No functional changes are done here.

Signed-off-by: Tomas Winkler 
Reviewed-by: Jarkko Sakkinen 
---
Applied Tomas' patch with "patch -p1 -u", added SPDX header and fixed
some minor typos in the commit message. Better to apply this first
before backporting rest of the patches. Other than those updates, the
commit should be unchanged from the original.

The associated patch sets:
* https://lkml.org/lkml/2018/3/6/147
* https://lkml.org/lkml/2018/3/10/45
 drivers/char/tpm/Makefile|   2 +-
 drivers/char/tpm/st33zp24/st33zp24.c |   2 +-
 drivers/char/tpm/tpm-interface.c | 284 +-
 drivers/char/tpm/tpm.h   |   2 +-
 drivers/char/tpm/tpm1-cmd.c  | 294 +++
 drivers/char/tpm/tpm_i2c_nuvoton.c   |  10 +-
 drivers/char/tpm/tpm_tis_core.c  |   2 +-
 drivers/char/tpm/xen-tpmfront.c  |   2 +-
 8 files changed, 306 insertions(+), 292 deletions(-)
 create mode 100644 drivers/char/tpm/tpm1-cmd.c

diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile
index 4e9c33ca1f8f..fd3f12847e86 100644
--- a/drivers/char/tpm/Makefile
+++ b/drivers/char/tpm/Makefile
@@ -3,7 +3,7 @@
 # Makefile for the kernel tpm device drivers.
 #
 obj-$(CONFIG_TCG_TPM) += tpm.o
-tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm2-cmd.o \
+tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm1-cmd.o 
tpm2-cmd.o \
 tpm-dev-common.o tpmrm-dev.o eventlog/common.o eventlog/tpm1.o \
 eventlog/tpm2.o tpm2-space.o
 tpm-$(CONFIG_ACPI) += tpm_ppi.o eventlog/acpi.o
diff --git a/drivers/char/tpm/st33zp24/st33zp24.c 
b/drivers/char/tpm/st33zp24/st33zp24.c
index abd675bec88c..16be974955ea 100644
--- a/drivers/char/tpm/st33zp24/st33zp24.c
+++ b/drivers/char/tpm/st33zp24/st33zp24.c
@@ -430,7 +430,7 @@ static int st33zp24_send(struct tpm_chip *chip, unsigned 
char *buf,
ordinal = be32_to_cpu(*((__be32 *) (buf + 6)));
 
ret = wait_for_stat(chip, TPM_STS_DATA_AVAIL | TPM_STS_VALID,
-   tpm_calc_ordinal_duration(chip, ordinal),
+   tpm1_calc_ordinal_duration(chip, ordinal),
&tpm_dev->read_queue, false);
if (ret < 0)
goto out_err;
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 1a803b0cf980..7c37bda68cab 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -33,7 +33,6 @@
 
 #include "tpm.h"
 
-#define TPM_MAX_ORDINAL 243
 #define TSC_MAX_ORDINAL 12
 #define TPM_PROTECTED_COMMAND 0x00
 #define TPM_CONNECTION_COMMAND 0x40
@@ -48,285 +47,6 @@ module_param_named(suspend_pcr, tpm_suspend_pcr, uint, 
0644);
 MODULE_PARM_DESC(suspend_pcr,
 "PCR to use for dummy writes to facilitate flush on suspend.");
 
-/*
- * Array with one entry per ordinal defining the maximum amount
- * of time the chip could take to return the result.  The ordinal
- * designation of short, medium or long is defined in a table in
- * TCG Specification TPM Main Part 2 TPM Structures Section 17. The
- * values of the SHORT, MEDIUM, and LONG durations are retrieved
- * from the chip during initialization with a call to tpm_get_timeouts.
- */
-static const u8 tpm_ordinal_duration[TPM_MAX_ORDINAL] = {
-   TPM_UNDEFINED,  /* 0 */
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,  /* 5 */
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_SHORT,  /* 10 */
-   TPM_SHORT,
-   TPM_MEDIUM,
-   TPM_LONG,
-   TPM_LONG,
-   TPM_MEDIUM, /* 15 */
-   TPM_SHORT,
-   TPM_SHORT,
-   TPM_MEDIUM,
-   TPM_LONG,
-   TPM_SHORT,  /* 20 */
-   TPM_SHORT,
-   TPM_MEDIUM,
-   TPM_MEDIUM,
-   TPM_MEDIUM,
-   TPM_SHORT,  /* 25 */
-   TPM_SHORT,
-   TPM_MEDIUM,
-   TPM_SHORT,
-   TPM_SHORT,
-   TPM_MEDIUM, /* 30 */
-   TPM_LONG,
-   TPM_MEDIUM,
-   TPM_SHORT,
-   TPM_SHORT,
-   TPM_SHORT,  /* 35 */
-   TPM_MEDIUM,
-   TPM_MEDIUM,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_MEDIUM, /* 40 */
-   TPM_LONG,
-   TPM_MEDIUM,
-   TPM_SHORT,
-   TPM_SHORT,
-   TPM_SHORT,  /* 45 */
-   TPM_SHORT,
-   TPM_SHORT,
-   TPM_SHORT,
-   TPM_LONG,
-   TPM_MEDIUM, /* 50 */
-   TPM_MEDIUM,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,  /* 55 */
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_UNDEFINED,
-   TPM_MEDIUM, /* 60 */
-   TPM_MED

Re: [PATCH] x86/speculation/l1tf: suggest what to do on systems with too much RAM

2018-08-24 Thread Vlastimil Babka

On 08/24/2018 09:32 AM, Vlastimil Babka wrote:
> On 08/23/2018 09:27 PM, Michal Hocko wrote:
>> On Thu 23-08-18 16:28:12, Vlastimil Babka wrote:
>>> Two users have reported [1] that they have an "extremely unlikely" system
>>> with more than MAX_PA/2 memory and L1TF mitigation is not effective. Let's
>>> make the warning more helpful by suggesting the proper mem=X kernel boot 
>>> param,
>>> a rough calculation of how much RAM can be lost (not precise if there's 
>>> holes
>>> between MAX_PA/2 and max_pfn in the e820 map) and a link to the L1TF 
>>> document
>>> to help decide if the mitigation is worth the unusable RAM.
>>>
>>> [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536
>>>
>>> Suggested-by: Michal Hocko 
>>> Cc: sta...@vger.kernel.org
>>> Signed-off-by: Vlastimil Babka 
>>
>> I wouldn't bother with max_pfn-half_pa part but other than that this is
>> much more useful than the original message.
> 
> Right, and it causes build failures on some configs.
> 
>> Acked-by: Michal Hocko 
> 
> Thanks! Here's a v2:

Just realized that kvm printk's refer to the online version at
https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html
which should be easier for the users of distro kernels, should I change
that?
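
For reference, the rough "how much RAM would mem=X cost" arithmetic from the commit message can be sketched as below. This is a userspace illustration only: l1tf_half_pa() and l1tf_ram_lost() are hypothetical helpers, and phys_bits is an assumed input standing in for whatever address width the kernel derives for the CPU.

```c
#include <stdint.h>

/* Half of the physical address space for a CPU with phys_bits address bits. */
static uint64_t l1tf_half_pa(unsigned int phys_bits)
{
	return UINT64_C(1) << (phys_bits - 1);
}

/*
 * Rough number of RAM bytes that "mem=<half_pa>" would make unusable.
 * top_of_ram is the end of the highest RAM page (max_pfn << PAGE_SHIFT
 * in kernel terms). As the commit message notes, this over-estimates
 * when the memory map has holes above MAX_PA/2.
 */
static uint64_t l1tf_ram_lost(uint64_t top_of_ram, unsigned int phys_bits)
{
	uint64_t half_pa = l1tf_half_pa(phys_bits);

	return top_of_ram > half_pa ? top_of_ram - half_pa : 0;
}
```

For example, with 36 address bits the limit is 32 GiB, so a machine whose RAM ends at 40 GiB would give up roughly 8 GiB.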

Re: [PATCH v8 07/26] PM / Domains: Add genpd governor for CPUs

2018-08-24 Thread Lorenzo Pieralisi

On Fri, Aug 24, 2018 at 11:26:19AM +0200, Ulf Hansson wrote:

[...]

> > That's a good question and it maybe gives a path towards a solution.
> >
> > AFAICS the genPD governor only selects the idle state parameter that
> > determines the idle state at, say, GenPD cpumask level it does not touch
> > the CPUidle decision, that works on a subset of idle states (at cpu
> > level).
> >
> > That's my understanding, which can be wrong so please correct me
> > if that's the case because that's a bit confusing.
> >
> > Let's imagine that we flattened out the list of idle states and feed
> > CPUidle with it (all of them - cpu, cluster, package, system - as it is
> > in the mainline _now_). Then the GenPD governor can run-through the
> > CPUidle selection and _demote_ the idle state if necessary since it
> > understands that some CPUs in the GenPD will wake up shortly and break
> > the target residency hypothesis the CPUidle governor is expecting.
> >
> > The whole idea about this series is improving CPUidle decision when
> > the target idle state is _shared_ among groups of cpus (again, please
> > do correct me if I am wrong).
> 
> Absolutely, this is one of the main reason for the series!
> 
> >
> > It is obvious that a GenPD governor must only demote - never promote a
> > CPU idle state selection given that hierarchy implies more power
> > savings and higher target residencies required.
> 
> Absolutely. I apologize if I have been using the word "promote"
> wrongly, I realize it may be a bit confusing.
> 
> >
> > This whole series would become more generic and won't depend on
> > PSCI OSI at all - actually that would become a hierarchical
> > CPUidle governor.
> 
> Well, to me we need a first user of the new infrastructure code in
> genpd and PSCI is probably the easiest one to start with. An option
> would be to start with an old ARM32 platform, but it seems a bit silly
> to me.

If the code can be structured as described above as a hierarchical
(possibly optional through a Kconfig entry or sysfs tuning) idle
decision you can apply it to _any_ PSCI based platform out there,
provided that the new governor improves power savings.

> In regards to OS-initiated mode vs platform coordinated mode, let's
> discuss that in details in the other email thread instead.

I think it is crystal clear by now that IMHO PSCI OS-initiated mode is
a red herring; it has nothing to do with this series, it is there just
because QC firmware does not support PSCI platform coordinated suspend
mode.

You can apply the concept in this series to _any_ arch provided
the power domains representation is correct (and again, I would sound
like a broken record but the series must improve power savings over
vanilla CPUidle menu governor).

> > I still think that PSCI firmware and most certainly mwait() play the
> > role the GenPD governor does since they can detect in FW/HW whether
> > that's worthwhile to switch off a domain, the information is obviously
> > there and the kernel would just add latency to the idle path in that
> > case but let's gloss over this for the sake of this discussion.
> 
> Yep, let's discuss that separately.
> 
> That said, can I interpret your comments on the series up until this
> change, that you seem rather happy with where the series is going?

It is something we have been discussing with Daniel since generic idle
was merged for Arm a long while back. I have nothing against describing
idle states with power domains but it must improve idle decisions
against the mainline. As I said before, runtime PM can also be used
to get rid of CPU PM notifiers (because with power domains we KNOW
what devices eg PMU are switched off on idle entry, we do not guess
any longer; replacing CPU PM notifiers is challenging and can be
tackled - if required - in a different series).

Bottom line (talk is cheap, I know and apologise about that): this
series (up until this change) adds complexity to the idle path and lots
of code; if its usage is made optional and can be switched on on systems
where it saves power that's fine by me as long as we keep PSCI
OS-initiated idle states out of the equation, that's an orthogonal
discussion as, I hope, I managed to convey.

Thanks,
Lorenzo
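
The "demote only, never promote" rule discussed above can be sketched as follows. This is a simplified userspace model, not genpd or CPUidle code: struct idle_state and demote_for_domain() are hypothetical, the state table is assumed to be flattened and ordered from shallow to deep, and domain_sleep_us stands for the time until the earliest expected wake-up of any CPU sharing the domain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened idle-state table, ordered shallow -> deep. */
struct idle_state {
	uint32_t target_residency_us;
};

/*
 * Demote-only pass: start from the state the per-CPU governor picked and
 * step down until the state's target residency fits the time until the
 * earliest wake-up of any CPU sharing the domain. Never returns an index
 * above 'selected'.
 */
static size_t demote_for_domain(const struct idle_state *states,
				size_t selected, uint64_t domain_sleep_us)
{
	size_t idx = selected;

	while (idx > 0 && states[idx].target_residency_us > domain_sleep_us)
		idx--;

	return idx;
}
```

A real implementation would only need to re-check the states that are actually shared between CPUs; the model applies the same test to every level for brevity.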

Re: [PATCH v1 1/1] Bluetooth: hci_qca: Add poweroff support during hci down for wcn3990

2018-08-24 Thread Balakrishna Godavarthi


Hi Stephen,

On 2018-08-24 12:17, Stephen Boyd wrote:
> Quoting Balakrishna Godavarthi (2018-08-23 04:29:35)
> > This patch enables power off support for hci down and power on support
> > for hci up. As wcn3990 power sources are ignited by regulators, we will
> > turn off them during hci down, i.e. an complete power off of wcn3990.
> > So while hci up, we will call vendor specific open/close and setup which
> > will turn on the regulators, requests BT chip version and download the
> > firmware.
> >
> > Signed-off-by: Balakrishna Godavarthi 
> > ---
> >  drivers/bluetooth/hci_qca.c | 34 ++
> >  1 file changed, 34 insertions(+)
> >
> > diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> > index e182f6019f68..98d33c6b8909 100644
> > --- a/drivers/bluetooth/hci_qca.c
> > +++ b/drivers/bluetooth/hci_qca.c
> > @@ -595,6 +595,9 @@ static int qca_close(struct hci_uart *hu)
> > struct qca_serdev *qcadev;
> > struct qca_data *qca = hu->priv;
> >
> > +   if (!qca)
> > +   return 0;
>
> Does this happen? If it does it seems like a failure in the caller to
> know what's going on.

[Bala]: Yes, qca_close() executes twice when we remove the BT module.

While the module is being removed, hci_dev_do_close() calls
hdev->close(), i.e. hci_uart_close(), which leads to the Qualcomm
specific close. From hci_uart_close() we call qca_close(), which frees
the memory.

After that, the proto close also calls qca_close().
Here hci_uart_close() and the proto close both end up at the same
function pointer, i.e. qca_close().
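
The double-close sequence described here can be modelled in plain userspace C. This is only a minimal sketch, not the driver code: `fake_uart`, `fake_open()`, `fake_close()` and the `closes_done` counter are hypothetical stand-ins for `struct hci_uart`, the open path, `qca_close()` and the real teardown, and it assumes the first close clears the private pointer after freeing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct hci_uart: only the private pointer matters. */
struct fake_uart {
	void *priv;
};

static int closes_done;	/* counts how many times teardown really ran */

/* Set up the private data, as the open path would. */
static int fake_open(struct fake_uart *hu)
{
	assert(hu);
	hu->priv = malloc(16);
	return hu->priv ? 0 : -1;
}

/* Tear down once; a second call sees priv == NULL and returns early. */
static int fake_close(struct fake_uart *hu)
{
	assert(hu);
	if (!hu->priv)
		return 0;

	free(hu->priv);
	hu->priv = NULL;	/* this is what makes the guard above effective */
	closes_done++;
	return 0;
}
```

Calling `fake_close()` twice on the same object leaves `closes_done` at 1, which is the behaviour the `if (!qca)` check is meant to give. Whether the caller should be fixed instead, as Stephen suggests, is a separate question.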


--
Regards
Balakrishna.

[PATCH V3] spi: spi-geni-qcom: Add SPI driver support for GENI based QUP

2018-08-24 Thread Dilip Kota

From: Girish Mahadevan 

This driver supports GENI based SPI Controller in the Qualcomm SOCs. The
Qualcomm Generic Interface (GENI) is a programmable module supporting a
wide range of serial interfaces including SPI. This driver supports SPI
operations using FIFO mode of transfer.

Signed-off-by: Girish Mahadevan 
Signed-off-by: Dilip Kota 
---
Addressing all the reviewer comments given in Patchset1.
Summarizing all the comments below:

MAKEFILE: Arrange SPI-GENI driver in alphabetical order
Kconfig: Mark SPI_GENI driver dependent on QCOM_GENI_SE
Enable SPI core auto runtime pm, and remove runtime pm calls.
Remove spi_geni_unprepare_message(), spi_geni_unprepare_transfer_hardware()
Remove likely/unlikely keywords.
Remove get_spi_master() and use dev_get_drvdata()
Move request_irq to probe()
Mark bus number assignment to -1 as SPI core framework will assign dynamically
Use devm_spi_register_master()
Include platform_device.h instead of of_platform.h
Removing macros which are used only once:
#define SPI_NUM_CHIPSELECT 4
#define SPI_XFER_TIMEOUT_MS 250
Place Register field definitions next to respective Register definitions.
Replace int and u32 declarations with unsigned int.
Remove Hex numbers in debug prints.
Declare mode as u16 in spi_setup_word_len()
Remove the labels: setup_fifo_params_exit: exit_prepare_transfer_hardware:
Declaring struct spi_master as spi everywhere in the file.
Calling spi_finalize_current_transfer() for end of transfer.
Hard code the SPI controller max frequency instead of reading from DTSI node.
Spinlock not required, removed it.
Removed unnecessary error prints.
Fix KASAN error in geni_spi_isr().
Remove spi-geni-qcom.h
Remove inter-word delay and CS to Clock toggle delay logic in the driver, as no clients are using it as of now.
Will submit this logic in the next patchset.
Use major, minor and step macros to read from hardware version register.

 .../devicetree/bindings/soc/qcom/qcom,geni-se.txt  |   2 -
 drivers/spi/Kconfig|  12 +
 drivers/spi/Makefile   |   1 +
 drivers/spi/spi-geni-qcom.c| 678 +
 4 files changed, 691 insertions(+), 2 deletions(-)
 create mode 100644 drivers/spi/spi-geni-qcom.c

diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt
index 68b7d62..16467ed 100644
--- a/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt
+++ b/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt
@@ -60,7 +60,6 @@ Required properties:
 - interrupts:  Must contain SPI controller interrupts.
 - clock-names: Must contain "se".
 - clocks:  Serial engine core clock needed by the device.
-- spi-max-frequency:   Specifies maximum SPI clock frequency, units - Hz.
 - #address-cells:  Must be <1> to define a chip select address on
the SPI bus.
 - #size-cells: Must be <0>.
@@ -112,7 +111,6 @@ Example:
pinctrl-names = "default", "sleep";
pinctrl-0 = <&qup_1_spi_2_active>;
pinctrl-1 = <&qup_1_spi_2_sleep>;
-   spi-max-frequency = <1920>;
#address-cells = <1>;
#size-cells = <0>;
};
diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
index ad5d68e..4f7f86f 100644
--- a/drivers/spi/Kconfig
+++ b/drivers/spi/Kconfig
@@ -533,6 +533,18 @@ config SPI_QUP
  This driver can also be built as a module.  If so, the module
  will be called spi_qup.
 
+config SPI_QCOM_GENI
+   tristate "Qualcomm SPI controller with QUP interface"
+   depends on QCOM_GENI_SE
+   help
+ This driver supports GENI serial engine based SPI controller in
+ master mode on the Qualcomm Technologies Inc.'s SoCs. If you say
+ yes to this option, support will be included for the SPI interface
+ on the Qualcomm Technologies Inc.'s SoCs.
+
+ This driver can also be built as a module.  If so, the module
+ will be called spi-geni-qcom.
+
 config SPI_S3C24XX
tristate "Samsung S3C24XX series SPI"
depends on ARCH_S3C24XX
diff --git a/drivers/spi/Makefile b/drivers/spi/Makefile
index cb1f437..98337cf 100644
--- a/drivers/spi/Makefile
+++ b/drivers/spi/Makefile
@@ -74,6 +74,7 @@ obj-$(CONFIG_SPI_PPC4xx)  += spi-ppc4xx.o
 spi-pxa2xx-platform-objs   := spi-pxa2xx.o spi-pxa2xx-dma.o
 obj-$(CONFIG_SPI_PXA2XX)   += spi-pxa2xx-platform.o
 obj-$(CONFIG_SPI_PXA2XX_PCI)   += spi-pxa2xx-pci.o
+obj-$(CONFIG_SPI_QCOM_GENI)+= spi-geni-qcom.o
 obj-$(

[RFC PATCH 12/20] x86/intel_rdt: Correct the closid when staging configuration changes

2018-08-24 Thread James Morse

Now that apply_config() and update_domains() know the code/data/both value
of what they are writing, and ctrl_val is correctly sized: use the
hardware closid slot, based on the configuration type.

This means cbm_idx() and its illusionary cache-properties can go.

Signed-off-by: James Morse 
---
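
Note: the claim that cbm_idx() can go is easy to check in isolation. The sketch below is a userspace model, not kernel code. `resctrl_closid_cdp_map()` mirrors the helper added by this patch, `old_cbm_idx()` reproduces the removed `closid * mult + offset` arithmetic using the multiplier/offset pairs deleted from rdt_resources_all[], and `mappings_agree()` is a hypothetical checker. The enum member name `CDP_BOTH` is an assumption; only CDP_CODE and CDP_DATA are visible in this series.

```c
#include <assert.h>

typedef unsigned int u32;

/* CDP_CODE and CDP_DATA appear in the series; the name CDP_BOTH is assumed. */
enum resctrl_conf_type { CDP_BOTH, CDP_CODE, CDP_DATA };

/* Mirrors the helper added by the patch. */
static u32 resctrl_closid_cdp_map(u32 closid, enum resctrl_conf_type t)
{
	if (t == CDP_CODE)
		return closid * 2 + 1;
	else if (t == CDP_DATA)
		return closid * 2;
	else
		return closid;
}

/* The removed cbm_idx() arithmetic, with mult/offset passed in directly. */
static u32 old_cbm_idx(u32 closid, u32 mult, u32 offset)
{
	return closid * mult + offset;
}

/* Returns 1 if both schemes pick the same slot for every closid below n. */
static int mappings_agree(u32 n)
{
	u32 c;

	assert(n > 0);
	for (c = 0; c < n; c++) {
		/* Lx: mult 1, offset 0 */
		if (resctrl_closid_cdp_map(c, CDP_BOTH) != old_cbm_idx(c, 1, 0))
			return 0;
		/* LxDATA: mult 2, offset 0 */
		if (resctrl_closid_cdp_map(c, CDP_DATA) != old_cbm_idx(c, 2, 0))
			return 0;
		/* LxCODE: mult 2, offset 1 */
		if (resctrl_closid_cdp_map(c, CDP_CODE) != old_cbm_idx(c, 2, 1))
			return 0;
	}
	return 1;
}
```

Under these assumptions the two schemes select the same hardware slot, so cat_wrmsr() can index by `i` directly once the staged closid has already been mapped.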
 arch/x86/kernel/cpu/intel_rdt.c | 18 +---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 32 ++---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c|  2 +-
 include/linux/resctrl.h |  6 ++--
 4 files changed, 25 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 8d3544b6c149..6466c172c045 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -75,8 +75,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
},
.domains= domain_init(RDT_RESOURCE_L3),
.parse_ctrlval  = parse_cbm,
@@ -95,8 +93,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
},
.domains= 
domain_init(RDT_RESOURCE_L3DATA),
.parse_ctrlval  = parse_cbm,
@@ -116,8 +112,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 1,
},
.domains= 
domain_init(RDT_RESOURCE_L3CODE),
.parse_ctrlval  = parse_cbm,
@@ -136,8 +130,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
},
.domains= domain_init(RDT_RESOURCE_L2),
.parse_ctrlval  = parse_cbm,
@@ -156,8 +148,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
},
.domains= 
domain_init(RDT_RESOURCE_L2DATA),
.parse_ctrlval  = parse_cbm,
@@ -176,8 +166,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 1,
},
.domains= 
domain_init(RDT_RESOURCE_L2CODE),
.parse_ctrlval  = parse_cbm,
@@ -204,10 +192,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
 };
 
-static unsigned int cbm_idx(struct rdt_resource *r, unsigned int closid)
-{
-   return closid * r->cache.cbm_idx_mult + r->cache.cbm_idx_offset;
-}
 
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
@@ -408,7 +392,7 @@ cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
struct rdt_hw_resource *hw_res = resctrl_to_rdt(r);
 
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + cbm_idx(r, i), hw_dom->ctrl_val[i]);
+   wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
 }
 
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index bab6032704c3..05c14d9f797c 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -28,6 +28,16 @@
 #include 
 #include "intel_rdt.h"
 
+static u32 resctrl_closid_cdp_map(u32 closid, enum resctrl_conf_type t)
+{
+   if (t == CDP_CODE)
+   return (closid * 2) + 1;
+   else if (t == CDP_DATA)
+   return (closid * 2);
+   else
+   return closid;
+

[RFC PATCH 13/20] x86/intel_rdt: Allow different CODE/DATA configurations to be staged

2018-08-24 Thread James Morse

Now that the staged configuration holds its CDP type and hardware
closid, allow resctrl to stage more than one configuration at a time for
a single resource.

To detect the same schema being specified twice when the schemata file
is written, the same slot in the staged_configuration array must be
used for each schema. Use the cdp_type enum directly as an index.

Signed-off-by: James Morse 
---
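
Note: the duplicate-schema detection described above can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: `struct staged_config`, `struct domain` and `stage_config()` are reduced, hypothetical stand-ins. `have_new_ctrl` is taken from the diff, while the field name `new_ctrl` and the enum member `CDP_BOTH` are assumed.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int u32;

/* CDP_CODE and CDP_DATA appear in the patch; CDP_BOTH is an assumed name. */
enum resctrl_conf_type { CDP_BOTH, CDP_CODE, CDP_DATA };
#define NUM_CDP_TYPES	(CDP_DATA + 1)

/* Reduced stand-in for struct resctrl_staged_config. */
struct staged_config {
	u32 new_ctrl;		/* assumed field name */
	bool have_new_ctrl;
};

/* Reduced stand-in for struct rdt_domain: one slot per configuration type. */
struct domain {
	struct staged_config staged_config[NUM_CDP_TYPES];
};

/* Stage a value for one type; returns -1 if that type was already staged. */
static int stage_config(struct domain *d, enum resctrl_conf_type t, u32 val)
{
	struct staged_config *cfg;

	assert(d && (int)t < NUM_CDP_TYPES);
	cfg = &d->staged_config[t];
	if (cfg->have_new_ctrl)
		return -1;	/* the same schema was specified twice */

	cfg->new_ctrl = val;
	cfg->have_new_ctrl = true;
	return 0;
}
```

Because each type always lands in the same slot, a CODE and a DATA value can be staged side by side, while a second CODE value for the same domain is rejected.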
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 4 ++--
 include/linux/resctrl.h | 4 +++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 05c14d9f797c..f80a838cc36d 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -78,7 +78,7 @@ int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d,
 enum resctrl_conf_type t, u32 closid)
 {
unsigned long data;
-   struct resctrl_staged_config *cfg = &d->staged_config[0];
+   struct resctrl_staged_config *cfg = &d->staged_config[t];
 
if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("duplicate domain %d\n", d->id);
@@ -144,7 +144,7 @@ int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d,
  enum resctrl_conf_type t, u32 closid)
 {
unsigned long data;
-   struct resctrl_staged_config *cfg = &d->staged_config[0];
+   struct resctrl_staged_config *cfg = &d->staged_config[t];
 
if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("duplicate domain %d\n", d->id);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index dad266f9b0fe..ede5c40756b4 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -12,6 +12,8 @@ enum resctrl_conf_type {
CDP_CODE,
CDP_DATA,
 };
+#define NUM_CDP_TYPES  (CDP_DATA + 1)
+
 
 /**
  * struct resctrl_staged_config - parsed configuration to be applied
@@ -39,7 +41,7 @@ struct rdt_domain {
int id;
struct cpumask  cpu_mask;
 
-   struct resctrl_staged_configstaged_config[1];
+   struct resctrl_staged_configstaged_config[NUM_CDP_TYPES];
 };
 
 /**
-- 
2.18.0

[RFC PATCH 02/20] x86/intel_rdt: Split struct rdt_domain

2018-08-24 Thread James Morse

resctrl is the de facto Linux ABI for SoC resource partitioning features.
To support it on another architecture, we need to abstract it from
Intel RDT, and move it to /fs/.

Split struct rdt_domain up too. Move everything that is particular
to resctrl into a new header file. resctrl code paths touching a 'hw'
struct indicate where an abstraction is needed.

No change in behaviour, this patch just moves types around.

Signed-off-by: James Morse 
---
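
Note: the split relies on getting from the generic struct back to its arch wrapper. A minimal userspace sketch of that pattern follows. It assumes that rc_dom_to_rdt() is a container_of()-style conversion and that the embedded member is called `resctrl`, as it is for rdt_hw_resource in patch 01; neither detail is visible in this hunk, and the structs are cut down to the fields needed for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int u32;

/* Generic, filesystem-visible part (reduced stand-in for struct rdt_domain). */
struct rdt_domain {
	int id;
};

/* Arch-specific wrapper; only ctrl_val is modelled here. */
struct rdt_hw_domain {
	struct rdt_domain resctrl;	/* assumed member name */
	u32 *ctrl_val;
};

/* Userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumed implementation: recover the wrapper from the embedded struct. */
static struct rdt_hw_domain *rc_dom_to_rdt(struct rdt_domain *d)
{
	assert(d);
	return container_of(d, struct rdt_hw_domain, resctrl);
}
```

Code that only ever sees `struct rdt_domain *` stays arch-neutral, and every call to rc_dom_to_rdt() marks a place where the filesystem code still reaches into x86 state.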
 arch/x86/kernel/cpu/intel_rdt.c | 87 +++--
 arch/x86/kernel/cpu/intel_rdt.h | 30 ---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 12 ++-
 arch/x86/kernel/cpu/intel_rdt_monitor.c | 55 +++--
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c| 14 +++-
 include/linux/resctrl.h | 17 +++-
 6 files changed, 127 insertions(+), 88 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 8cb2639b8a56..c4e6dcdd235b 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -377,21 +377,23 @@ static void
 mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
 {
unsigned int i;
+   struct rdt_hw_domain *hw_dom = rc_dom_to_rdt(d);
struct rdt_hw_resource *hw_res = resctrl_to_rdt(r);
 
/*  Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + i, delay_bw_map(d->ctrl_val[i], r));
+   wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], 
r));
 }
 
 static void
 cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
 {
unsigned int i;
+   struct rdt_hw_domain *hw_dom = rc_dom_to_rdt(d);
struct rdt_hw_resource *hw_res = resctrl_to_rdt(r);
 
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + cbm_idx(r, i), d->ctrl_val[i]);
+   wrmsrl(hw_res->msr_base + cbm_idx(r, i), hw_dom->ctrl_val[i]);
 }
 
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
@@ -476,21 +478,22 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
 static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 {
struct rdt_hw_resource *hw_res = resctrl_to_rdt(r);
+   struct rdt_hw_domain *hw_dom = rc_dom_to_rdt(d);
struct msr_param m;
u32 *dc, *dm;
 
-   dc = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
+   dc = kmalloc_array(r->num_closid, sizeof(*hw_dom->ctrl_val), 
GFP_KERNEL);
if (!dc)
return -ENOMEM;
 
-   dm = kmalloc_array(r->num_closid, sizeof(*d->mbps_val), GFP_KERNEL);
+   dm = kmalloc_array(r->num_closid, sizeof(*hw_dom->mbps_val), 
GFP_KERNEL);
if (!dm) {
kfree(dc);
return -ENOMEM;
}
 
-   d->ctrl_val = dc;
-   d->mbps_val = dm;
+   hw_dom->ctrl_val = dc;
+   hw_dom->mbps_val = dm;
setup_default_ctrlval(r, dc, dm);
 
m.low = 0;
@@ -502,36 +505,37 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 {
size_t tsize;
+   struct rdt_hw_domain *hw_dom = rc_dom_to_rdt(d);
 
if (is_llc_occupancy_enabled()) {
-   d->rmid_busy_llc = kcalloc(BITS_TO_LONGS(r->num_rmid),
+   hw_dom->rmid_busy_llc = kcalloc(BITS_TO_LONGS(r->num_rmid),
   sizeof(unsigned long),
   GFP_KERNEL);
-   if (!d->rmid_busy_llc)
+   if (!hw_dom->rmid_busy_llc)
return -ENOMEM;
-   INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+   INIT_DELAYED_WORK(&hw_dom->cqm_limbo, cqm_handle_limbo);
}
if (is_mbm_total_enabled()) {
-   tsize = sizeof(*d->mbm_total);
-   d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
-   if (!d->mbm_total) {
-   kfree(d->rmid_busy_llc);
+   tsize = sizeof(*hw_dom->mbm_total);
+   hw_dom->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+   if (!hw_dom->mbm_total) {
+   kfree(hw_dom->rmid_busy_llc);
return -ENOMEM;
}
}
if (is_mbm_local_enabled()) {
-   tsize = sizeof(*d->mbm_local);
-   d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
-   if (!d->mbm_local) {
-   kfree(d->rmid_busy_llc);
-   kfree(d->mbm_total);
+   tsize = sizeof(*hw_dom->mbm_local);
+   hw_dom->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+   if (!hw_dom->mbm_local) {
+   kfree(hw_dom->rmid_busy_llc);
+   k

[RFC PATCH 01/20] x86/intel_rdt: Split struct rdt_resource

2018-08-24 Thread James Morse

resctrl is the de facto Linux ABI for SoC resource partitioning features.
To support it on another architecture, we need to abstract it from
Intel RDT, and move it to /fs/.

Let's start by splitting struct rdt_resource (the name is kept for now
to keep the noise down), and add some type-trickery to keep the foreach
helpers working.

Move everything that is particular to resctrl into a new header
file, keeping the x86 MSR specific stuff where it is. resctrl code
paths touching a 'hw' struct indicate where an abstraction is needed.

We split rdt_domain up in a similar way in the next patch.
No change in behaviour, this patch just moves types around.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt.c | 193 +++-
 arch/x86/kernel/cpu/intel_rdt.h | 112 +++-
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c |   6 +-
 arch/x86/kernel/cpu/intel_rdt_monitor.c |  23 ++-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c|  37 ++--
 include/linux/resctrl.h | 103 +++
 6 files changed, 275 insertions(+), 199 deletions(-)
 create mode 100644 include/linux/resctrl.h

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index ec4754f81cbd..8cb2639b8a56 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -64,122 +64,137 @@ mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
 static void
 cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
 
-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
+#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].resctrl.domains)
 
-struct rdt_resource rdt_resources_all[] = {
+struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
{
.rid= RDT_RESOURCE_L3,
-   .name   = "L3",
-   .domains= domain_init(RDT_RESOURCE_L3),
+   .resctrl = {
+   .name   = "L3",
+   .cache_level= 3,
+   .cache = {
+   .min_cbm_bits   = 1,
+   .cbm_idx_mult   = 1,
+   .cbm_idx_offset = 0,
+   },
+   .domains= domain_init(RDT_RESOURCE_L3),
+   .parse_ctrlval  = parse_cbm,
+   .format_str = "%d=%0*x",
+   .fflags = RFTYPE_RES_CACHE,
+   },
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
-   },
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
},
[RDT_RESOURCE_L3DATA] =
{
.rid= RDT_RESOURCE_L3DATA,
-   .name   = "L3DATA",
-   .domains= domain_init(RDT_RESOURCE_L3DATA),
+   .resctrl = {
+   .name   = "L3DATA",
+   .cache_level= 3,
+   .cache = {
+   .min_cbm_bits   = 1,
+   .cbm_idx_mult   = 2,
+   .cbm_idx_offset = 0,
+   },
+   .domains= 
domain_init(RDT_RESOURCE_L3DATA),
+   .parse_ctrlval  = parse_cbm,
+   .format_str = "%d=%0*x",
+   .fflags = RFTYPE_RES_CACHE,
+   },
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
-   },
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
+
},
[RDT_RESOURCE_L3CODE] =
{
.rid= RDT_RESOURCE_L3CODE,
-   .name   = "L3CODE",
-   .domains= domain_init(RDT_RESOURCE_L3CODE),
+   .resctrl = {
+   .name   = "L3CODE",
+   .cache_level= 3,
+   .cache = {
+

[RFC PATCH 06/20] x86/intel_rdt: Add a helper to read a closid's configuration for show_doms()

2018-08-24 Thread James Morse

The configuration values used by the arch code may not be the same
as the bitmaps generated by resctrl, so resctrl shouldn't read or
write them directly.

update_domains() and the staged config are suitable for letting the
arch code perform any conversion. Add a helper to read the current
configuration.

This will allow another architecture to scale the bitmaps if
necessary, and possibly use controls that don't take a bitmap at all.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 17 +
 arch/x86/kernel/cpu/intel_rdt_monitor.c |  2 +-
 include/linux/resctrl.h |  3 +++
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 01ffd455313a..ec3c15ee3473 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -322,21 +322,30 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
return ret ?: nbytes;
 }
 
+void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+   u32 closid, u32 *value)
+{
+   struct rdt_hw_domain *hw_dom = rc_dom_to_rdt(d);
+
+   if (!is_mba_sc(r))
+   *value = hw_dom->ctrl_val[closid];
+   else
+   *value = hw_dom->mbps_val[closid];
+}
+
 static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
 {
-   struct rdt_hw_domain *hw_dom;
+
struct rdt_domain *dom;
bool sep = false;
u32 ctrl_val;
 
seq_printf(s, "%*s:", max_name_width, r->name);
list_for_each_entry(dom, &r->domains, list) {
-   hw_dom = rc_dom_to_rdt(dom);
if (sep)
seq_puts(s, ";");
 
-   ctrl_val = (!is_mba_sc(r) ? hw_dom->ctrl_val[closid] :
-   hw_dom->mbps_val[closid]);
+   resctrl_arch_get_config(r, dom, closid, &ctrl_val);
seq_printf(s, r->format_str, dom->id, max_data_width,
   ctrl_val);
sep = true;
diff --git a/arch/x86/kernel/cpu/intel_rdt_monitor.c b/arch/x86/kernel/cpu/intel_rdt_monitor.c
index c05f1cecf6cd..42ddcefc7065 100644
--- a/arch/x86/kernel/cpu/intel_rdt_monitor.c
+++ b/arch/x86/kernel/cpu/intel_rdt_monitor.c
@@ -390,7 +390,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
hw_dom_mba = rc_dom_to_rdt(dom_mba);
 
cur_bw = pmbm_data->prev_bw;
-   user_bw = hw_dom_mba->mbps_val[closid];
+   resctrl_arch_get_config(r_mba, dom_mba, closid, &user_bw);
delta_bw = pmbm_data->delta_bw;
cur_msr_val = hw_dom_mba->ctrl_val[closid];
 
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 370db085ee77..03d9fbc230af 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -125,4 +125,7 @@ struct rdt_resource {
 
 };
 
+void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 closid, u32 *value);
+
 #endif /* __LINUX_RESCTRL_H */
-- 
2.18.0

[RFC PATCH 08/20] x86/intel_rdt: Make cdp enable/disable global

2018-08-24 Thread James Morse

If the CPU supports Intel's Code and Data Prioritization (CDP), software
can specify a separate bitmap for code and data. This feature needs
enabling in a model-specific-register, and changes the properties of
the cache-controls: it halves the effective number of closids.

This changes how closids are allocated, and so applies to all
alloc_enabled caches. If a system has multiple levels of RDT-like
controls CDP should be enabled/disabled across them all.

Make the CDP enable/disable calls global.

Add CDP capable/enabled flags, and unify the enable/disable behind a
single resctrl_arch_set_cdp_enabled(true/false) call. Architectures
that have nothing to do here can just update the flags.

This subtly changes resctrl's '-o cdp' (l3) and '-o cdpl2' parameters
to mean enable globally if this level supports cdp. The difference
can't be seen on a system which only has one of the two.

Signed-off-by: James Morse 
---
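
Note: the "enable globally across all capable levels" behaviour can be sketched without any MSR access. This is a simplified userspace model, not the patch's implementation: `struct cache_level` and `set_cdp_enabled_all()` are hypothetical names, and the real cdp_enable() can additionally fail when programming the hardware, which the sketch does not model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-level state; the patch keeps these flags in struct rdt_resource. */
struct cache_level {
	bool cdp_capable;
	bool cdp_enabled;
};

/*
 * Apply one enable/disable request to every CDP-capable level.
 * Returns -EINVAL if no level is capable, 0 otherwise.
 */
static int set_cdp_enabled_all(struct cache_level *levels, int n, bool enable)
{
	int i, ret = -EINVAL;

	assert(levels && n > 0);
	for (i = 0; i < n; i++) {
		if (!levels[i].cdp_capable)
			continue;	/* nothing to do for this level */
		levels[i].cdp_enabled = enable;
		ret = 0;
	}
	return ret;
}
```

On a system where only one level is capable, the result is the same as the old per-level call, which matches the remark that the difference cannot be seen on such a system.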
 arch/x86/kernel/cpu/intel_rdt.c  |  1 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 72 
 include/linux/resctrl.h  |  7 +++
 3 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index c4e6dcdd235b..0e651447956e 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -331,6 +331,7 @@ static void rdt_get_cdp_config(int level, int type)
 * By default, CDP is disabled. CDP can be enabled by mount parameter
 * "cdp" during resctrl file system mount time.
 */
+   r_l->cdp_capable = true;
r->alloc_enabled = false;
 }
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 3ed88d4fedd0..f4f76c193495 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1081,7 +1081,7 @@ static int cdp_enable(int level, int data_type, int code_type)
int ret;
 
if (!r_l->alloc_capable || !r_ldata->alloc_capable ||
-   !r_lcode->alloc_capable)
+   !r_lcode->alloc_capable || !r_l->cdp_capable)
return -EINVAL;
 
ret = set_cache_qos_cfg(level, true);
@@ -1089,51 +1089,77 @@ static int cdp_enable(int level, int data_type, int code_type)
r_l->alloc_enabled = false;
r_ldata->alloc_enabled = true;
r_lcode->alloc_enabled = true;
+
+   r_l->cdp_enabled = true;
+   r_ldata->cdp_enabled = true;
+   r_lcode->cdp_enabled = true;
}
return ret;
 }
 
-static int cdpl3_enable(void)
-{
-   return cdp_enable(RDT_RESOURCE_L3, RDT_RESOURCE_L3DATA,
- RDT_RESOURCE_L3CODE);
-}
-
-static int cdpl2_enable(void)
-{
-   return cdp_enable(RDT_RESOURCE_L2, RDT_RESOURCE_L2DATA,
- RDT_RESOURCE_L2CODE);
-}
-
 static void cdp_disable(int level, int data_type, int code_type)
 {
struct rdt_resource *r = &rdt_resources_all[level].resctrl;
 
+   if (!r->cdp_enabled)
+   return;
+
r->alloc_enabled = r->alloc_capable;
 
if (rdt_resources_all[data_type].resctrl.alloc_enabled) {
rdt_resources_all[data_type].resctrl.alloc_enabled = false;
rdt_resources_all[code_type].resctrl.alloc_enabled = false;
set_cache_qos_cfg(level, false);
+
+   r->cdp_enabled = false;
+   rdt_resources_all[data_type].resctrl.cdp_enabled = false;
+   rdt_resources_all[code_type].resctrl.cdp_enabled = false;
}
 }
 
-static void cdpl3_disable(void)
+int resctrl_arch_set_cdp_enabled(bool enable)
 {
-   cdp_disable(RDT_RESOURCE_L3, RDT_RESOURCE_L3DATA, RDT_RESOURCE_L3CODE);
+   int ret = -EINVAL;
+   struct rdt_hw_resource *l3 = &rdt_resources_all[RDT_RESOURCE_L3];
+   struct rdt_hw_resource *l2 = &rdt_resources_all[RDT_RESOURCE_L2];
+
+   if (l3 && l3->resctrl.cdp_capable) {
+   if (!enable) {
+   cdp_disable(RDT_RESOURCE_L3, RDT_RESOURCE_L3DATA,
+   RDT_RESOURCE_L3CODE);
+   ret = 0;
+   } else {
+   ret = cdp_enable(RDT_RESOURCE_L3, RDT_RESOURCE_L3DATA,
+   RDT_RESOURCE_L3CODE);
+   }
+   }
+   if (l2 && l2->resctrl.cdp_capable) {
+   if (!enable) {
+   cdp_disable(RDT_RESOURCE_L2, RDT_RESOURCE_L2DATA,
+   RDT_RESOURCE_L2CODE);
+   ret = 0;
+   } else {
+   ret = cdp_enable(RDT_RESOURCE_L2, RDT_RESOURCE_L2DATA,
+   RDT_RESOURCE_L2CODE);
+   }
+   }
+
+   return ret;
 }
 
-static void cdpl2_disable(void)
+static int try_to_enable_cdp(int level)
 {
-   cdp_disable(RDT_RESOURCE_L2, RDT_RESOURCE_L2DATA, RDT_RESOURCE_L2COD

[RFC PATCH 00/20] x86/intel_rdt: Start abstraction for a second arch

2018-08-24 Thread James Morse

Hi folks,

ARM have some upcoming CPU features that are similar to Intel RDT. Resctrl
is the de facto ABI for this sort of thing, but it lives under arch/x86.

To get existing software working, we need to make resctrl work with arm64.
This series is the first chunk of that. The aim is to move the filesystem/ABI
parts into /fs/resctrl, and implement a second arch backend.


What are the ARM features?
Future ARM SoCs may have a feature called MPAM: Memory Partitioning and
Monitoring. This is an umbrella term like RDT, and covers a range of controls
(like CAT) and monitors (like MBM, CMT).

This series is almost all about CDP. MPAM has equivalent functionality, but
it doesn't need enabling, and doesn't affect the available closids. (I'll
try and use Intel terms). MPAM expects the equivalent to IA32_PQR_ASSOC to
be configured with an Instruction closid and a Data closid. These are the
same for no-CDP, and different otherwise. There is no need for them to be
adjacent.

To avoid emulating CDP in arm64's arch code, this series moves all the ABI
parts of the CDP behaviour, (half the closid-space, each having two
configurations) into the filesystem parts of resctrl. These will eventually
be moved to /fs/.

MPAM's control and monitor configuration is all memory mapped; the base
addresses are discovered via firmware tables, so we won't have a table of
possible resources that just need alloc_enabling.

Is this it? No... there are another two series of a similar size that
abstract the MBM/CMT overflow threads and avoid 'fs' code accessing things
that have moved into the 'hw' arch specific struct.


I'm after feedback on the general approach taken here, bugs (as there are
certainly subtleties I've missed), and any strong opinions on what should be
arch-specific, and what shouldn't.

This series is based on v4.18, and can be retrieved from:
git://linux-arm.org/linux-jm.git -b mpam/resctrl_rework/rfc_1


Thanks,

James Morse (20):
  x86/intel_rdt: Split struct rdt_resource
  x86/intel_rdt: Split struct rdt_domain
  x86/intel_rdt: Group staged configuration into a separate struct
  x86/intel_rdt: Add closid to the staged config
  x86/intel_rdt: make update_domains() learn the affected closids
  x86/intel_rdt: Add a helper to read a closid's configuration for
show_doms()
  x86/intel_rdt: Expose update_domains() as an arch helper
  x86/intel_rdt: Make cdp enable/disable global
  x86/intel_rdt: Track the actual number of closids separately
  x86/intel_rdt: Let resctrl change the resources's num_closid
  x86/intel_rdt: Pass in the code/data/both configuration value when
parsing
  x86/intel_rdt: Correct the closid when staging configuration changes
  x86/intel_rdt: Allow different CODE/DATA configurations to be staged
  x86/intel_rdt: Add a separate resource list for resctrl
  x86/intel_rdt: Walk the resctrl schema list instead of the arch's
resource list
  x86/intel_rdt: Move the schemata names into struct resctrl_schema
  x86/intel_rdt: Stop using Lx CODE/DATA resources
  x86/intel_rdt: Remove the CODE/DATA illusionary caches
  x86/intel_rdt: Kill off alloc_enabled
  x86/intel_rdt: Merge cdp enable/disable calls

 arch/x86/kernel/cpu/intel_rdt.c | 298 +++-
 arch/x86/kernel/cpu/intel_rdt.h | 161 ---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 142 +++---
 arch/x86/kernel/cpu/intel_rdt_monitor.c |  78 ++---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c| 216 +-
 include/linux/resctrl.h | 166 +++
 6 files changed, 621 insertions(+), 440 deletions(-)
 create mode 100644 include/linux/resctrl.h

-- 
2.18.0

[RFC PATCH 05/20] x86/intel_rdt: make update_domains() learn the affected closids

2018-08-24 Thread James Morse

Now that the closid is present in the staged configuration,
update_domains() can learn which low/high values it should update.

Remove the single passed in closid, and update msr_param as we
apply each staged config.

Once the L2/L2CODE/L2DATA resources are merged this will allow
update_domains() to be called once for the single resource, even
when CDP is in use. This results in both CODE and DATA
configurations being applied and the two consecutive closids being
updated with a single smp_call_function_many().

This will let us keep the CDP odd/even behaviour inside resctrl
so that architectures that don't do this don't need to emulate it.

Signed-off-by: James Morse 
---
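
Note: the way update_domains() learns the low/high values can be shown as a standalone routine. The sketch below is a userspace model under assumptions: `struct closid_range` stands in for the low/high pair of struct msr_param, and `learn_range()` is a hypothetical helper that takes the staged closids as a plain array rather than walking domains and staged configs.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int u32;

/* Half-open range [low, high) of closids to write, as in struct msr_param. */
struct closid_range {
	u32 low;
	u32 high;
};

/* Derive the range covering every staged closid; n must be at least 1. */
static struct closid_range learn_range(const u32 *closids, int n)
{
	struct closid_range r = { 0, 0 };
	bool init = false;
	int i;

	assert(closids && n > 0);
	for (i = 0; i < n; i++) {
		u32 c = closids[i];

		if (!init) {
			/* the first staged closid seeds both bounds */
			r.low = c;
			r.high = c;
			init = true;
		} else {
			if (c < r.low)
				r.low = c;
			if (c > r.high)
				r.high = c;
		}
	}
	r.high += 1;	/* make the upper bound exclusive */
	return r;
}
```

With CDP in use, a CODE/DATA pair such as closids 6 and 7 yields the range [6, 8), so both slots are covered by one cross-call. The sketch requires at least one staged closid.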
 arch/x86/kernel/cpu/intel_rdt.h |  4 ++--
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 21 -
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 5e271e0fe1f5..8df549ef016d 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -241,8 +241,8 @@ static inline struct rdt_hw_domain *rc_dom_to_rdt(struct rdt_domain *r)
  */
 struct msr_param {
struct rdt_resource *res;
-   int low;
-   int high;
+   u32 low;
+   u32 high;
 };
 
 static inline bool is_llc_occupancy_enabled(void)
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 0c849653a99d..01ffd455313a 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -193,22 +193,21 @@ static void apply_config(struct rdt_hw_domain *hw_dom,
}
 }
 
-static int update_domains(struct rdt_resource *r, int closid)
+static int update_domains(struct rdt_resource *r)
 {
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
+   bool msr_param_init = false;
struct msr_param msr_param;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
bool mba_sc;
+   u32 closid;
int i, cpu;
 
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;
 
-   /* TODO: learn these two by looping the config */
-   msr_param.low = closid;
-   msr_param.high = msr_param.low + 1;
msr_param.res = r;
 
mba_sc = is_mba_sc(r);
@@ -220,9 +219,21 @@ static int update_domains(struct rdt_resource *r, int closid)
continue;
 
apply_config(hw_dom, cfg, cpu_mask, mba_sc);
+
+   closid = cfg->closid;
+   if (!msr_param_init) {
+   msr_param.low = closid;
+   msr_param.high = closid;
+   msr_param_init = true;
+   } else {
+   msr_param.low = min(msr_param.low, closid);
+   msr_param.high = max(msr_param.high, closid);
+   }
}
}
 
+   msr_param.high += 1;
+
/*
 * Avoid writing the control msr with control values when
 * MBA software controller is enabled
@@ -301,7 +312,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
}
 
for_each_alloc_enabled_rdt_resource(r) {
-   ret = update_domains(r, closid);
+   ret = update_domains(r);
if (ret)
goto out;
}
-- 
2.18.0
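
The range tracking that replaces the TODO in the patch above can be sketched in isolation. This is a minimal userspace sketch, not the kernel code: `struct staged_cfg` and `closid_range()` are hypothetical names, and the real update_domains() also builds a cpumask and issues the cross-call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one staged configuration entry. */
struct staged_cfg {
	bool have_new_ctrl;	/* true if this slot holds a pending update */
	uint32_t closid;	/* hardware closid the update applies to */
};

/*
 * Compute the half-open range [*low, *high) covering every staged closid.
 * Returns false when nothing is staged, in which case *low and *high are
 * left untouched.
 */
bool closid_range(const struct staged_cfg *cfg, size_t n,
		  uint32_t *low, uint32_t *high)
{
	bool init = false;
	uint32_t lo = 0, hi = 0;
	size_t i;

	assert(cfg && low && high);

	for (i = 0; i < n; i++) {
		if (!cfg[i].have_new_ctrl)
			continue;
		if (!init) {
			/* First staged entry seeds both bounds. */
			lo = hi = cfg[i].closid;
			init = true;
		} else {
			if (cfg[i].closid < lo)
				lo = cfg[i].closid;
			if (cfg[i].closid > hi)
				hi = cfg[i].closid;
		}
	}
	if (!init)
		return false;

	*low = lo;
	*high = hi + 1;	/* exclusive upper bound, as in the patch */
	return true;
}
```

Reporting the empty case separately is a choice made for this sketch; the posted patch increments msr_param.high whether or not any configuration was staged.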

[RFC PATCH 09/20] x86/intel_rdt: Track the actual number of closids separately

2018-08-24 Thread James Morse

num_closid is different for the illusionary CODE/DATA caches, and
these resources' ctrlval arrays are sized on this parameter. When it
comes to writing the configuration values into hardware, a correction
is applied.

The next step in moving this behaviour into the resctrl code is
to make the arch code always work with the full range of closids, and
size its ctrlval arrays based on this number.

This means another architecture doesn't need to emulate CDP.

Add a separate field to hold hw_num_closids and use this in the
arch code. The CODE/DATA caches use the full range for their hardware
struct, but the half sized version for the resctrl visible part.
This means the ctrlval array is the full size, but only the first
half is used.
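
The split between the two counts can be illustrated with a small userspace sketch. The names `struct toy_resource` and `alloc_ctrlval()` are hypothetical, and the real arrays are per-domain and allocated by the arch code; the only point shown is that the array is sized on the hardware count while resctrl indexes only the first num_closid entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified view of the two closid counts. */
struct toy_resource {
	uint32_t num_closid;	/* what resctrl exposes (halved for CODE/DATA) */
	uint32_t hw_num_closid;	/* what the hardware really implements */
	uint32_t default_ctrl;	/* value meaning "no restriction" */
};

/*
 * Allocate a control-value array sized on the hardware count and fill
 * every slot with the default, including slots resctrl never indexes.
 * Returns NULL on allocation failure; the caller frees the result.
 */
uint32_t *alloc_ctrlval(const struct toy_resource *r)
{
	uint32_t *dc, i;

	assert(r && r->num_closid <= r->hw_num_closid);

	dc = malloc(r->hw_num_closid * sizeof(*dc));
	if (!dc)
		return NULL;
	for (i = 0; i < r->hw_num_closid; i++)
		dc[i] = r->default_ctrl;
	return dc;
}
```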

A later patch will correct the closid when the configuration is
written, at which point we can merge the illusionary caches.

A short-lived quirk of this is that when a resource is reset(), both
the code and data illusionary caches reset the full closid range.
This disappears in a later patch that merges the caches together.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt.c  | 19 ++-
 arch/x86/kernel/cpu/intel_rdt.h  |  2 ++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  3 ++-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 0e651447956e..c035280b4398 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -223,7 +223,8 @@ static unsigned int cbm_idx(struct rdt_resource *r, 
unsigned int closid)
  */
 static inline void cache_alloc_hsw_probe(void)
 {
-   struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].resctrl;
+   struct rdt_hw_resource *hw_res  = &rdt_resources_all[RDT_RESOURCE_L3];
+   struct rdt_resource *r = &hw_res->resctrl;
u32 l, h, max_cbm = BIT_MASK(20) - 1;
 
if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
@@ -235,6 +236,7 @@ static inline void cache_alloc_hsw_probe(void)
return;
 
r->num_closid = 4;
+   hw_res->hw_num_closid = 4;
r->default_ctrl = max_cbm;
r->cache.cbm_len = 20;
r->cache.shareable_bits = 0xc;
@@ -276,12 +278,14 @@ static inline bool rdt_get_mb_table(struct rdt_resource 
*r)
 
 static bool rdt_get_mem_config(struct rdt_resource *r)
 {
+   struct rdt_hw_resource *hw_res = resctrl_to_rdt(r);
union cpuid_0x10_3_eax eax;
union cpuid_0x10_x_edx edx;
u32 ebx, ecx;
 
cpuid_count(0x0010, 3, &eax.full, &ebx, &ecx, &edx.full);
r->num_closid = edx.split.cos_max + 1;
+   hw_res->hw_num_closid = r->num_closid;
r->membw.max_delay = eax.split.max_delay + 1;
r->default_ctrl = MAX_MBA_BW;
if (ecx & MBA_IS_LINEAR) {
@@ -302,12 +306,14 @@ static bool rdt_get_mem_config(struct rdt_resource *r)
 
 static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
 {
+   struct rdt_hw_resource *hw_res = resctrl_to_rdt(r);
union cpuid_0x10_1_eax eax;
union cpuid_0x10_x_edx edx;
u32 ebx, ecx;
 
cpuid_count(0x0010, idx, &eax.full, &ebx, &ecx, &edx.full);
r->num_closid = edx.split.cos_max + 1;
+   hw_res->hw_num_closid = r->num_closid;
r->cache.cbm_len = eax.split.cbm_len + 1;
r->default_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
r->cache.shareable_bits = ebx & r->default_ctrl;
@@ -319,9 +325,11 @@ static void rdt_get_cache_alloc_cfg(int idx, struct 
rdt_resource *r)
 static void rdt_get_cdp_config(int level, int type)
 {
struct rdt_resource *r_l = &rdt_resources_all[level].resctrl;
-   struct rdt_resource *r = &rdt_resources_all[type].resctrl;
+   struct rdt_hw_resource *hw_res_t = &rdt_resources_all[type];
+   struct rdt_resource *r = &hw_res_t->resctrl;
 
r->num_closid = r_l->num_closid / 2;
+   hw_res_t->hw_num_closid = r_l->num_closid;
r->cache.cbm_len = r_l->cache.cbm_len;
r->default_ctrl = r_l->default_ctrl;
r->cache.shareable_bits = r_l->cache.shareable_bits;
@@ -463,6 +471,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, 
int id,
 void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
 {
int i;
+   struct rdt_hw_resource *hw_res = resctrl_to_rdt(r);
 
/*
 * Initialize the Control MSRs to having no control.
@@ -470,7 +479,7 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, 
u32 *dm)
 * For Memory Allocation: Set b/w requested to 100%
 * and the bandwidth in MBps to U32_MAX
 */
-   for (i = 0; i < r->num_closid; i++, dc++, dm++) {
+   for (i = 0; i < hw_res->hw_num_closid; i++, dc++, dm++) {
*dc = r->default_ctrl;
*dm = MBA_MAX_MBPS;
}
@@ -483,7 +492,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, 
struct rdt_domain *d)
struct msr_param m;
u32 *dc, *dm;

[RFC PATCH 10/20] x86/intel_rdt: Let resctrl change the resources's num_closid

2018-08-24 Thread James Morse

Today we switch between different alloc_enabled resources which
have differing preset num_closid to account for CDP.
We want to merge these illusionary caches together, at which
point something needs to change the resctrl's view of num_closid.

The arch code now has its own idea of how many closids there are,
and as the two configurations for one rdtgroup are part of resctrl's
ABI we should get resctrl to change it.

We change the num_closid on the l2/l3 resources, which aren't
yet in use when cdp is enabled, then change them back afterwards.
Once we merge illusionary caches, resctrl will see the value it
changed here.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c 
b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 58dceaad6863..e2a9202674f3 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1149,16 +1149,36 @@ int resctrl_arch_set_cdp_enabled(bool enable)
 
 static int try_to_enable_cdp(int level)
 {
+   int ret;
struct rdt_resource *r = &rdt_resources_all[level].resctrl;
+   struct rdt_resource *l3 = &rdt_resources_all[RDT_RESOURCE_L3].resctrl;
+   struct rdt_resource *l2 = &rdt_resources_all[RDT_RESOURCE_L2].resctrl;
 
if (!r->cdp_capable)
return -EINVAL;
+   if (r->cdp_enabled)
+   return 0;
 
-   return resctrl_arch_set_cdp_enabled(true);
+   ret = resctrl_arch_set_cdp_enabled(true);
+   if (!ret) {
+   if (l2->cdp_enabled)
+   l2->num_closid /= 2;
+   if (l3->cdp_enabled)
+   l3->num_closid /= 2;
+   }
+
+   return ret;
 }
 
 static void cdp_disable_all(void)
 {
+   struct rdt_resource *l2 = &rdt_resources_all[RDT_RESOURCE_L2].resctrl;
+   struct rdt_resource *l3 = &rdt_resources_all[RDT_RESOURCE_L3].resctrl;
+
+   if (l2->cdp_enabled)
+   l2->num_closid *= 2;
+   if (l3->cdp_enabled)
+   l3->num_closid *= 2;
resctrl_arch_set_cdp_enabled(false);
 }
 
-- 
2.18.0
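
The halve-on-enable, double-on-disable bookkeeping in the patch above can be modelled on a single resource. This is a sketch under simplifying assumptions: `struct toy_cache`, `toy_enable_cdp()` and `toy_disable_cdp()` are hypothetical names, and the flag is flipped directly rather than through resctrl_arch_set_cdp_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified resource: only the fields this sketch needs. */
struct toy_cache {
	bool cdp_capable;
	bool cdp_enabled;
	uint32_t num_closid;
};

/* Enable CDP once; a second call is a no-op so the count is not halved twice. */
int toy_enable_cdp(struct toy_cache *c)
{
	assert(c);
	if (!c->cdp_capable)
		return -1;
	if (c->cdp_enabled)
		return 0;
	c->cdp_enabled = true;
	c->num_closid /= 2;
	return 0;
}

/* Undo toy_enable_cdp(); does nothing if CDP was never enabled. */
void toy_disable_cdp(struct toy_cache *c)
{
	assert(c);
	if (!c->cdp_enabled)
		return;
	c->num_closid *= 2;
	c->cdp_enabled = false;
}
```

The early return for an already-enabled resource is what keeps the two operations symmetric. An odd num_closid would not survive the round trip, so the sketch assumes an even count.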

[RFC PATCH 17/20] x86/intel_rdt: Stop using Lx CODE/DATA resources

2018-08-24 Thread James Morse

Now that CDP enable/disable is global, and the closid offset correction
is based on the configuration being applied, we can use the same
Lx resource twice for CDP's CODE/DATA schema. This keeps the illusion
of separate caches in the resctrl code.

When CDP is enabled for a cache, create two schema, generating the names
and setting the configuration type.

We can now remove the initialisation of the illusionary hw_resources:
'cdp_capable' just requires setting a flag; resctrl knows what to do
from there.
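
Generating the schema names from a cache level and a configuration type might look roughly like this. The part of the diff that does this is not visible in this archive, so the following is only a guess at the shape: `toy_schema_name()` and the `TOY_CDP_*` values are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the code/data/both configuration type. */
enum toy_conf_type { TOY_CDP_BOTH = 0, TOY_CDP_CODE, TOY_CDP_DATA };

/*
 * Build the schemata name for one cache level and configuration type,
 * e.g. level 3 + CODE -> "L3CODE". Returns 0 on success, -1 if the
 * name does not fit in buf.
 */
int toy_schema_name(char *buf, size_t len, int level, enum toy_conf_type t)
{
	const char *suffix;
	int n;

	assert(buf && len > 0);

	switch (t) {
	case TOY_CDP_CODE:
		suffix = "CODE";
		break;
	case TOY_CDP_DATA:
		suffix = "DATA";
		break;
	default:
		suffix = "";
		break;
	}
	/* snprintf() reports the untruncated length, so n >= len means truncation. */
	n = snprintf(buf, len, "L%d%s", level, suffix);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```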

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt.c  | 49 ++--
 arch/x86/kernel/cpu/intel_rdt.h  |  1 -
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 98 +---
 3 files changed, 58 insertions(+), 90 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 3a0d7de15afa..96b1aab36053 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -81,7 +81,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
-   .cdp_type   = CDP_BOTH,
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -99,7 +98,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
-   .cdp_type   = CDP_DATA,
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
 
@@ -118,7 +116,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
-   .cdp_type   = CDP_CODE,
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -136,7 +133,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
-   .cdp_type   = CDP_BOTH,
.msr_base   = IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -154,7 +150,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
-   .cdp_type   = CDP_DATA,
.msr_base   = IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -172,7 +167,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
-   .cdp_type   = CDP_CODE,
.msr_base   = IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -312,39 +306,6 @@ static void rdt_get_cache_alloc_cfg(int idx, struct 
rdt_resource *r)
r->alloc_enabled = true;
 }
 
-static void rdt_get_cdp_config(int level, int type)
-{
-   struct rdt_resource *r_l = &rdt_resources_all[level].resctrl;
-   struct rdt_hw_resource *hw_res_t = &rdt_resources_all[type];
-   struct rdt_resource *r = &hw_res_t->resctrl;
-
-   r->num_closid = r_l->num_closid / 2;
-   hw_res_t->hw_num_closid = r_l->num_closid;
-   r->cache.cbm_len = r_l->cache.cbm_len;
-   r->default_ctrl = r_l->default_ctrl;
-   r->cache.shareable_bits = r_l->cache.shareable_bits;
-   r->data_width = (r->cache.cbm_len + 3) / 4;
-   r->alloc_capable = true;
-   /*
-* By default, CDP is disabled. CDP can be enabled by mount parameter
-* "cdp" during resctrl file system mount time.
-*/
-   r_l->cdp_capable = true;
-   r->alloc_enabled = false;
-}
-
-static void rdt_get_cdp_l3_config(void)
-{
-   rdt_get_cdp_config(RDT_RESOURCE_L3, RDT_RESOURCE_L3DATA);
-   rdt_get_cdp_config(RDT_RESOURCE_L3, RDT_RESOURCE_L3CODE);
-}
-
-static void rdt_get_cdp_l2_config(void)
-{
-   rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2DATA);
-   rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2CODE);
-}
-
 static int get_cache_id(int cpu, int level)
 {
struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
@@ -813,6 +774,8 @@ static bool __init rdt_cpu_has(int flag)
 static __init bool get_rdt_alloc_resources(void)
 {
bool ret = false;
+   struct rdt_hw_resource *l2 = &rdt_resources_all[RDT_RESOURCE_L2];
+   struct rdt_hw_resource *l3 = &rdt_resources_all[RDT_RESOURCE

[RFC PATCH 16/20] x86/intel_rdt: Move the schemata names into struct resctrl_schema

2018-08-24 Thread James Morse

Move the names used for the schemata file out of the resource and
into struct resctrl_schema. This lets us give one resource two
different names, based on the other schema properties.

For now we copy the name; once we merge the L2/L2CODE/L2DATA
resources, resctrl will generate it.

Remove the arch code's max_name_width; this is now resctrl's
problem.
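
One detail worth checking when the padding width moves from an int variable to a sizeof() expression, as the show_doms() hunk below does: the `*` width in a printf-style format is consumed as an int, so a size_t value would normally be converted first. The sketch below uses plain snprintf() rather than seq_printf(), and `toy_format_name()` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Right-align name in a field of the given width and append ':'.
 * printf-style "%*s" takes its width as an int, so a size_t such as a
 * sizeof() result has to be converted before it is passed.
 * Returns 0 on success, -1 if the output was truncated.
 */
int toy_format_name(char *out, size_t outlen, const char *name, size_t width)
{
	int n;

	assert(out && name && outlen > 0);

	n = snprintf(out, outlen, "%*s:", (int)width, name);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```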

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt.c | 9 ++---
 arch/x86/kernel/cpu/intel_rdt.h | 2 +-
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 4 ++--
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c| 4 +++-
 include/linux/resctrl.h | 7 +++
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 6466c172c045..3a0d7de15afa 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -48,10 +48,10 @@ DEFINE_MUTEX(rdtgroup_mutex);
 DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 
 /*
- * Used to store the max resource name width and max resource data width
+ * Used to store the max resource data width
  * to display the schemata in a tabular format
  */
-int max_name_width, max_data_width;
+int max_data_width;
 
 /*
  * Global boolean for rdt_alloc which is true if any
@@ -722,13 +722,8 @@ static int intel_rdt_offline_cpu(unsigned int cpu)
 static __init void rdt_init_padding(void)
 {
struct rdt_resource *r;
-   int cl;
 
for_each_alloc_capable_rdt_resource(r) {
-   cl = strlen(r->name);
-   if (cl > max_name_width)
-   max_name_width = cl;
-
if (r->data_width > max_data_width)
max_data_width = r->data_width;
}
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index cc8dea58b74f..b72448186532 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -146,7 +146,7 @@ struct rdtgroup {
 /* List of all resource groups */
 extern struct list_head rdt_all_groups;
 
-extern int max_name_width, max_data_width;
+extern int max_data_width;
 
 int __init rdtgroup_init(void);
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c 
b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 3038ecfdeec0..e8264637a4d3 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -275,7 +275,7 @@ static int rdtgroup_parse_resource(char *resname, char 
*tok, int closid)
 
list_for_each_entry(s, &resctrl_all_schema, list) {
r = s->res;
-   if (!strcmp(resname, r->name) && closid < r->num_closid)
+   if (!strcmp(resname, s->name) && closid < r->num_closid)
return parse_line(tok, r, s->conf_type, closid);
}
rdt_last_cmd_printf("unknown/unsupported resource name '%s'\n", 
resname);
@@ -358,7 +358,7 @@ static void show_doms(struct seq_file *s, struct 
resctrl_schema *schema, int clo
bool sep = false;
u32 ctrl_val, hw_closid;
 
-   seq_printf(s, "%*s:", max_name_width, r->name);
+   seq_printf(s, "%*s:", sizeof(schema->name), schema->name);
list_for_each_entry(dom, &r->domains, list) {
if (sep)
seq_puts(s, ";");
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c 
b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 0bd748defc73..b3d3acbb2ef7 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -932,7 +932,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node 
*parent_kn)
list_for_each_entry(s, &resctrl_all_schema, list) {
r = s->res;
fflags =  r->fflags | RF_CTRL_INFO;
-   ret = rdtgroup_mkdir_info_resdir(r, r->name, fflags);
+   ret = rdtgroup_mkdir_info_resdir(r, s->name, fflags);
if (ret)
goto out_destroy;
}
@@ -1306,6 +1306,8 @@ static int create_schemata_list(void)
s->res = r;
s->conf_type = resctrl_to_rdt(r)->cdp_type;
 
+   snprintf(s->name, sizeof(s->name), "%s", r->name);
+
INIT_LIST_HEAD(&s->list);
list_add(&s->list, &resctrl_all_schema);
}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9ed0beb241d8..8b06ed8e7407 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -7,6 +7,11 @@
 #include 
 #include 
 
+/*
+ * The longest name we expect in the schemata file:
+ */
+#define RESCTRL_NAME_LEN   7
+
 enum resctrl_conf_type {
CDP_BOTH = 0,
CDP_CODE,
@@ -147,11 +152,13 @@ int resctrl_arch_set_cdp_enabled(bool enable);
 
 /**
  * @list:  Member of resctrl's schema list
+ * @name:  Name visible in the schemata file
  * @conf_type: Type of configuration, e.g. code/data/both
  * @res:   The rdt_resource for this entry
  */
 st

[RFC PATCH 15/20] x86/intel_rdt: Walk the resctrl schema list instead of the arch's resource list

2018-08-24 Thread James Morse

Now that resctrl has a list of resources it is using, walk that list
instead of the architecture's list. This lets us keep schema properties
with the resource that is using them.

Most users of for_each_alloc_enabled_rdt_resource() are per-schema;
switch these to walk the schema list. The remainder are working with
a per-resource property.

Previously we littered resctrl_to_rdt() wherever we needed to know the
cdp_type of a cache. Now that this has a home, fix all those callers
to read the value from the relevant schema entry.
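
The lookup pattern this describes, walking schema entries and reading the configuration type from the entry rather than from the resource, can be sketched without the kernel's list helpers. All names here are hypothetical, a plain `next` pointer stands in for struct list_head, and the name is stored on the schema entry, which in this series only happens in a later patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_conf_type { TOY_CDP_BOTH = 0, TOY_CDP_CODE, TOY_CDP_DATA };

struct toy_resource {
	uint32_t num_closid;
};

/* One schemata entry: per-schema properties live here, not in the resource. */
struct toy_schema {
	const char *name;
	enum toy_conf_type conf_type;
	struct toy_resource *res;
	struct toy_schema *next;
};

/*
 * Walk the schema list and return the entry matching name whose resource
 * has room for closid, or NULL if there is none.
 */
struct toy_schema *toy_find_schema(struct toy_schema *head, const char *name,
				   uint32_t closid)
{
	struct toy_schema *s;

	assert(name);

	for (s = head; s; s = s->next) {
		if (!strcmp(name, s->name) && closid < s->res->num_closid)
			return s;
	}
	return NULL;
}
```

Two entries may point at the same resource, which is what lets one cache appear twice with different configuration types.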

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 24 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c|  4 +++-
 include/linux/resctrl.h |  2 +-
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c 
b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index f80a838cc36d..3038ecfdeec0 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -271,10 +271,12 @@ int resctrl_arch_update_domains(struct rdt_resource *r)
 static int rdtgroup_parse_resource(char *resname, char *tok, int closid)
 {
struct rdt_resource *r;
+   struct resctrl_schema *s;
 
-   for_each_alloc_enabled_rdt_resource(r) {
+   list_for_each_entry(s, &resctrl_all_schema, list) {
+   r = s->res;
if (!strcmp(resname, r->name) && closid < r->num_closid)
-   return parse_line(tok, r, resctrl_to_rdt(r)->cdp_type, 
closid);
+   return parse_line(tok, r, s->conf_type, closid);
}
rdt_last_cmd_printf("unknown/unsupported resource name '%s'\n", 
resname);
return -EINVAL;
@@ -283,6 +285,7 @@ static int rdtgroup_parse_resource(char *resname, char 
*tok, int closid)
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
 {
+   struct resctrl_schema *s;
struct rdtgroup *rdtgrp;
struct rdt_domain *dom;
struct rdt_resource *r;
@@ -303,9 +306,10 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file 
*of,
 
closid = rdtgrp->closid;
 
-   for_each_alloc_enabled_rdt_resource(r) {
-   list_for_each_entry(dom, &r->domains, list)
+   list_for_each_entry(s, &resctrl_all_schema, list) {
+   list_for_each_entry(dom, &s->res->domains, list) {
memset(dom->staged_config, 0, 
sizeof(dom->staged_config));
+   }
}
 
while ((tok = strsep(&buf, "\n")) != NULL) {
@@ -347,9 +351,9 @@ void resctrl_arch_get_config(struct rdt_resource *r, struct 
rdt_domain *d,
*value = hw_dom->mbps_val[hw_closid];
 }
 
-static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
+static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int 
closid)
 {
-
+   struct rdt_resource *r = schema->res;
struct rdt_domain *dom;
bool sep = false;
u32 ctrl_val, hw_closid;
@@ -359,7 +363,7 @@ static void show_doms(struct seq_file *s, struct 
rdt_resource *r, int closid)
if (sep)
seq_puts(s, ";");
 
-   hw_closid = resctrl_closid_cdp_map(closid, 
resctrl_to_rdt(r)->cdp_type);
+   hw_closid = resctrl_closid_cdp_map(closid, schema->conf_type);
resctrl_arch_get_config(r, dom, hw_closid, &ctrl_val);
seq_printf(s, r->format_str, dom->id, max_data_width,
   ctrl_val);
@@ -371,6 +375,7 @@ static void show_doms(struct seq_file *s, struct 
rdt_resource *r, int closid)
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
   struct seq_file *s, void *v)
 {
+   struct resctrl_schema *schema;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
int ret = 0;
@@ -379,9 +384,10 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (rdtgrp) {
closid = rdtgrp->closid;
-   for_each_alloc_enabled_rdt_resource(r) {
+   list_for_each_entry(schema, &resctrl_all_schema, list) {
+   r = schema->res;
if (closid < r->num_closid)
-   show_doms(s, r, closid);
+   show_doms(s, schema, closid);
}
} else {
ret = -ENOENT;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c 
b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 2015d99ca388..0bd748defc73 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -913,6 +913,7 @@ static int rdtgroup_mkdir_info_resdir(struct rdt_resource 
*r, char *name,
 
 static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 {
+   struct resctrl_schema *s;
struc

[RFC PATCH 14/20] x86/intel_rdt: Add a separate resource list for resctrl

2018-08-24 Thread James Morse

We want to merge the L2/L2CODE/L2DATA resources together so that
there is one resource per cache. The CDP properties are then
part of the configuration.

Currently the cdp type to use with the configuration is hidden
in the resource. This needs to be part of the schema, but resctrl
doesn't have a structure for this (it's all flattened out into
extra resources).

Create a list of schema that resctrl presents via the schemata file.
We want to move the illusion of an "L2CODE" cache into resctrl so that
this part of the ABI is dealt with by core code.
This change will allow us to have the same resource represented twice
as code/data, with the appropriate cdp_type for configuration.

This will also let us generate the names in resctrl.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 45 +++-
 include/linux/resctrl.h  | 13 +++
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c 
b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index f3dfed9c609a..2015d99ca388 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -43,6 +43,9 @@ static struct kernfs_root *rdt_root;
 struct rdtgroup rdtgroup_default;
 LIST_HEAD(rdt_all_groups);
 
+/* list of entries for the schemata file */
+LIST_HEAD(resctrl_all_schema);
+
 /* Kernel fs node for "info" directory under root */
 static struct kernfs_node *kn_info;
 
@@ -1287,6 +1290,37 @@ static int mkdir_mondata_all(struct kernfs_node 
*parent_kn,
 struct rdtgroup *prgrp,
 struct kernfs_node **mon_data_kn);
 
+
+static int create_schemata_list(void)
+{
+   struct rdt_resource *r;
+   struct resctrl_schema *s;
+
+   for_each_alloc_enabled_rdt_resource(r) {
+   s = kzalloc(sizeof(*s), GFP_KERNEL);
+   if (!s)
+   return -ENOMEM;
+
+   s->res = r;
+   s->conf_type = resctrl_to_rdt(r)->cdp_type;
+
+   INIT_LIST_HEAD(&s->list);
+   list_add(&s->list, &resctrl_all_schema);
+   }
+
+   return 0;
+}
+
+static void destroy_schemata_list(void)
+{
+   struct resctrl_schema *s, *tmp;
+
+   list_for_each_entry_safe(s, tmp, &resctrl_all_schema, list) {
+   list_del(&s->list);
+   kfree(s);
+   }
+}
+
 static struct dentry *rdt_mount(struct file_system_type *fs_type,
int flags, const char *unused_dev_name,
void *data)
@@ -1312,12 +1346,18 @@ static struct dentry *rdt_mount(struct file_system_type 
*fs_type,
goto out_cdp;
}
 
+   ret = create_schemata_list();
+   if (ret) {
+   dentry = ERR_PTR(ret);
+   goto out_schemata_free;
+   }
+
closid_init();
 
ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
if (ret) {
dentry = ERR_PTR(ret);
-   goto out_cdp;
+   goto out_schemata_free;
}
 
if (rdt_mon_capable) {
@@ -1370,6 +1410,8 @@ static struct dentry *rdt_mount(struct file_system_type 
*fs_type,
kernfs_remove(kn_mongrp);
 out_info:
kernfs_remove(kn_info);
+out_schemata_free:
+   destroy_schemata_list();
 out_cdp:
cdp_disable_all();
 out:
@@ -1538,6 +1580,7 @@ static void rdt_kill_sb(struct super_block *sb)
reset_all_ctrls(r);
cdp_disable_all();
rmdir_all_sub();
+   destroy_schemata_list();
static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
static_branch_disable_cpuslocked(&rdt_mon_enable_key);
static_branch_disable_cpuslocked(&rdt_enable_key);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index ede5c40756b4..071b2cc9c402 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -145,4 +145,17 @@ void resctrl_arch_get_config(struct rdt_resource *r, 
struct rdt_domain *d,
 /* Enable/Disable CDP on all applicable resources */
 int resctrl_arch_set_cdp_enabled(bool enable);
 
+/**
+ * @list:  Member of resctrl's schema list
+ * @cdp_type:  Whether this entry is for code/data/both
+ * @res:   The rdt_resource for this entry
+ */
+struct resctrl_schema {
+   struct list_headlist;
+   enum resctrl_conf_type  conf_type;
+   struct rdt_resource *res;
+};
+
+extern struct list_head resctrl_all_schema;
+
 #endif /* __LINUX_RESCTRL_H */
-- 
2.18.0
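
The create/destroy pairing in the patch above can be sketched in userspace as follows. The names are hypothetical, calloc()/free() stand in for kzalloc()/kfree(), and a plain `next` pointer stands in for struct list_head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_schema {
	int res_id;			/* stand-in for the rdt_resource pointer */
	struct toy_schema *next;
};

/* Free every entry and leave *head empty; safe to call on an empty list. */
void toy_destroy_schemata(struct toy_schema **head)
{
	assert(head);
	while (*head) {
		struct toy_schema *s = *head;

		*head = s->next;
		free(s);
	}
}

/*
 * Add one entry per resource id. On allocation failure the entries
 * already added are released so the caller sees an empty list.
 * Returns 0 on success, -1 on failure.
 */
int toy_create_schemata(struct toy_schema **head, const int *ids, size_t n)
{
	size_t i;

	assert(head && (ids || n == 0));

	for (i = 0; i < n; i++) {
		struct toy_schema *s = calloc(1, sizeof(*s));

		if (!s) {
			toy_destroy_schemata(head);
			return -1;
		}
		s->res_id = ids[i];
		s->next = *head;	/* insert at the head, like list_add() */
		*head = s;
	}
	return 0;
}
```

Unlike this sketch, the patch leaves cleanup of a partially built list to the caller, which reaches destroy_schemata_list() through the out_schemata_free label.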

[RFC PATCH 04/20] x86/intel_rdt: Add closid to the staged config

2018-08-24 Thread James Morse

Once we merge the L2/L2CODE/L2DATA resources, we still want to have
two configurations staged for one resource when CDP is enabled.

These two configurations would have different closid as far as the
hardware is concerned.

In preparation, add closid as a staged parameter, and pass it down
when the schema is being parsed. In the future this will be the
hardware closid, with the CDP correction already applied by resctrl.
This allows another architecture to work with resctrl, without
having to emulate CDP.
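
Carrying the closid inside the staged entry, rather than passing it alongside, can be sketched like this. The names are hypothetical and the cpumask handling of the real apply_config() is left out; the return value takes its place as the "hardware needs updating" signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of a staged configuration that knows its own closid. */
struct toy_staged_config {
	uint32_t closid;
	uint32_t new_ctrl;
	bool have_new_ctrl;
};

/*
 * Copy a staged value into the control array at the closid recorded in
 * the staged entry. Returns true if the array changed, meaning hardware
 * would need to be updated.
 */
bool toy_apply_config(uint32_t *ctrl, uint32_t nr, struct toy_staged_config *cfg)
{
	assert(ctrl && cfg && cfg->closid < nr);

	if (!cfg->have_new_ctrl || ctrl[cfg->closid] == cfg->new_ctrl)
		return false;
	ctrl[cfg->closid] = cfg->new_ctrl;
	cfg->have_new_ctrl = false;
	return true;
}
```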

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt.h |  4 ++--
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 21 -
 include/linux/resctrl.h |  4 +++-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 7c17d74fd36c..5e271e0fe1f5 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -294,8 +294,8 @@ static inline struct rdt_hw_resource *resctrl_to_rdt(struct 
rdt_resource *r)
return container_of(r, struct rdt_hw_resource, resctrl);
 }
 
-int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d);
-int parse_bw(char *buf, struct rdt_resource *r,  struct rdt_domain *d);
+int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d, u32 
closid);
+int parse_bw(char *buf, struct rdt_resource *r,  struct rdt_domain *d, u32 
closid);
 
 extern struct mutex rdtgroup_mutex;
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c 
b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 1068a19e03c5..0c849653a99d 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -64,7 +64,7 @@ static bool bw_validate(char *buf, unsigned long *data, 
struct rdt_resource *r)
return true;
 }
 
-int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d)
+int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d, u32 
closid)
 {
unsigned long data;
struct resctrl_staged_config *cfg = &d->staged_config[0];
@@ -76,6 +76,7 @@ int parse_bw(char *buf, struct rdt_resource *r, struct 
rdt_domain *d)
 
if (!bw_validate(buf, &data, r))
return -EINVAL;
+   cfg->closid = closid;
cfg->new_ctrl = data;
cfg->have_new_ctrl = true;
 
@@ -127,7 +128,7 @@ static bool cbm_validate(char *buf, unsigned long *data, 
struct rdt_resource *r)
  * Read one cache bit mask (hex). Check that it is valid for the current
  * resource type.
  */
-int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d)
+int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d, u32 
closid)
 {
unsigned long data;
struct resctrl_staged_config *cfg = &d->staged_config[0];
@@ -139,6 +140,7 @@ int parse_cbm(char *buf, struct rdt_resource *r, struct 
rdt_domain *d)
 
if(!cbm_validate(buf, &data, r))
return -EINVAL;
+   cfg->closid = closid;
cfg->new_ctrl = data;
cfg->have_new_ctrl = true;
 
@@ -151,7 +153,7 @@ int parse_cbm(char *buf, struct rdt_resource *r, struct 
rdt_domain *d)
  * separated by ";". The "id" is in decimal, and must match one of
  * the "id"s for this resource.
  */
-static int parse_line(char *line, struct rdt_resource *r)
+static int parse_line(char *line, struct rdt_resource *r, u32 closid)
 {
char *dom = NULL, *id;
struct rdt_domain *d;
@@ -169,7 +171,7 @@ static int parse_line(char *line, struct rdt_resource *r)
dom = strim(dom);
list_for_each_entry(d, &r->domains, list) {
if (d->id == dom_id) {
-   if (r->parse_ctrlval(dom, r, d))
+   if (r->parse_ctrlval(dom, r, d, closid))
return -EINVAL;
goto next;
}
@@ -178,15 +180,15 @@ static int parse_line(char *line, struct rdt_resource *r)
 }
 
 static void apply_config(struct rdt_hw_domain *hw_dom,
-struct resctrl_staged_config *cfg, int closid,
+struct resctrl_staged_config *cfg, 
 cpumask_var_t cpu_mask, bool mba_sc)
 {
u32 *dc = !mba_sc ? hw_dom->ctrl_val : hw_dom->mbps_val;
 
-   if (cfg->new_ctrl != dc[closid]) {
+   if (cfg->new_ctrl != dc[cfg->closid]) {
cpumask_set_cpu(cpumask_any(&hw_dom->resctrl.cpu_mask),
cpu_mask);
-   dc[closid] = cfg->new_ctrl;
+   dc[cfg->closid] = cfg->new_ctrl;
cfg->have_new_ctrl = false;
}
 }
@@ -204,6 +206,7 @@ static int update_domains(struct rdt_resource *r, int 
closid)
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;
 
+   /* TODO: learn these two by looping the config */
msr_param.low = closid;
msr_param.high = msr_param.low + 1;
msr_param.re

[RFC PATCH 11/20] x86/intel_rdt: Pass in the code/data/both configuration value when parsing

2018-08-24 Thread James Morse

The illusion of three types of cache at each level is a neat trick to
allow a static table of resources to be used. This is a problem if the
cache topology and partitioning abilities have to be discovered at boot.

We want to fold the three code/data/both caches into one, and move the
CDP configuration details to be a property of the configuration and
its closid, not the cache. The resctrl filesystem can then re-create
the illusion of separate caches.

Temporarily label the configuration property of the cache, and pass
this value down to the configuration helpers. Eventually we will move
this label up to become a property of the schema.

A later patch will correct the closid for CDP when the configuration is
staged, which will let us merge the three types of resource.
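
The closid correction mentioned here can be sketched as a pure function of the closid and the configuration type. The mapping below is an assumption, not something this posting shows: it places DATA in the even slot and CODE in the odd slot of each pair, and `toy_closid_cdp_map()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

enum toy_conf_type { TOY_CDP_BOTH = 0, TOY_CDP_CODE, TOY_CDP_DATA };

/*
 * Map a resctrl closid to a hardware index. ASSUMPTION: DATA uses the
 * even slot and CODE the odd slot of each pair; BOTH is the identity.
 */
uint32_t toy_closid_cdp_map(uint32_t closid, enum toy_conf_type t)
{
	switch (t) {
	case TOY_CDP_CODE:
		return closid * 2 + 1;
	case TOY_CDP_DATA:
		return closid * 2;
	default:
		return closid;
	}
}
```

With a mapping of this shape the arch code only ever sees hardware indices, which is why the configuration type can become a property of the schema rather than of the cache.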

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/intel_rdt.c |  6 ++
 arch/x86/kernel/cpu/intel_rdt.h |  7 +--
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 15 ++-
 include/linux/resctrl.h | 11 ++-
 4 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index c035280b4398..8d3544b6c149 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -83,6 +83,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
+   .cdp_type   = CDP_BOTH,
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -102,6 +103,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
+   .cdp_type   = CDP_DATA,
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
 
@@ -122,6 +124,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
+   .cdp_type   = CDP_CODE,
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -141,6 +144,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
+   .cdp_type   = CDP_BOTH,
.msr_base   = IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -160,6 +164,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
+   .cdp_type   = CDP_DATA,
.msr_base   = IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
@@ -179,6 +184,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
.format_str = "%d=%0*x",
.fflags = RFTYPE_RES_CACHE,
},
+   .cdp_type   = CDP_CODE,
.msr_base   = IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 92822ff99f1a..cc8dea58b74f 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -285,6 +285,7 @@ struct rdt_hw_resource {
struct rdt_resource resctrl;
int rid;
u32 hw_num_closid;
+   enum resctrl_conf_type  cdp_type; // temporary
unsigned intmsr_base;
void (*msr_update)  (struct rdt_domain *d, struct msr_param *m,
 struct rdt_resource *r);
@@ -296,8 +297,10 @@ static inline struct rdt_hw_resource 
*resctrl_to_rdt(struct rdt_resource *r)
return container_of(r, struct rdt_hw_resource, resctrl);
 }
 
-int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d, u32 closid);
-int parse_bw(char *buf, struct rdt_resource *r,  struct rdt_domain *d, u32 closid);
+int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d,
+ enum resctrl_conf_type t, u32 closid);
+int parse_bw(char *buf, struct rdt_resource *r,  struct rdt_domain *d,
+enum resctrl_conf_type t,  u32 closid);
 
 extern struct mutex rdtgroup_mutex;
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 766c3e62ad91..bab6032704

[RFC PATCH 18/20] x86/intel_rdt: Remove the CODE/DATA illusionary caches

2018-08-24 Thread James Morse

Now that nothing uses these caches, remove them.

Signed-off-by: James Morse 
---
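Note for readers (below the "---", so not part of the commit message): the
pseudo-resources removed here only differed from their parent cache by a
CODE/DATA suffix. A minimal sketch of the alternative, assuming the
configuration type is carried next to the resource as the earlier patches in
this series do with enum resctrl_conf_type. All sketch_* names are
hypothetical and this is not the code from the series:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the CDP_BOTH/CDP_CODE/CDP_DATA idea. */
enum sketch_conf_type { SKETCH_CDP_BOTH, SKETCH_CDP_CODE, SKETCH_CDP_DATA };

/* Only the real caches remain; no L3DATA/L3CODE/L2DATA/L2CODE entries. */
enum sketch_resource { SKETCH_L3, SKETCH_L2, SKETCH_NUM_RESOURCES };

static const char *const sketch_res_name[SKETCH_NUM_RESOURCES] = {
	[SKETCH_L3] = "L3",
	[SKETCH_L2] = "L2",
};

/*
 * Build the user-visible name from (resource, type) instead of
 * looking it up in a dedicated pseudo-resource.
 * Returns the snprintf() result, i.e. the length of the full name.
 */
int sketch_schema_name(enum sketch_resource r, enum sketch_conf_type t,
		       char *buf, size_t len)
{
	const char *suffix = "";

	assert(r < SKETCH_NUM_RESOURCES);

	if (t == SKETCH_CDP_CODE)
		suffix = "CODE";
	else if (t == SKETCH_CDP_DATA)
		suffix = "DATA";

	return snprintf(buf, len, "%s%s", sketch_res_name[r], suffix);
}
```

With this shape, (SKETCH_L3, SKETCH_CDP_DATA) yields the same "L3DATA" string
that the deleted table entry carried in .name.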
 arch/x86/kernel/cpu/intel_rdt.c | 69 -
 arch/x86/kernel/cpu/intel_rdt.h |  4 --
 2 files changed, 73 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 96b1aab36053..f6f1eceb366f 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -84,41 +84,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.msr_base   = IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
},
-   [RDT_RESOURCE_L3DATA] =
-   {
-   .rid= RDT_RESOURCE_L3DATA,
-   .resctrl = {
-   .name   = "L3DATA",
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-   .domains= domain_init(RDT_RESOURCE_L3DATA),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = IA32_L3_CBM_BASE,
-   .msr_update = cat_wrmsr,
-
-   },
-   [RDT_RESOURCE_L3CODE] =
-   {
-   .rid= RDT_RESOURCE_L3CODE,
-   .resctrl = {
-   .name   = "L3CODE",
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-   .domains= domain_init(RDT_RESOURCE_L3CODE),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = IA32_L3_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
[RDT_RESOURCE_L2] =
{
.rid= RDT_RESOURCE_L2,
@@ -136,40 +101,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.msr_base   = IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
-   [RDT_RESOURCE_L2DATA] =
-   {
-   .rid= RDT_RESOURCE_L2DATA,
-   .resctrl = {
-   .name   = "L2DATA",
-   .cache_level= 2,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-   .domains= domain_init(RDT_RESOURCE_L2DATA),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = IA32_L2_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
-   [RDT_RESOURCE_L2CODE] =
-   {
-   .rid= RDT_RESOURCE_L2CODE,
-   .resctrl = {
-   .name   = "L2CODE",
-   .cache_level= 2,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-   .domains= domain_init(RDT_RESOURCE_L2CODE),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = IA32_L2_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
[RDT_RESOURCE_MBA] =
{
.rid= RDT_RESOURCE_MBA,
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index fd5c0b3dc797..a4aba005cfea 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -311,11 +311,7 @@ int __init rdtgroup_init(void);
 
 enum {
RDT_RESOURCE_L3,
-   RDT_RESOURCE_L3DATA,
-   RDT_RESOURCE_L3CODE,
RDT_RESOURCE_L2,
-   RDT_RESOURCE_L2DATA,
-   RDT_RESOURCE_L2CODE,
RDT_RESOURCE_MBA,
 
/* Must be the last */
-- 
2.18.0

[RFC PATCH 07/20] x86/intel_rdt: Expose update_domains() as an arch helper

2018-08-24 Thread James Morse

update_domains() applies the staged configuration to the hw_dom's
configuration array and updates the hardware. Make it part of the
interface between resctrl and the arch code.

Signed-off-by: James Morse 
---
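Note for readers (below the "---", so not part of the commit message): a
minimal user-space sketch of the "stage, then apply" pattern the commit
message describes. Everything named sketch_* is hypothetical, the MSR is
faked with an array, and skipping the write when the value is unchanged is an
assumption of this sketch rather than a statement about the kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_CLOSID 4

/* Hypothetical stand-in for the staged configuration filled by the parser. */
struct sketch_staged_config {
	uint32_t closid;
	uint32_t new_ctrl;
	bool have_new_ctrl;
};

/* Hypothetical stand-in for the arch-private domain state. */
struct sketch_hw_domain {
	uint32_t ctrl_val[SKETCH_NUM_CLOSID];	/* configuration array */
	uint32_t fake_msr[SKETCH_NUM_CLOSID];	/* pretend hardware registers */
	struct sketch_staged_config staged;
};

/*
 * Arch-side helper: copy the staged value into the configuration array
 * and "update the hardware". Returns the number of register writes.
 */
int sketch_arch_update_domain(struct sketch_hw_domain *hw_dom)
{
	struct sketch_staged_config *cfg = &hw_dom->staged;
	int writes = 0;

	if (!cfg->have_new_ctrl)
		return 0;

	assert(cfg->closid < SKETCH_NUM_CLOSID);

	/* Assumption: an unchanged value needs no register write. */
	if (hw_dom->ctrl_val[cfg->closid] != cfg->new_ctrl) {
		hw_dom->ctrl_val[cfg->closid] = cfg->new_ctrl;
		hw_dom->fake_msr[cfg->closid] = cfg->new_ctrl;
		writes++;
	}

	cfg->have_new_ctrl = false;	/* the staged value has been consumed */
	return writes;
}
```

The caller only fills in the staged config and calls the helper, which is the
split this patch makes explicit by exporting the function under a
resctrl_arch_ prefix.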
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 4 ++--
 include/linux/resctrl.h | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index ec3c15ee3473..766c3e62ad91 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -193,7 +193,7 @@ static void apply_config(struct rdt_hw_domain *hw_dom,
}
 }
 
-static int update_domains(struct rdt_resource *r)
+int resctrl_arch_update_domains(struct rdt_resource *r)
 {
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
@@ -312,7 +312,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
}
 
for_each_alloc_enabled_rdt_resource(r) {
-   ret = update_domains(r);
+   ret = resctrl_arch_update_domains(r);
if (ret)
goto out;
}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 03d9fbc230af..9fe7d7de53d7 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -125,6 +125,7 @@ struct rdt_resource {
 
 };
 
+int resctrl_arch_update_domains(struct rdt_resource *r);
 void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 u32 closid, u32 *value);
 
-- 
2.18.0
