date:20180126

Re: [PATCH 1/2] mmc: sdhci: Add support for O2 eMMC HS200 mode

2018-01-26 Thread Adrian Hunter

On 17/01/18 03:37, ernest.zhang wrote:
> when eMMC used as boot device, the eMMC signaling voltage is tied to 1.8v
> fixed output voltage, bios can set o2 sd host controller PCI configuration
> register 0x308 bit4 to 1 to let host controller skip try 3.3.v signaling
> voltage and direct use 1.8v singling voltage in eMMC initialize process.
> 
> Signed-off-by: ernest.zhang 

Please use version numbers in the patch subject.  This was V3, the next is
V4. A summary of what has changed (after the --- line or in the cover email)
is also expected.

> ---
>  drivers/mmc/host/sdhci-pci-o2micro.c | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/sdhci-pci-o2micro.c 
> b/drivers/mmc/host/sdhci-pci-o2micro.c
> index 555970a29c94..8855a416ffd4 100644
> --- a/drivers/mmc/host/sdhci-pci-o2micro.c
> +++ b/drivers/mmc/host/sdhci-pci-o2micro.c
> @@ -1,8 +1,9 @@
>  /*
> - * Copyright (C) 2013 BayHub Technology Ltd.
> + * Copyright (C) 2018 BayHub Technology Ltd.
>   *
>   * Authors: Peter Guo 
>   *  Adam Lee 
> + *  Ernest Zhang 
>   *
>   * This software is licensed under the terms of the GNU General Public
>   * License version 2, as published by the Free Software Foundation, and
> @@ -39,6 +40,7 @@
>  #define O2_SD_MISC_CTRL4 0xFC
>  #define O2_SD_TUNING_CTRL0x300
>  #define O2_SD_PLL_SETTING0x304
> +#define O2_SD_MISC_SETTING   0x308
>  #define O2_SD_CLK_SETTING0x328
>  #define O2_SD_CAP_REG2   0x330
>  #define O2_SD_CAP_REG0   0x334
> @@ -53,6 +55,7 @@
>  
>  #define O2_SD_VENDOR_SETTING 0x110
>  #define O2_SD_VENDOR_SETTING20x1C8
> +#define O2_SD_HW_TUNING_ENABLE   BIT(4)

This is not used in this patch, which means it should be in the other patch.

>  
>  static void o2_pci_set_baseclk(struct sdhci_pci_chip *chip, u32 value)
>  {
> @@ -184,6 +187,7 @@ int sdhci_pci_o2_probe_slot(struct sdhci_pci_slot *slot)
>   struct sdhci_pci_chip *chip;
>   struct sdhci_host *host;
>   u32 reg;
> + int ret;
>  
>   chip = slot->chip;
>   host = slot->host;
> @@ -197,6 +201,20 @@ int sdhci_pci_o2_probe_slot(struct sdhci_pci_slot *slot)
>   if (reg & 0x1)
>   host->quirks |= SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12;
>  
> + if (chip->pdev->device == PCI_DEVICE_ID_O2_SEABIRD0) {
> + ret = pci_read_config_dword(chip->pdev,
> + O2_SD_MISC_SETTING, &reg);
> + if (ret)
> + return -EIO;
> + if (reg & (1 << 4)) {
> + pr_info("%s: emmc 1.8v flag is set, force 1.8v 
> signaling voltage\n",
> +  mmc_hostname(host->mmc));
> + host->flags &= ~SDHCI_SIGNALING_330;
> + host->flags |= SDHCI_SIGNALING_180;
> + }
> + }
> +
> +

Please do not add a double blank line.

>   if (chip->pdev->device != PCI_DEVICE_ID_O2_FUJIN2)
>   break;
>   /* set dll watch dog timer */
>
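
For readers following the hunk above, the voltage selection can be sketched in isolation. This is a minimal userspace illustration, not the driver code: the flag values and the helper name o2_apply_emmc_1v8_flag() are hypothetical, and only the bit position (bit 4 of config register 0x308) is taken from the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the SDHCI signaling flags; values are illustrative. */
#define SIGNALING_330 (1u << 14)
#define SIGNALING_180 (1u << 15)

/* Bit 4 of PCI config register 0x308, as described in the commit message. */
#define EMMC_1V8_FLAG (1u << 4)

/*
 * Return the updated host flags: if the BIOS set the 1.8V flag, drop 3.3V
 * signaling and force 1.8V; otherwise leave the flags untouched.
 */
uint32_t o2_apply_emmc_1v8_flag(uint32_t misc_setting, uint32_t host_flags)
{
	if (misc_setting & EMMC_1V8_FLAG) {
		host_flags &= ~SIGNALING_330;
		host_flags |= SIGNALING_180;
	}
	return host_flags;
}
```

Keeping the decision in a pure function like this would make it easy to check both outcomes without hardware, although the real driver reads the register with pci_read_config_dword() and updates host->flags directly.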

Re: [PATCH 2/2] mmc: sdhci: Add support for O2 hardware tuning

2018-01-26 Thread Adrian Hunter

On 17/01/18 03:38, ernest.zhang wrote:
> O2 sd host controllers have a hardware tuning function. In software
> tuning mode CPU should send multiple command to host controller but in
> hardware tuning mode, CPU need send only one tuning command to sd host
> controller. It can improve the speed linux boot from eMMC.
> 
> Signed-off-by: ernest.zhang 

Please use version numbers in the patch subject.  This was V3, the next is
V4. A summary of what has changed (after the --- line or in the cover email)
is also expected.

> ---
>  drivers/mmc/host/sdhci-pci-o2micro.c | 193 
> +++
>  drivers/mmc/host/sdhci.c |   5 +-
>  2 files changed, 197 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/sdhci-pci-o2micro.c 
> b/drivers/mmc/host/sdhci-pci-o2micro.c
> index 8855a416ffd4..5978ead34827 100644
> --- a/drivers/mmc/host/sdhci-pci-o2micro.c
> +++ b/drivers/mmc/host/sdhci-pci-o2micro.c
> @@ -17,6 +17,12 @@
>   */
>  
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
>  
>  #include "sdhci.h"
>  #include "sdhci-pci.h"
> @@ -57,6 +63,192 @@
>  #define O2_SD_VENDOR_SETTING20x1C8
>  #define O2_SD_HW_TUNING_ENABLE   BIT(4)
>  
> +static void sdhci_o2_start_tuning(struct sdhci_host *host)
> +{
> + u16 ctrl;
> +
> + ctrl = sdhci_readw(host, SDHCI_HOST_CONTROL2);
> + ctrl |= SDHCI_CTRL_EXEC_TUNING;
> + sdhci_writew(host, ctrl, SDHCI_HOST_CONTROL2);
> +
> + /*
> +  * As per the Host Controller spec v3.00, tuning command
> +  * generates Buffer Read Ready interrupt, so enable that.
> +  *
> +  * Note: The spec clearly says that when tuning sequence
> +  * is being performed, the controller does not generate
> +  * interrupts other than Buffer Read Ready interrupt. But
> +  * to make sure we don't hit a controller bug, we _only_
> +  * enable Buffer Read Ready interrupt here.
> +  */
> + sdhci_writel(host, SDHCI_INT_DATA_AVAIL, SDHCI_INT_ENABLE);
> + sdhci_writel(host, SDHCI_INT_DATA_AVAIL, SDHCI_SIGNAL_ENABLE);
> +}
> +
> +static void sdhci_o2_end_tuning(struct sdhci_host *host)
> +{
> + sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
> + sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
> +}
> +
> +static inline bool sdhci_data_line_cmd(struct mmc_command *cmd)
> +{
> + return cmd->data || cmd->flags & MMC_RSP_BUSY;
> +}
> +
> +static void sdhci_del_timer(struct sdhci_host *host, struct mmc_request *mrq)
> +{
> + if (sdhci_data_line_cmd(mrq->cmd))
> + del_timer(&host->data_timer);
> + else
> + del_timer(&host->timer);
> +}
> +
> +static void sdhci_o2_set_tuning_mode(struct sdhci_host *host)
> +{
> + u16 reg;
> +
> + /* enable hardware tuning */
> + reg = sdhci_readw(host, O2_SD_VENDOR_SETTING);
> + reg &= ~O2_SD_HW_TUNING_ENABLE;
> + sdhci_writew(host, reg, O2_SD_VENDOR_SETTING);
> +}
> +
> +
> +static int sdhci_o2_send_tuning(struct sdhci_host *host, u32 opcode)
> +{
> + struct mmc_command cmd = { };
> + struct mmc_data data = { };
> + struct scatterlist sg;
> + struct mmc_request mrq = { };
> + unsigned long flags;
> + u32 b = host->sdma_boundary;
> + int size = 64;
> + u8 *data_buf = kzalloc(size, GFP_KERNEL);
> +
> + if (!data_buf)
> + return -ENOMEM;
> +
> + cmd.opcode = opcode;
> + cmd.flags = MMC_RSP_PRESENT | MMC_RSP_OPCODE | MMC_RSP_CRC;
> + cmd.mrq = &mrq;
> + cmd.data = NULL;

I am confused now.  If you set cmd.data to NULL then there is no point in
having struct mmc_data data at all.

But that also means my comment about not needing
SDHCI_QUIRK2_CLEAR_TRANSFERMODE_REG_BEFORE_CMD was wrong.

Please clarify.

> + mrq.cmd = &cmd;
> + mrq.data = &data;
> + data.blksz = size;
> + data.blocks = 1;
> + data.flags = MMC_DATA_READ;
> +
> + data.timeout_ns = 50 * NSEC_PER_MSEC;
> +
> + data.sg = &sg;
> + data.sg_len = 1;
> + sg_init_one(&sg, data_buf, size);
> +
> + spin_lock_irqsave(&host->lock, flags);
> +
> + sdhci_writew(host, SDHCI_MAKE_BLKSZ(b, 64), SDHCI_BLOCK_SIZE);
> +
> + /*
> +  * The tuning block is sent by the card to the host controller.
> +  * So we set the TRNS_READ bit in the Transfer Mode register.
> +  * This also takes care of setting DMA Enable and Multi Block
> +  * Select in the same register to 0.
> +  */

So with no cmd.data then you can set the transfer mode, except that
SDHCI_QUIRK2_CLEAR_TRANSFERMODE_REG_BEFORE_CMD clears it again.  Doesn't
SDHCI_QUIRK2_CLEAR_TRANSFERMODE_REG_BEFORE_CMD also cause a problem with SD
card tuning i.e. when you call sdhci_execute_tuning() ?

> + sdhci_send_command(host, &cmd);
> +
> + host->cmd = NULL;
> +
> + sdhci_del_timer(host, &mrq);
> +
> + host->tuning_done = 0;
> +
> + mmiowb();
> + spin_unlock_irqrestore(&host->lock, flags);
> +
> + /* Wait for Buffer Read Ready int

Re: [PATCH] platform/x86: dell-smbios: Correct notation for filtering

2018-01-26 Thread Andy Shevchenko

On Fri, Jan 5, 2018 at 4:56 PM, Mario Limonciello
 wrote:
> The class/select were mistakingly put into octal notation but

mistakenly

> intended to be in decimal notation.
>
> Suggest-by: Pali Rohar 

Suggested-by:

> Signed-off-by: Mario Limonciello 

I have fixed above and pushed to my review and testing queue, thanks!

> ---
>  drivers/platform/x86/dell-smbios.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/platform/x86/dell-smbios.c 
> b/drivers/platform/x86/dell-smbios.c
> index 6a60db5..8541cde 100644
> --- a/drivers/platform/x86/dell-smbios.c
> +++ b/drivers/platform/x86/dell-smbios.c
> @@ -65,10 +65,10 @@ static struct smbios_call call_whitelist[] = {
>
>  /* calls that are explicitly blacklisted */
>  static struct smbios_call call_blacklist[] = {
> -   {0x, 01, 07}, /* manufacturing use */
> -   {0x, 06, 05}, /* manufacturing use */
> -   {0x, 11, 03}, /* write once */
> -   {0x, 11, 07}, /* write once */
> +   {0x,  1,  7}, /* manufacturing use */
> +   {0x,  6,  5}, /* manufacturing use */
> +   {0x, 11,  3}, /* write once */
> +   {0x, 11,  7}, /* write once */
> {0x, 11, 11}, /* write once */
> {0x, 19, -1}, /* diagnostics */
> /* handled by kernel: dell-laptop */
> --
> 2.7.4
>



-- 
With Best Regards,
Andy Shevchenko
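
As background on the notation issue, a leading 0 makes a C integer literal octal. The sketch below only illustrates that language rule; the function names are made up for the example. Single-digit literals such as 01 or 07 have the same value in both notations, so the entries touched by the patch keep their values, but a multi-digit entry written with a leading zero would silently change.

```c
/*
 * An integer literal with a leading 0 is octal in C. Single digits 0-7
 * have the same value either way; multi-digit literals do not.
 */
int octal_eleven(void)   { return 011; } /* octal: 1*8 + 1 */
int decimal_eleven(void) { return 11; }
int octal_seven(void)    { return 07; }  /* same value as decimal 7 */
```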

Re: [PATCH] spi: orion: Fix a resource leak if the optional "axi" clk is deferred

2018-01-26 Thread Gregory CLEMENT

Hi Christophe,
 
 On Thu, Jan 25 2018, Christophe JAILLET wrote:

> If the optional "axi" clk is deferred, we still need to undo some
> initialisation. Espacially 'master' must be released. It will be
  Especially

> reallocated the next time 'orion_spi_probe()' is called.
>
> Add a new label to clean what needs to be cleaned and rename another
> label to improve the names used.
>
> Fixes: 92ae112e477a ("spi: orion: Fix clock resource by adding an optional 
> bus clock")
> Signed-off-by: Christophe JAILLET 


Acked-by: Gregory CLEMENT 

Thanks,

Gregory


> ---
>  drivers/spi/spi-orion.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/spi/spi-orion.c b/drivers/spi/spi-orion.c
> index 482a0cf3b7aa..deca63e82ff6 100644
> --- a/drivers/spi/spi-orion.c
> +++ b/drivers/spi/spi-orion.c
> @@ -638,8 +638,10 @@ static int orion_spi_probe(struct platform_device *pdev)
>   /* The following clock is only used by some SoCs */
>   spi->axi_clk = devm_clk_get(&pdev->dev, "axi");
>   if (IS_ERR(spi->axi_clk) &&
> - PTR_ERR(spi->axi_clk) == -EPROBE_DEFER)
> - return -EPROBE_DEFER;
> + PTR_ERR(spi->axi_clk) == -EPROBE_DEFER) {
> + status = -EPROBE_DEFER;
> + goto out_rel_clk;
> + }
>   if (!IS_ERR(spi->axi_clk))
>   clk_prepare_enable(spi->axi_clk);
>  
> @@ -667,7 +669,7 @@ static int orion_spi_probe(struct platform_device *pdev)
>   spi->base = devm_ioremap_resource(&pdev->dev, r);
>   if (IS_ERR(spi->base)) {
>   status = PTR_ERR(spi->base);
> - goto out_rel_clk;
> + goto out_rel_axi_clk;
>   }
>  
>   /* Scan all SPI devices of this controller for direct mapped devices */
> @@ -705,7 +707,7 @@ static int orion_spi_probe(struct platform_device *pdev)
>   PAGE_SIZE);
>   if (!spi->direct_access[cs].vaddr) {
>   status = -ENOMEM;
> - goto out_rel_clk;
> + goto out_rel_axi_clk;
>   }
>   spi->direct_access[cs].size = PAGE_SIZE;
>  
> @@ -733,8 +735,9 @@ static int orion_spi_probe(struct platform_device *pdev)
>  
>  out_rel_pm:
>   pm_runtime_disable(&pdev->dev);
> -out_rel_clk:
> +out_rel_axi_clk:
>   clk_disable_unprepare(spi->axi_clk);
> +out_rel_clk:
>   clk_disable_unprepare(spi->clk);
>  out:
>   spi_master_put(master);
> -- 
> 2.14.1
>
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
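
The label ordering in the patch above follows the usual reverse-order unwinding pattern. A minimal userspace sketch of that pattern, assuming a simplified probe with only three resources (fake_probe(), the step enum, and the trace letters are all hypothetical and not part of spi-orion.c):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical failure points, loosely mirroring the order in orion_spi_probe(). */
enum step { STEP_NONE, STEP_AXI_DEFER, STEP_IOREMAP };

/*
 * Simulate a probe that fails at 'fail_at' and append one letter per
 * released resource to 'trace' (A = axi clk, C = core clk, M = master).
 * Returns 0 on success, -1 on failure.
 */
int fake_probe(enum step fail_at, char *trace, size_t len)
{
	int status;

	assert(trace != NULL && len >= 4);
	trace[0] = '\0';
	/* master allocated and core clk enabled at this point */

	if (fail_at == STEP_AXI_DEFER) {
		status = -1;
		goto out_rel_clk; /* axi clk never enabled: skip its release */
	}
	/* axi clk enabled at this point */

	if (fail_at == STEP_IOREMAP) {
		status = -1;
		goto out_rel_axi_clk;
	}
	return 0;

out_rel_axi_clk:
	strncat(trace, "A", len - strlen(trace) - 1);
out_rel_clk:
	strncat(trace, "C", len - strlen(trace) - 1);
	strncat(trace, "M", len - strlen(trace) - 1);
	return status;
}
```

A deferred "axi" clock releases only the core clock and the master, while a later failure also releases the axi clock first; before the fix, the deferred case returned without releasing anything.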

Re: [PATCH] ARM: dts: exynos: Update x and y properties for mms114 touchscreen

2018-01-26 Thread Krzysztof Kozlowski

On Fri, Jan 26, 2018 at 7:04 AM, Andi Shyti  wrote:
> The mms114 binding [1] specifies that the 'x' and 'y' should be
> called respectively 'touchscreen-size-x' and 'touchscreen-size-y'
> in coherence with the touchscreen [2] binding.
>
> Update the mms114 node for trats2 and trats dts according to the
> binding.
>
> [1] Documentation/devicetree/bindings/input/touchscreen/mms114.txt
> [2] Documentation/devicetree/bindings/input/touchscreen/touchscreen.txt
>
> Signed-off-by: Andi Shyti 
> ---
> Hi Krzysztof,
>
> this patch depends on Simon's patchset [1] and must be applied after
> that. If you want I can ping you when the patch gets or you can take
> it in the next cycle.
>
> Andi

Thanks, looks good. Indeed, I would appreciate it if you pinged me
once the dependency gets into Linus's tree.

Best regards,
Krzysztof

>
> [1] https://marc.info/?l=linux-kernel&m=151682273403098&w=2

Re: [PATCH v6 2/2] media: V3s: Add support for Allwinner CSI.

2018-01-26 Thread Maxime Ripard

On Fri, Jan 26, 2018 at 11:00:41AM +0800, Yong wrote:
> Hi Maxime,
> 
> On Fri, 26 Jan 2018 09:46:58 +0800
> Yong  wrote:
> 
> > Hi Maxime,
> > 
> > Do you have any experience in solving this problem?
> > It seems the PHYS_OFFSET maybe undeclared when the ARCH is not arm.
> 
> Got it.
> Should I add 'depends on ARM' in Kconfig?

Yes, or even better a depends on MACH_SUNXI :)

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



Re: [PATCH RFC 1/2] mmc: sdhci-msm: Add support to store supported vdd-io voltages

2018-01-26 Thread Adrian Hunter

On 18/01/18 10:05, Vijay Viswanath wrote:
> During probe check whether the vdd-io regulator of sdhc platform device
> can support 1.8V and 3V and store this information as a capability of
> platform device.
> 
> Signed-off-by: Vijay Viswanath 

Not sure why this is RFC, but for sdhci:

Acked-by: Adrian Hunter 

> ---
>  drivers/mmc/host/sdhci-msm.c | 38 ++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
> index c283291..5c23e92 100644
> --- a/drivers/mmc/host/sdhci-msm.c
> +++ b/drivers/mmc/host/sdhci-msm.c
> @@ -23,6 +23,7 @@
>  #include 
>  
>  #include "sdhci-pltfm.h"
> +#include 
>  
>  #define CORE_MCI_VERSION 0x50
>  #define CORE_VERSION_MAJOR_SHIFT 28
> @@ -81,6 +82,9 @@
>  #define CORE_HC_SELECT_IN_HS400  (6 << 19)
>  #define CORE_HC_SELECT_IN_MASK   (7 << 19)
>  
> +#define CORE_3_0V_SUPPORT(1 << 25)
> +#define CORE_1_8V_SUPPORT(1 << 26)
> +
>  #define CORE_CSR_CDC_CTLR_CFG0   0x130
>  #define CORE_SW_TRIG_FULL_CALIB  BIT(16)
>  #define CORE_HW_AUTOCAL_ENA  BIT(17)
> @@ -148,6 +152,7 @@ struct sdhci_msm_host {
>   u32 curr_io_level;
>   wait_queue_head_t pwr_irq_wait;
>   bool pwr_irq_flag;
> + u32 caps_0;
>  };
>  
>  static unsigned int msm_get_clock_rate_for_bus_mode(struct sdhci_host *host,
> @@ -1313,6 +1318,35 @@ static void sdhci_msm_writeb(struct sdhci_host *host, 
> u8 val, int reg)
>   sdhci_msm_check_power_status(host, req_type);
>  }
>  
> +static int sdhci_msm_set_regulator_caps(struct sdhci_msm_host *msm_host)
> +{
> + struct mmc_host *mmc = msm_host->mmc;
> + struct regulator *supply = mmc->supply.vqmmc;
> + int i, count;
> + u32 caps = 0, vdd_uV;
> +
> + if (!IS_ERR(mmc->supply.vqmmc)) {
> + count = regulator_count_voltages(supply);
> + if (count < 0)
> + return count;
> + for (i = 0; i < count; i++) {
> + vdd_uV = regulator_list_voltage(supply, i);
> + if (vdd_uV <= 0)
> + continue;
> + if (vdd_uV > 270)
> + caps |= CORE_3_0V_SUPPORT;
> + if (vdd_uV < 195)
> + caps |= CORE_1_8V_SUPPORT;
> + }
> + }
> + msm_host->caps_0 |= caps;
> + pr_debug("%s: %s: supported caps: 0x%08x\n", mmc_hostname(mmc),
> + __func__, caps);
> +
> + return 0;
> +}
> +
> +
>  static const struct of_device_id sdhci_msm_dt_match[] = {
>   { .compatible = "qcom,sdhci-msm-v4" },
>   {},
> @@ -1530,6 +1564,10 @@ static int sdhci_msm_probe(struct platform_device 
> *pdev)
>   ret = sdhci_add_host(host);
>   if (ret)
>   goto pm_runtime_disable;
> + ret = sdhci_msm_set_regulator_caps(msm_host);
> + if (ret)
> + dev_err(&pdev->dev, "%s: Failed to set regulator caps: %d\n",
> + __func__, ret);
>  
>   pm_runtime_mark_last_busy(&pdev->dev);
>   pm_runtime_put_autosuspend(&pdev->dev);
>

Re: [PATCH RFC 2/2] mmc: sdhci-msm: support voltage pad switching

2018-01-26 Thread Adrian Hunter

On 18/01/18 10:05, Vijay Viswanath wrote:
> From: Krishna Konda 
> 
> The PADs for sdhc controller are dual-voltage that support 3v/1.8v.
> Those PADs have a control signal (io_pad_pwr_switch/mode18 ) that
> indicates whether the PAD works in 3v or 1.8v.
> 
> SDHC core on msm platforms should have IO_PAD_PWR_SWITCH bit set/unset
> based on actual voltage used for IO lines. So when power irq is
> triggered for io high or io low, the driver should check the voltages
> supported and set the pad accordingly.
> 
> Signed-off-by: Krishna Konda 
> Signed-off-by: Venkat Gopalakrishnan 
> Signed-off-by: Vijay Viswanath 
> ---

Not sure why this is RFC, but for sdhci:

Acked-by: Adrian Hunter 

>  drivers/mmc/host/sdhci-msm.c | 38 ++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
> index 5c23e92..f5728a8 100644
> --- a/drivers/mmc/host/sdhci-msm.c
> +++ b/drivers/mmc/host/sdhci-msm.c
> @@ -78,6 +78,8 @@
>  #define CORE_HC_MCLK_SEL_DFLT(2 << 8)
>  #define CORE_HC_MCLK_SEL_HS400   (3 << 8)
>  #define CORE_HC_MCLK_SEL_MASK(3 << 8)
> +#define CORE_IO_PAD_PWR_SWITCH_EN(1 << 15)
> +#define CORE_IO_PAD_PWR_SWITCH  (1 << 16)
>  #define CORE_HC_SELECT_IN_EN BIT(18)
>  #define CORE_HC_SELECT_IN_HS400  (6 << 19)
>  #define CORE_HC_SELECT_IN_MASK   (7 << 19)
> @@ -1166,6 +1168,35 @@ static void sdhci_msm_handle_pwr_irq(struct sdhci_host 
> *host, int irq)
>*/
>   writel_relaxed(irq_ack, msm_host->core_mem + CORE_PWRCTL_CTL);
>  
> + /*
> +  * SDHC has core_mem and hc_mem device memory and these memory
> +  * addresses do not fall within 1KB region. Hence, any update to
> +  * core_mem address space would require an mb() to ensure this gets
> +  * completed before its next update to registers within hc_mem.
> +  */
> + mb();
> + /*
> +  * We should unset IO PAD PWR switch only if the register write can
> +  * set IO lines high and the regulator also switches to 3 V.
> +  * Else, we should keep the IO PAD PWR switch set.
> +  * This is applicable to certain targets where eMMC vccq supply is only
> +  * 1.8V. In such targets, even during REQ_IO_HIGH, the IO PAD PWR
> +  * switch must be kept set to reflect actual regulator voltage. This
> +  * way, during initialization of controllers with only 1.8V, we will
> +  * set the IO PAD bit without waiting for a REQ_IO_LOW.
> +  */
> + if ((io_level & REQ_IO_HIGH) && (msm_host->caps_0 & CORE_3_0V_SUPPORT))
> + writel_relaxed((readl_relaxed(host->ioaddr + CORE_VENDOR_SPEC) &
> + ~CORE_IO_PAD_PWR_SWITCH), host->ioaddr +
> + CORE_VENDOR_SPEC);
> + else if ((io_level & REQ_IO_LOW) ||
> + (msm_host->caps_0 & CORE_1_8V_SUPPORT))
> + writel_relaxed((readl_relaxed(host->ioaddr + CORE_VENDOR_SPEC) |
> + CORE_IO_PAD_PWR_SWITCH), host->ioaddr +
> + CORE_VENDOR_SPEC);
> + /* Ensure that the IO PAD switches are updated before proceeding */
> + mb();
> +
>   if (pwr_state)
>   msm_host->curr_pwr_state = pwr_state;
>   if (io_level)
> @@ -1518,6 +1549,13 @@ static int sdhci_msm_probe(struct platform_device 
> *pdev)
>   }
>  
>   /*
> +  * Set the PAD_PWR_SWITCH_EN bit so that the PAD_PWR_SWITCH bit can
> +  * be used as required later on.
> +  */
> + writel_relaxed((readl_relaxed(host->ioaddr + CORE_VENDOR_SPEC) |
> + CORE_IO_PAD_PWR_SWITCH_EN), host->ioaddr +
> + CORE_VENDOR_SPEC);
> + /*
>* Power on reset state may trigger power irq if previous status of
>* PWRCTL was either BUS_ON or IO_HIGH_V. So before enabling pwr irq
>* interrupt in GIC, any pending power irq interrupt should be
>
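
The pad switch decision in the power irq hunk above can be read as a small pure function. The sketch below is an illustration only: pad_switch_update() is a hypothetical helper, and the REQ_IO_* bit values are assumptions because their definitions are not part of the quoted patch. The CORE_* values are the ones shown in this series.

```c
#include <stdint.h>

/* Assumed bit values; the real REQ_IO_* definitions are not shown in the patch. */
#define REQ_IO_LOW   (1u << 2)
#define REQ_IO_HIGH  (1u << 3)

/* Values as they appear in patches 1/2 and 2/2 of this series. */
#define CORE_3_0V_SUPPORT      (1u << 25)
#define CORE_1_8V_SUPPORT      (1u << 26)
#define CORE_IO_PAD_PWR_SWITCH (1u << 16)

/*
 * Return the new vendor-spec register value: clear the pad switch only
 * when IO high is requested and the regulator can supply 3 V; otherwise
 * set it when IO low is requested or the regulator supports 1.8 V.
 */
uint32_t pad_switch_update(uint32_t vendor_spec, uint32_t io_level, uint32_t caps)
{
	if ((io_level & REQ_IO_HIGH) && (caps & CORE_3_0V_SUPPORT))
		return vendor_spec & ~CORE_IO_PAD_PWR_SWITCH;
	if ((io_level & REQ_IO_LOW) || (caps & CORE_1_8V_SUPPORT))
		return vendor_spec | CORE_IO_PAD_PWR_SWITCH;
	return vendor_spec;
}
```

Note the case the patch comment calls out: with a 1.8 V-only supply, a REQ_IO_HIGH request still leaves the switch set.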

[PATCH] atm: he: Replace GFP_ATOMIC with GFP_KERNEL in he_open

2018-01-26 Thread Jia-Ju Bai

After checking all possible call chains to he_open(),
my tool finds that he_open() is never called in atomic context.
This function is assigned to the function pointer "dev->ops->open",
which is only called by __vcc_connect() (net/atm/common.c)
through dev->ops->open(). __vcc_connect() is only called by
vcc_connect(), which calls mutex_lock(),
so he_open() is allowed to call functions that may sleep.
Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/atm/he.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index e58538c..fea5bf0 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -2135,7 +2135,7 @@ static int he_start(struct atm_dev *dev)
 
cid = he_mkcid(he_dev, vpi, vci);
 
-   he_vcc = kmalloc(sizeof(struct he_vcc), GFP_ATOMIC);
+   he_vcc = kmalloc(sizeof(struct he_vcc), GFP_KERNEL);
if (he_vcc == NULL) {
hprintk("unable to allocate he_vcc during open\n");
return -ENOMEM;
-- 
1.7.9.5

Re: [PATCH] of: use hash based search in of_find_node_by_phandle

2018-01-26 Thread Chintan Pandya




On 1/26/2018 1:24 AM, Frank Rowand wrote:
> On 01/25/18 02:14, Chintan Pandya wrote:
>> of_find_node_by_phandle() takes a lot of time finding
>> right node when your intended device is too right-side
>> in the fdt. Reason is, we search each device serially
>> from the fdt, starting from left-most to right-most.
>
> Please give me a pointer to the code that is doing
> this search.
>
> -Frank

You can refer to include/linux/of.h:

#define for_each_of_allnodes_from(from, dn) \
    for (dn = __of_find_all_nodes(from); dn; dn = __of_find_all_nodes(dn))

#define for_each_of_allnodes(dn) for_each_of_allnodes_from(NULL, dn)

where __of_find_all_nodes() does

struct device_node *__of_find_all_nodes(struct device_node *prev)
{
    struct device_node *np;
    if (!prev) {
        np = of_root;
    } else if (prev->child) {
        np = prev->child;
    } else {
        /* Walk back up looking for a sibling, or the end of the structure */
        np = prev;
        while (np->parent && !np->sibling)
            np = np->parent;
        np = np->sibling; /* Might be null at the end of the tree */
    }
    return np;
}

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
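
To make the cost of that walk concrete, here is a userspace sketch of the same pre-order step, assuming a stripped-down node type. struct node, next_node(), and find_linear() are hypothetical names for illustration; only the traversal logic mirrors __of_find_all_nodes().

```c
#include <stddef.h>

/* Minimal stand-in for struct device_node; only the links used by the walk. */
struct node {
	unsigned int phandle;
	struct node *parent, *child, *sibling;
};

/* Same pre-order step as __of_find_all_nodes(), with an explicit root. */
struct node *next_node(struct node *root, struct node *prev)
{
	struct node *np;

	if (!prev)
		return root;
	if (prev->child)
		return prev->child;
	np = prev;
	while (np->parent && !np->sibling)
		np = np->parent;
	return np->sibling; /* NULL at the end of the tree */
}

/* Linear search; *visited reports how many nodes were examined. */
struct node *find_linear(struct node *root, unsigned int phandle,
			 unsigned int *visited)
{
	struct node *np;

	*visited = 0;
	for (np = next_node(root, NULL); np; np = next_node(root, np)) {
		(*visited)++;
		if (np->phandle == phandle)
			return np;
	}
	return NULL;
}
```

The visited counter grows with the position of the target in pre-order, which is the behaviour described above for devices defined late in the tree.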

[PATCH 2/3] ARM: dts: imx6ull: add support for the esai interface

2018-01-26 Thread Lothar Waßmann

The address space taken by the UART8 on the i.MX6UL is used for the
ESAI interface on i.MX6ULL.

Since the ESAI unit on i.MX6ULL has two more bits in the TFCR register
(TFIN, TAENB) it deserves to get its own compatible string, though the
bits are currently not used by the driver.

Signed-off-by: Lothar Waßmann 
---
 Documentation/devicetree/bindings/sound/fsl,esai.txt |  4 ++--
 arch/arm/boot/dts/imx6ull.dtsi   | 17 +
 sound/soc/fsl/fsl_esai.c |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/sound/fsl,esai.txt 
b/Documentation/devicetree/bindings/sound/fsl,esai.txt
index cacd18b..4103f46 100644
--- a/Documentation/devicetree/bindings/sound/fsl,esai.txt
+++ b/Documentation/devicetree/bindings/sound/fsl,esai.txt
@@ -7,8 +7,8 @@ other DSPs. It has up to six transmitters and four receivers.
 
 Required properties:
 
-  - compatible : Compatible list, must contain "fsl,imx35-esai" or
- "fsl,vf610-esai"
+  - compatible : Compatible list, must contain "fsl,imx35-esai",
+ "fsl,vf610-esai" or "fsl,imx6ull-esai"
 
   - reg: Offset and length of the register set for the 
device.
 
diff --git a/arch/arm/boot/dts/imx6ull.dtsi b/arch/arm/boot/dts/imx6ull.dtsi
index abc815f..8724fdb2 100644
--- a/arch/arm/boot/dts/imx6ull.dtsi
+++ b/arch/arm/boot/dts/imx6ull.dtsi
@@ -47,6 +47,23 @@
aips-bus@200 {
spba-bus@200 {
/delete-node/ serial@2024000;
+
+   esai: esai@2024000 {
+   compatible = "fsl,imx6ull-esai", 
"fsl,imx35-esai";
+   reg = <0x02024000 0x4000>;
+   interrupts = ;
+   clocks = <&clks IMX6ULL_CLK_ESAI_IPG>,
+<&clks IMX6ULL_CLK_ESAI_MEM>,
+<&clks IMX6ULL_CLK_ESAI_EXTAL>,
+<&clks IMX6ULL_CLK_ESAI_IPG>,
+<&clks IMX6UL_CLK_SPBA>;
+   clock-names = "core", "mem", "extal",
+ "fsys", "spba";
+   dmas = <&sdma 0 21 0>,
+  <&sdma 47 21 0>;
+   dma-names = "rx", "tx";
+   status = "disabled";
+   };
};
};
 
diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index cef79a1..5b6a53f 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -910,6 +910,7 @@ static int fsl_esai_probe(struct platform_device *pdev)
 }
 
 static const struct of_device_id fsl_esai_dt_ids[] = {
+   { .compatible = "fsl,imx6ull-esai", },
{ .compatible = "fsl,imx35-esai", },
{ .compatible = "fsl,vf610-esai", },
{}
-- 
2.1.4

[PATCH 0/3] ARM: dts: imx6ull: fix some incompatibilities between i.MX6UL and i.MX6ULL

2018-01-26 Thread Lothar Waßmann

This patchset addresses some differences between i.MX6UL and i.MX6ULL
which have slipped through the cracks so far.

- UART8 is not on SPBA but on AIPS-3
- i.MX6ULL has an ESAI interface in the address range of the UART8 on i.MX6UL
- i.MX6ULL does not have a CAAM unit nor SIM interfaces

Re: [PATCH 0/2] MIPS: generic dma-coherence.h inclusion

2018-01-26 Thread Steven J. Hill

On 01/23/2018 07:40 PM, Florian Fainelli wrote:

[...]

> 
> Florian Fainelli (2):
>   MIPS: Allow including mach-generic/dma-coherence.h
>   MIPS: Update dma-coherence.h files
> 
I have tested these on our Octeon III platforms with PCIe and saw
no issues. Thanks.

Steve


Tested-by: Steven J. Hill

Re: [PATCH RFC 0/6] MIPS: Broadcom eXtended KSEG0/1 support

2018-01-26 Thread Steven J. Hill

On 01/23/2018 07:47 PM, Florian Fainelli wrote:

[...]

> 
> Florian Fainelli (6):
>   MIPS: Allow board to override TLB initialization
>   MIPS: Allow platforms to override mapping/unmapping coherent
>   MIPS: BMIPS: Avoid referencing CKSEG1
>   MIPS: Prepare for supporting eXtended KSEG0/1
>   MIPS: BMIPS: Handshake with CFE
>   MIPS: BMIPS: Add support for eXtended KSEG0/1 (XKS01)
> 
I have tested these with your previous "MIPS: generic dma-coherence
inclusion" patchset on our Octeon III platforms with PCIe and saw
no issues. Thanks.

Steve


Tested-by: Steven J. Hill

Re: [PATCH net-next v1] samples/bpf: Partially fixes the bpf.o build

2018-01-26 Thread Mickaël Salaün


On 26/01/2018 03:16, Alexei Starovoitov wrote:
> On Fri, Jan 26, 2018 at 01:39:30AM +0100, Mickaël Salaün wrote:
>> Do not build lib/bpf/bpf.o with this Makefile but use the one from the
>> library directory.  This avoid making a buggy bpf.o file (e.g. missing
>> symbols).
> 
> could you provide an example?
> What symbols will be missing?
> I don't think there is an issue with existing Makefile.

You can run these commands:
make -C samples/bpf; nm tools/lib/bpf/bpf.o > a; make -C tools/lib/bpf;
nm tools/lib/bpf/bpf.o > b; diff -u a b

Symbols like bzero and sys_bpf are missing with the samples/bpf
Makefile, which makes the bpf.o shrink from 25K to 7K.

> 
>> This patch is useful if some code (e.g. Landlock tests) needs both the
>> bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).
> 
> is that some future patches?

Yes, I'll send them next week.

> 
> we're trying to move everything form samples/bpf/ into selftests/bpf/
> and convert to use libbpf.a instead of obsolete bpf_load.c
> Please use this approach for landlock as well.

Ok, it should be better with this lib.

> 
>> Signed-off-by: Mickaël Salaün 
>> Cc: Alexei Starovoitov 
>> Cc: Daniel Borkmann 
>> ---
>>
>> This is not a complet fix because the call to multi_depend with
>> $(host-cmulti) from scripts/Makefile.host force the build of bpf.o
>> anyway. I'm not sure how to completely avoid this automatic build
>> though.
>> ---
>>  samples/bpf/Makefile | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>> index 7f61a3d57fa7..64335bb94f9f 100644
>> --- a/samples/bpf/Makefile
>> +++ b/samples/bpf/Makefile
>> @@ -201,13 +201,16 @@ CLANG_ARCH_ARGS = -target $(ARCH)
>>  endif
>>  
>>  # Trick to allow make to be run from this directory
>> -all:
>> +all: $(LIBBPF)
>>  $(MAKE) -C ../../ $(CURDIR)/
>>  
>>  clean:
>>  $(MAKE) -C ../../ M=$(CURDIR) clean
>>  @rm -f *~
>>  
>> +$(LIBBPF): FORCE
>> +$(MAKE) -C $(dir $@) $(notdir $@)
>> +
>>  $(obj)/syscall_nrs.s:   $(src)/syscall_nrs.c
>>  $(call if_changed_dep,cc_s_c)
>>  
>> -- 
>> 2.15.1
>>
> 




[PATCH v2] of: use hash based search in of_find_node_by_phandle

2018-01-26 Thread Chintan Pandya

of_find_node_by_phandle() takes a lot of time (1ms per
call) to find the right node when the intended device is
deep in the fdt. The reason is that we search for each
device serially in the fdt. See this:

struct device_node *__of_find_all_nodes(struct device_node *prev)
{
	struct device_node *np;
	if (!prev) {
		np = of_root;
	} else if (prev->child) {
		np = prev->child;
	} else {
		/* Walk back up looking for a sibling, or the end of the structure */
		np = prev;
		while (np->parent && !np->sibling)
			np = np->parent;
		np = np->sibling; /* Might be null at the end of the tree */
	}
	return np;
}

#define for_each_of_allnodes_from(from, dn) \
	for (dn = __of_find_all_nodes(from); dn; dn = __of_find_all_nodes(dn))
#define for_each_of_allnodes(dn) for_each_of_allnodes_from(NULL, dn)

Implement the device-phandle relation in a hash table so
that lookup is faster, irrespective of where the
device is defined in the DT.

There are ~6.7k calls to of_find_node_by_phandle() and
the total improvement observed during boot is 400ms.

Signed-off-by: Chintan Pandya 
---
 drivers/of/base.c  |  8 ++--
 drivers/of/fdt.c   | 18 ++
 include/linux/of.h |  6 ++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 26618ba..bfbfa99 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1005,10 +1005,14 @@ struct device_node *of_find_node_by_phandle(phandle 
handle)
if (!handle)
return NULL;
 
-   raw_spin_lock_irqsave(&devtree_lock, flags);
-   for_each_of_allnodes(np)
+   spin_lock(&dt_hash_spinlock);
+   hash_for_each_possible(dt_hash_table, np, hash, handle)
if (np->phandle == handle)
break;
+
+   spin_unlock(&dt_hash_spinlock);
+
+   raw_spin_lock_irqsave(&devtree_lock, flags);
of_node_get(np);
raw_spin_unlock_irqrestore(&devtree_lock, flags);
return np;
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4675e5a..62a9a4c 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -33,6 +33,10 @@
 
 #include "of_private.h"
 
+static bool dt_hash_needs_init = true;
+DECLARE_HASHTABLE(dt_hash_table, DT_HASH_BITS);
+DEFINE_SPINLOCK(dt_hash_spinlock);
+
 /*
  * of_fdt_limit_memory - limit the number of regions in the /memory node
  * @limit: maximum entries
@@ -242,6 +246,20 @@ static void populate_properties(const void *blob,
pprev  = &pp->next;
}
 
+   /*
+* In 'dryrun = true' cases, np is some non-NULL junk. So, protect
+* against those cases.
+*/
+   if (!dryrun && np->phandle) {
+   spin_lock(&dt_hash_spinlock);
+   if (dt_hash_needs_init) {
+   dt_hash_needs_init = false;
+   hash_init(dt_hash_table);
+   }
+   hash_add(dt_hash_table, &np->hash, np->phandle);
+   spin_unlock(&dt_hash_spinlock);
+   }
+
/* With version 0x10 we may not have the name property,
 * recreate it here from the unit name if absent
 */
diff --git a/include/linux/of.h b/include/linux/of.h
index d3dea1d..2e3ba84 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -69,6 +70,7 @@ struct device_node {
 #endif
unsigned long _flags;
void*data;
+   struct hlist_node hash;
 #if defined(CONFIG_SPARC)
const char *path_component_name;
unsigned int unique_id;
@@ -76,6 +78,10 @@ struct device_node {
 #endif
 };
 
+#define DT_HASH_BITS 6
+extern DECLARE_HASHTABLE(dt_hash_table, DT_HASH_BITS);
+extern spinlock_t dt_hash_spinlock;
+
 #define MAX_PHANDLE_ARGS 16
 struct of_phandle_args {
struct device_node *np;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
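
The idea behind the patch can be sketched in userspace as a chained hash table keyed by phandle. This is a simplified illustration under stated assumptions, not the kernel hashtable API: struct pnode, struct ptable, ptable_add(), and ptable_find() are hypothetical, and the bucket choice is a plain mask rather than the hash the kernel helpers apply. Only the table size (6 bits) comes from DT_HASH_BITS in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_BITS 6                  /* same size as DT_HASH_BITS in the patch */
#define HASH_SIZE (1u << HASH_BITS)

/* Hypothetical stand-in for struct device_node with an embedded chain link. */
struct pnode {
	unsigned int phandle;
	struct pnode *hash_next;
};

struct ptable {
	struct pnode *bucket[HASH_SIZE];
};

/* Simple mask-based bucket choice, used here only to keep the sketch short. */
static unsigned int bucket_of(unsigned int phandle)
{
	return phandle & (HASH_SIZE - 1);
}

/* Insert at the head of the bucket chain. */
void ptable_add(struct ptable *t, struct pnode *n)
{
	unsigned int b;

	assert(t && n && n->phandle != 0); /* phandle 0 means "no phandle" */
	b = bucket_of(n->phandle);
	n->hash_next = t->bucket[b];
	t->bucket[b] = n;
}

/* Walk only the one bucket the phandle maps to. */
struct pnode *ptable_find(const struct ptable *t, unsigned int phandle)
{
	struct pnode *n;

	if (!phandle)
		return NULL;
	for (n = t->bucket[bucket_of(phandle)]; n; n = n->hash_next)
		if (n->phandle == phandle)
			return n;
	return NULL;
}
```

With 64 buckets a lookup only walks the nodes that share a bucket, so its cost no longer depends on where the node sits in the tree. Colliding phandles (for example 5 and 69 with this mask) still resolve correctly because the chain is compared entry by entry.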

A problem met when using crash with randomized kernel

2018-01-26 Thread Cao jin

Hi,
  I have recently been testing the crash tool with a kernel built with
structure layout randomization, and crash fails to work with it.

When using "gdb vmlinux" to examine the debuginfo of both the randomized
and the non-randomized vmlinux, taking struct uts_namespace as an
example, the output of "ptype struct uts_namespace" is identical for
both.  This causes serious trouble for the crash utility, because crash
uses its embedded gdb to retrieve structure member offsets from the
debuginfo, and it gets the same offset for every structure member in
both cases.

A gcc maintainer believes[*] that this is a problem in the plugin, not
in gcc.

[*]https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84052

So, could you confirm whether this is a plugin bug?
-- 
Sincerely,
Cao jin

[PATCH] base: power: domain: Replace mdelay with msleep

2018-01-26 Thread Jia-Ju Bai

After checking all possible call chains to genpd_dev_pm_detach() and
genpd_dev_pm_attach(), my tool finds that these functions are never
called in atomic context, that is, never in an interrupt handler or
while holding a spinlock.
Thus mdelay() can be replaced with msleep() to avoid busy waiting.

This was found by DCNS, a static analysis tool that I wrote.

Signed-off-by: Jia-Ju Bai 
---
 drivers/base/power/domain.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 0c80bea..f84ac72 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2144,7 +2144,7 @@ static void genpd_dev_pm_detach(struct device *dev, bool 
power_off)
if (ret != -EAGAIN)
break;
 
-   mdelay(i);
+   msleep(i);
cond_resched();
}
 
@@ -2231,7 +2231,7 @@ int genpd_dev_pm_attach(struct device *dev)
if (ret != -EAGAIN)
break;
 
-   mdelay(i);
+   msleep(i);
cond_resched();
}
mutex_unlock(&gpd_list_lock);
-- 
1.7.9.5
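
The retry pattern touched by this patch can be sketched in user space as
follows. The hunks only show the loop body, so the doubling delay and the
RETRY_MAX_MS cap below are assumptions, and retry_with_backoff(),
busy_until_zero() and record_delay() are hypothetical names. The delay
primitive is passed in as a function pointer to keep the sketch portable; in
the kernel that slot is mdelay() before the patch and msleep() after it.

```c
#include <errno.h>

#define RETRY_MAX_MS 8	/* hypothetical cap, kept small for the sketch */

typedef void (*delay_fn)(unsigned int ms);

/*
 * Call op() until it stops returning -EAGAIN or the delay reaches the cap.
 * Returns the last result of op(); *tries receives the number of calls made.
 */
static int retry_with_backoff(int (*op)(void *), void *arg,
			      delay_fn delay, int *tries)
{
	int ret = -EAGAIN;
	unsigned int i;

	*tries = 0;
	for (i = 1; i < RETRY_MAX_MS; i <<= 1) {
		ret = op(arg);
		(*tries)++;
		if (ret != -EAGAIN)
			break;
		delay(i);	/* a busy wait spins here; a sleep gives up the CPU */
	}
	return ret;
}

/* Hypothetical operation: reports -EAGAIN until the counter reaches zero. */
static int busy_until_zero(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}

/* Stand-in delay that only records how long the caller asked to wait. */
static unsigned int total_delay_ms;

static void record_delay(unsigned int ms)
{
	total_delay_ms += ms;
}
```

The loop logic is identical for both primitives. What changes is whether the
CPU is held during the wait, which is why the replacement is only safe when
the caller is never in atomic context.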

Re: [PATCH RFC 06/16] rcuperf: Set gp_exp to true for tests to run

2018-01-26 Thread Lihao Liang

On Thu, Jan 25, 2018 at 6:18 AM, Paul E. McKenney
 wrote:
> On Tue, Jan 23, 2018 at 03:59:31PM +0800, liangli...@huawei.com wrote:
>> From: Lihao Liang 
>>
>> Signed-off-by: Lihao Liang 
>> ---
>>  kernel/rcu/rcuperf.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c
>> index ea80fa3e..baccc123 100644
>> --- a/kernel/rcu/rcuperf.c
>> +++ b/kernel/rcu/rcuperf.c
>> @@ -60,7 +60,7 @@ MODULE_AUTHOR("Paul E. McKenney 
>> ");
>>  #define VERBOSE_PERFOUT_ERRSTRING(s) \
>>   do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } 
>> while (0)
>>
>> -torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
>> +torture_param(bool, gp_exp, true, "Use expedited GP wait primitives");
>
> This is fine as a convenience for internal testing, but the usual way
> to make this happen is using the rcuperf.gp_exp kernel boot parameter.
> Or was that not working for you?
>

Sure. It should work if rcuperf.gp_exp=1 is added to the .boot files
(it wouldn't work if rcuperf.gp_exp=false is used).

Thanks,
Lihao.

> Thanx, Paul
>
>>  torture_param(int, holdoff, 10, "Holdoff time before test start (s)");
>>  torture_param(int, nreaders, -1, "Number of RCU reader threads");
>>  torture_param(int, nwriters, -1, "Number of RCU updater threads");
>> --
>> 2.14.1.729.g59c0ea183
>>
>

Re: [PATCH v3] iommu/mediatek: Move attach_device after iommu-group is ready for M4Uv1

2018-01-26 Thread Yong Wu

On Thu, 2018-01-25 at 12:02 +, Robin Murphy wrote:
> On 25/01/18 11:14, Yong Wu wrote:
> > Since commit 05f80300dc8b, the iommu framework assumes that all
> > iommu drivers have their own iommu-group, and it got rid of the FIXME
> > workarounds for a NULL group. But the flow of Mediatek M4U gen1 is a
> > bit tricky, and it hangs in this case:
> > 
> > ==
> > Unable to handle kernel NULL pointer dereference at virtual address 0030
> > pgd = c0004000
> > [0030] *pgd=
> > PC is at mutex_lock+0x28/0x54
> > LR is at iommu_attach_device+0xa4/0xd4
> > pc : []lr : []psr: 6013
> > sp : df0edbb8  ip : df0edbc8  fp : df0edbc4
> > r10: c114da14  r9 : df2a3e40  r8 : 0003
> > r7 : df27a210  r6 : df2a90c4  r5 : 0030  r4 : 
> > r3 : df0f8000  r2 : f000  r1 : df29c610  r0 : 0030
> > Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > xxx
> > (mutex_lock) from [] (iommu_attach_device+0xa4/0xd4)
> > (iommu_attach_device) from [] 
> > (__arm_iommu_attach_device+0x28/0x90)
> > (__arm_iommu_attach_device) from [] 
> > (arm_iommu_attach_device+0x1c/0x30)
> > (arm_iommu_attach_device) from [] 
> > (mtk_iommu_add_device+0xfc/0x214)
> > (mtk_iommu_add_device) from [] (add_iommu_group+0x3c/0x68)
> > (add_iommu_group) from [] (bus_for_each_dev+0x78/0xac)
> > (bus_for_each_dev) from [] (bus_set_iommu+0xb0/0xec)
> > (bus_set_iommu) from [] (mtk_iommu_probe+0x328/0x368)
> > (mtk_iommu_probe) from [] (platform_drv_probe+0x5c/0xc0)
> > (platform_drv_probe) from [] (driver_probe_device+0x2f4/0x4d8)
> > (driver_probe_device) from [] (__driver_attach+0x10c/0x128)
> > (__driver_attach) from [] (bus_for_each_dev+0x78/0xac)
> > (bus_for_each_dev) from [] (driver_attach+0x2c/0x30)
> > (driver_attach) from [] (bus_add_driver+0x1e0/0x278)
> > (bus_add_driver) from [] (driver_register+0x88/0x108)
> > (driver_register) from [] (__platform_driver_register+0x50/0x58)
> > (__platform_driver_register) from [] (m4u_init+0x24/0x28)
> > (m4u_init) from [] (do_one_initcall+0xf0/0x17c)
> > =
> > 
> > The root cause is that the device's iommu-group is NULL when
> > arm_iommu_attach_device is called. This patch prepares a new iommu-group
> > for the iommu consumer devices to fix this issue.
> > 
> > CC: Robin Murphy 
> > CC: Honghui Zhang 
> > Fixes: 05f80300dc8b ('iommu: Finish making iommu_group support mandatory')
> > Reported-by: Ryder Lee 
> > Signed-off-by: Yong Wu 
> > ---
> > changes notes:
> > v3: don't use the global variable and allocate a new iommu group before
> >  arm_iommu_attach_device following Robin's suggestion.
> > 
> > v2: 
> > http://lists.infradead.org/pipermail/linux-mediatek/2018-January/011810.html
> > Add mtk_domain_v1=NULL in domain_free for symmetry.
> > 
> > v1: https://patchwork.kernel.org/patch/10176255/
> > ---
> >   drivers/iommu/mtk_iommu_v1.c | 49 
> > ++--
> >   1 file changed, 25 insertions(+), 24 deletions(-)
> > 
> > diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
> > index 542930c..aca76d2 100644
> > --- a/drivers/iommu/mtk_iommu_v1.c
> > +++ b/drivers/iommu/mtk_iommu_v1.c
> > @@ -418,20 +418,12 @@ static int mtk_iommu_create_mapping(struct device 
> > *dev,
> > m4udev->archdata.iommu = mtk_mapping;
> > }
> >   
> > -   ret = arm_iommu_attach_device(dev, mtk_mapping);
> > -   if (ret)
> > -   goto err_release_mapping;
> > -
> > return 0;
> > -
> > -err_release_mapping:
> > -   arm_iommu_release_mapping(mtk_mapping);
> > -   m4udev->archdata.iommu = NULL;
> > -   return ret;
> >   }
> >   
> >   static int mtk_iommu_add_device(struct device *dev)
> >   {
> > +   struct dma_iommu_mapping *mtk_mapping;
> > struct of_phandle_args iommu_spec;
> > struct of_phandle_iterator it;
> > struct mtk_iommu_data *data;
> > @@ -452,9 +444,30 @@ static int mtk_iommu_add_device(struct device *dev)
> > if (!dev->iommu_fwspec || dev->iommu_fwspec->ops != &mtk_iommu_ops)
> > return -ENODEV; /* Not a iommu client device */
> >   
> > +   /*
> > +* This is a short-term bodge because the ARM DMA code doesn't
> > +* understand multi-device groups, but we have to call into it
> > +* successfully (and not just rely on a normal IOMMU API attach
> > +* here) in order to set the correct DMA API ops on @dev.
> > +*/
> > +   group = iommu_group_alloc();
> > +   if (IS_ERR(group))
> > +   return PTR_ERR(group);
> > +
> > +   err = iommu_group_add_device(group, dev);
> > +   iommu_group_put(group);
> > +   if (err)
> > +   return err;
> > +
> > data = dev->iommu_fwspec->iommu_priv;
> > -   iommu_device_link(&data->iommu, dev);
> > +   mtk_mapping = data->dev->archdata.iommu;
> > +   err = arm_iommu_attach_device(dev, mtk_mapping);
> > +   if (err) {
> > +   iommu_group_remove_device(dev);
> > +   return err;
> > +   }
>

Re: [PATCH net-next 0/3 V1] rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK

2018-01-26 Thread Jiri Benc

On Fri, 26 Jan 2018 00:34:51 +0100, Nicolas Dichtel wrote:
> Why meaningful? The user knows that the answer is as if it was done in
> another netns. It makes it possible to have only one netlink socket instead
> of one per netns. But the code using it will be the same.

Because you can't use it to query the linked interface. You can't even
use it as an opaque value to track interfaces (netnsid+ifindex) because
netnsids are not unique across net name spaces. You can easily have two
interfaces that have all the ifindex, ifname, netnsid (and basically
everything else) identical but being completely different interfaces.
That's really not helpful.

> I fear that your approach will result in a lot of complexity in the
> kernel.

The complexity is (at least partly) already there. It's an inevitable
result of the design decision to have relative identifiers.

I agree that we should think about how to make this easy to implement.
I like your idea of doing this somehow generically. Perhaps it's
possible to do while keeping the netnsids valid in the caller's netns?

> What is really missing for me is a way to get a fd from an nsid. The user
> should be able to call RTM_GETNSID with an fd and a nsid and the kernel
> performs the needed operations so that the fd points to the corresponding
> netns.

That's what I was missing, too. I even looked into implementing it. But
opening a fd on behalf of the process and returning it over netlink is a
wrong thing to do. Netlink messages can get lost. Then you have a fd
leak you can do nothing about.

Given that we have netnsids used for so much stuff already (like
NETLINK_LISTEN_ALL_NSID) you need to track them anyway. And if you need
to track them, why bother with another identifier? It would be better
if netnsid can be used universally for anything. Then there will be no
need for the conversion.

 Jiri
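
The point about relative identifiers can be illustrated with a toy model. It
is a sketch only: struct netns, MAX_NSID and resolve_nsid() are hypothetical
names and do not reflect how the kernel stores nsids. Each namespace keeps
its own nsid table, so the same numeric nsid resolves to different
namespaces depending on who is asking.

```c
#include <stddef.h>

#define MAX_NSID 4	/* hypothetical table size for the sketch */

/* Toy namespace: every namespace has its own nsid -> namespace mapping. */
struct netns {
	const char *name;
	struct netns *peer[MAX_NSID];
};

/* Resolve an nsid relative to the namespace that is asking. */
static struct netns *resolve_nsid(const struct netns *from, int nsid)
{
	if (nsid < 0 || nsid >= MAX_NSID)
		return NULL;
	return from->peer[nsid];
}
```

If namespace A maps nsid 0 to C and namespace B maps nsid 0 to D, then a
(netnsid, ifindex) pair reported in A's terms cannot be compared with one
reported in B's terms without first translating the nsid.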

[PATCH BUGFIX] ARM: dts: imx6ull: fix the imx6ull-14x14-evk configuration

2018-01-26 Thread Lothar Waßmann

imx6ull-14x14-evk.dts currently includes the imx6ul.dtsi file for an
i.MX6ULL SoC which is plain wrong.

Rename the current imx6ul-14x14-evk.dts to .dtsi and include it from
imx6ul-14x14-evk.dts and imx6ull-14x14-evk.dts, so that both can
include the appropriate SoC specific (imx6ul.dtsi/imx6ull.dtsi) file.

Signed-off-by: Lothar Waßmann 
---
 arch/arm/boot/dts/imx6ul-14x14-evk.dts  | 480 +---
 arch/arm/boot/dts/imx6ull-14x14-evk.dts |   5 +-
 2 files changed, 5 insertions(+), 480 deletions(-)

diff --git a/arch/arm/boot/dts/imx6ul-14x14-evk.dts 
b/arch/arm/boot/dts/imx6ul-14x14-evk.dts
index 18fdb08..6d720b2 100644
--- a/arch/arm/boot/dts/imx6ul-14x14-evk.dts
+++ b/arch/arm/boot/dts/imx6ul-14x14-evk.dts
@@ -9,487 +9,9 @@
 /dts-v1/;
 
 #include "imx6ul.dtsi"
+#include "imx6ul-14x14-evk.dtsi"
 
 / {
model = "Freescale i.MX6 UltraLite 14x14 EVK Board";
compatible = "fsl,imx6ul-14x14-evk", "fsl,imx6ul";
-
-   chosen {
-   stdout-path = &uart1;
-   };
-
-   memory {
-   reg = <0x8000 0x2000>;
-   };
-
-   backlight_display: backlight-display {
-   compatible = "pwm-backlight";
-   pwms = <&pwm1 0 500>;
-   brightness-levels = <0 4 8 16 32 64 128 255>;
-   default-brightness-level = <6>;
-   status = "okay";
-   };
-
-
-   reg_sd1_vmmc: regulator-sd1-vmmc {
-   compatible = "regulator-fixed";
-   regulator-name = "VSD_3V3";
-   regulator-min-microvolt = <330>;
-   regulator-max-microvolt = <330>;
-   gpio = <&gpio1 9 GPIO_ACTIVE_HIGH>;
-   enable-active-high;
-   };
-
-   sound {
-   compatible = "simple-audio-card";
-   simple-audio-card,name = "mx6ul-wm8960";
-   simple-audio-card,format = "i2s";
-   simple-audio-card,bitclock-master = <&dailink_master>;
-   simple-audio-card,frame-master = <&dailink_master>;
-   simple-audio-card,widgets =
-   "Microphone", "Mic Jack",
-   "Line", "Line In",
-   "Line", "Line Out",
-   "Speaker", "Speaker",
-   "Headphone", "Headphone Jack";
-   simple-audio-card,routing =
-   "Headphone Jack", "HP_L",
-   "Headphone Jack", "HP_R",
-   "Speaker", "SPK_LP",
-   "Speaker", "SPK_LN",
-   "Speaker", "SPK_RP",
-   "Speaker", "SPK_RN",
-   "LINPUT1", "Mic Jack",
-   "LINPUT3", "Mic Jack",
-   "RINPUT1", "Mic Jack",
-   "RINPUT2", "Mic Jack";
-
-   simple-audio-card,cpu {
-   sound-dai = <&sai2>;
-   };
-
-   dailink_master: simple-audio-card,codec {
-   sound-dai = <&codec>;
-   clocks = <&clks IMX6UL_CLK_SAI2>;
-   };
-   };
-
-   panel {
-   compatible = "innolux,at043tn24";
-   backlight = <&backlight_display>;
-
-   port {
-   panel_in: endpoint {
-   remote-endpoint = <&display_out>;
-   };
-   };
-   };
-};
-
-&clks {
-   assigned-clocks = <&clks IMX6UL_CLK_PLL4_AUDIO_DIV>;
-   assigned-clock-rates = <786432000>;
-};
-
-&i2c2 {
-   clock_frequency = <10>;
-   pinctrl-names = "default";
-   pinctrl-0 = <&pinctrl_i2c2>;
-   status = "okay";
-
-   codec: wm8960@1a {
-   #sound-dai-cells = <0>;
-   compatible = "wlf,wm8960";
-   reg = <0x1a>;
-   wlf,shared-lrclk;
-   };
-};
-
-&fec1 {
-   pinctrl-names = "default";
-   pinctrl-0 = <&pinctrl_enet1>;
-   phy-mode = "rmii";
-   phy-handle = <&ethphy0>;
-   status = "okay";
-};
-
-&fec2 {
-   pinctrl-names = "default";
-   pinctrl-0 = <&pinctrl_enet2>;
-   phy-mode = "rmii";
-   phy-handle = <&ethphy1>;
-   status = "okay";
-
-   mdio {
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   ethphy0: ethernet-phy@2 {
-   reg = <2>;
-   micrel,led-mode = <1>;
-   clocks = <&clks IMX6UL_CLK_ENET_REF>;
-   clock-names = "rmii-ref";
-   };
-
-   ethphy1: ethernet-phy@1 {
-   reg = <1>;
-   micrel,led-mode = <1>;
-   clocks = <&clks IMX6UL_CLK_ENET2_REF>;
-   clock-names = "rmii-ref";
-   };
-   };
-};
-
-
-&lcdif {
-   assigned-clocks = <&clks IMX6UL_CLK_LCDIF_PRE_SEL>;
-   assigned-clock-parents = <&clks IMX6UL_CLK_PL

Re: [RFC PATCH 1/9] media: add request API core and UAPI

2018-01-26 Thread Sakari Ailus

Hi Alexandre,

I remember it was discussed that the work after the V4L2 jobs API would
continue from the existing request API patches. I see that at least the
rather important support for events is missing in this version. Why was it
left out?

I also see that variable size IOCTL argument support is no longer included.

On Fri, Dec 15, 2017 at 04:56:17PM +0900, Alexandre Courbot wrote:
> The request API provides a way to group buffers and device parameters
> into units of work to be queued and executed. This patch introduces the
> UAPI and core framework.
> 
> This patch is based on the previous work by Laurent Pinchart. The core
> has changed considerably, but the UAPI is mostly untouched.
> 
> Signed-off-by: Alexandre Courbot 
> ---
>  drivers/media/Makefile   |   3 +-
>  drivers/media/media-device.c |   6 +
>  drivers/media/media-request.c| 390 
> +++
>  drivers/media/v4l2-core/v4l2-ioctl.c |   2 +-
>  include/media/media-device.h |   3 +
>  include/media/media-entity.h |   6 +
>  include/media/media-request.h| 269 
>  include/uapi/linux/media.h   |  11 +
>  8 files changed, 688 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/media/media-request.c
>  create mode 100644 include/media/media-request.h
> 
> diff --git a/drivers/media/Makefile b/drivers/media/Makefile
> index 594b462ddf0e..985d35ec6b29 100644
> --- a/drivers/media/Makefile
> +++ b/drivers/media/Makefile
> @@ -3,7 +3,8 @@
>  # Makefile for the kernel multimedia device drivers.
>  #
>  
> -media-objs   := media-device.o media-devnode.o media-entity.o
> +media-objs   := media-device.o media-devnode.o media-entity.o \
> +media-request.o
>  
>  #
>  # I2C drivers should come before other drivers, otherwise they'll fail
> diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
> index e79f72b8b858..045cec7d2de9 100644
> --- a/drivers/media/media-device.c
> +++ b/drivers/media/media-device.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_MEDIA_CONTROLLER
>  
> @@ -407,6 +408,7 @@ static const struct media_ioctl_info ioctl_info[] = {
>   MEDIA_IOC(ENUM_LINKS, media_device_enum_links, 
> MEDIA_IOC_FL_GRAPH_MUTEX),
>   MEDIA_IOC(SETUP_LINK, media_device_setup_link, 
> MEDIA_IOC_FL_GRAPH_MUTEX),
>   MEDIA_IOC(G_TOPOLOGY, media_device_get_topology, 
> MEDIA_IOC_FL_GRAPH_MUTEX),
> + MEDIA_IOC(REQUEST_CMD, media_device_request_cmd, 0),
>  };
>  
>  static long media_device_ioctl(struct file *filp, unsigned int cmd,
> @@ -688,6 +690,10 @@ EXPORT_SYMBOL_GPL(media_device_init);
>  
>  void media_device_cleanup(struct media_device *mdev)
>  {
> + if (mdev->req_queue) {
> + mdev->req_queue->ops->release(mdev->req_queue);
> + mdev->req_queue = NULL;
> + }
>   ida_destroy(&mdev->entity_internal_idx);
>   mdev->entity_internal_idx_max = 0;
>   media_graph_walk_cleanup(&mdev->pm_count_walk);
> diff --git a/drivers/media/media-request.c b/drivers/media/media-request.c
> new file mode 100644
> index ..15dc65ddfe41
> --- /dev/null
> +++ b/drivers/media/media-request.c
> @@ -0,0 +1,390 @@
> +/*
> + * Request and request queue base management
> + *
> + * Copyright (C) 2017, The Chromium OS Authors.  All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 

Alphabetic order, please.

> +
> +const struct file_operations request_fops;
> +
> +void media_request_get(struct media_request *req)

Please return the media request, too.

> +{
> + kref_get(&req->kref);
> +}
> +EXPORT_SYMBOL_GPL(media_request_get);
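
A "get" helper that returns the object, as requested for
media_request_get() above, lets callers take the reference and store the
pointer in one expression. A minimal sketch, assuming a plain counter in
place of the kernel's struct kref; struct request, request_get(),
struct holder and holder_set() are hypothetical names used only for
illustration:

```c
#include <stddef.h>

/* Simplified stand-in for struct media_request (not atomic, unlike kref). */
struct request {
	unsigned int refcount;
};

/* Take a reference and hand the same pointer back to the caller. */
static struct request *request_get(struct request *req)
{
	req->refcount++;
	return req;
}

/* Hypothetical holder that keeps its own reference to a request. */
struct holder {
	struct request *req;
};

static void holder_set(struct holder *h, struct request *req)
{
	h->req = request_get(req);	/* acquire and assign in one step */
}
```
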
> +
> +struct media_request *
> +media_request_get_from_fd(int fd)
> +{
> + struct file *f;
> + struct media_request *req;
> +
> + f = fget(fd);
> + if (!f)
> + return NULL;
> +
> + /* Not a request FD? */
> + if (f->f_op != &request_fops) {
> + fput(f);
> + return NULL;
> + }
> +
> + req = f->private_data;
> + media_request_get(req);
> + fput(f);
> +
> + return req;
> +}
> +EXPORT_SYMBOL_GPL(media_request_get_from_fd);
> +
> +static void media_request_release(struct kref *kref)
> +{
> + struct media_request_entity_data *data, *next;
> + struct media_request *req =
> + container_of(kref, typeof(*req), kref);
> + struct media_device *mdev = req->queue->m

[PATCH 3/3] ARM: dts: imx6ull: address some more incompatibilites between i.MX6UL and i.MX6ULL

2018-01-26 Thread Lothar Waßmann

The i.MX6ULL doesn't have the CAAM engine nor any SIM interface.
These are currently not implemented for i.MX6UL but it cannot hurt to
delete the corresponding nodes from the i.MX6ULL DTB anyway.

Signed-off-by: Lothar Waßmann 
---
 arch/arm/boot/dts/imx6ull.dtsi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/boot/dts/imx6ull.dtsi b/arch/arm/boot/dts/imx6ull.dtsi
index 8724fdb2..da325cd 100644
--- a/arch/arm/boot/dts/imx6ull.dtsi
+++ b/arch/arm/boot/dts/imx6ull.dtsi
@@ -67,6 +67,12 @@
};
};
 
+   aips-bus@210 {
+   /delete-node/ caam@2140;
+   /delete-node/ sim@218c;
+   /delete-node/ sim@21b4;
+   };
+
aips3: aips-bus@220 {
compatible = "fsl,aips-bus", "simple-bus";
#address-cells = <1>;
-- 
2.1.4

[PATCH 1/3] ARM: dts: imx6ull: fix the i.MX6ULL UART8 configuration

2018-01-26 Thread Lothar Waßmann

UART8 on i.MX6ULL is not located on the SPBA bus like on i.MX6UL but
on the (otherwise unused) AIPS-3 bus.
Create the appropriate AIPS-3 bus configuration and move the uart8
definition where it belongs.

Signed-off-by: Lothar Waßmann 
---
 arch/arm/boot/dts/imx6ull.dtsi | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arch/arm/boot/dts/imx6ull.dtsi b/arch/arm/boot/dts/imx6ull.dtsi
index 0c18291..abc815f 100644
--- a/arch/arm/boot/dts/imx6ull.dtsi
+++ b/arch/arm/boot/dts/imx6ull.dtsi
@@ -41,3 +41,32 @@
 
 #include "imx6ul.dtsi"
 #include "imx6ull-pinfunc.h"
+
+/ {
+   soc {
+   aips-bus@200 {
+   spba-bus@200 {
+   /delete-node/ serial@2024000;
+   };
+   };
+
+   aips3: aips-bus@220 {
+   compatible = "fsl,aips-bus", "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   reg = <0x0220 0x10>;
+   ranges;
+
+   uart8: serial@2288000 {
+   compatible = "fsl,imx6ul-uart",
+"fsl,imx6q-uart";
+   reg = <0x02288000 0x4000>;
+   interrupts = ;
+   clocks = <&clks IMX6UL_CLK_IPG>,
+<&clks IMX6UL_CLK_UART8_SERIAL>;
+   clock-names = "ipg", "per";
+   status = "disabled";
+   };
+   };
+   };
+};
-- 
2.1.4

[PATCH] opp: cpu: Replace GFP_ATOMIC with GFP_KERNEL in dev_pm_opp_init_cpufreq_table

2018-01-26 Thread Jia-Ju Bai

After checking all possible call chains to
dev_pm_opp_init_cpufreq_table() here,
my tool finds that this function is never called in atomic context,
that is, never in an interrupt handler or while holding a spinlock.
In addition, dev_pm_opp_init_cpufreq_table() calls dev_pm_opp_get_opp_count(),
which calls mutex_lock(), which can sleep.
This indicates that dev_pm_opp_init_cpufreq_table() can already call
functions which may sleep.
Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/opp/cpu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/opp/cpu.c b/drivers/opp/cpu.c
index 2d87bc1..0c09107 100644
--- a/drivers/opp/cpu.c
+++ b/drivers/opp/cpu.c
@@ -55,7 +55,7 @@ int dev_pm_opp_init_cpufreq_table(struct device *dev,
if (max_opps <= 0)
return max_opps ? max_opps : -ENODATA;
 
-   freq_table = kcalloc((max_opps + 1), sizeof(*freq_table), GFP_ATOMIC);
+   freq_table = kcalloc((max_opps + 1), sizeof(*freq_table), GFP_KERNEL);
if (!freq_table)
return -ENOMEM;
 
-- 
1.7.9.5

Re: [PATCH RFC 07/16] prcu: Implement call_prcu() API

2018-01-26 Thread Lihao Liang

On Thu, Jan 25, 2018 at 6:20 AM, Paul E. McKenney
 wrote:
> On Tue, Jan 23, 2018 at 03:59:32PM +0800, liangli...@huawei.com wrote:
>> From: Lihao Liang 
>>
>> This is PRCU's counterpart of RCU's call_rcu() API.
>>
>> Reviewed-by: Heng Zhang 
>> Signed-off-by: Lihao Liang 
>> ---
>>  include/linux/prcu.h | 25 
>>  init/main.c  |  2 ++
>>  kernel/rcu/prcu.c| 67 
>> +---
>>  3 files changed, 91 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/prcu.h b/include/linux/prcu.h
>> index 653b4633..e5e09c9b 100644
>> --- a/include/linux/prcu.h
>> +++ b/include/linux/prcu.h
>> @@ -2,15 +2,36 @@
>>  #define __LINUX_PRCU_H
>>
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>
>>  #define CONFIG_PRCU
>>
>> +struct prcu_version_head {
>> + unsigned long long version;
>> + struct prcu_version_head *next;
>> +};
>> +
>> +/* Simple unsegmented callback list for PRCU. */
>> +struct prcu_cblist {
>> + struct rcu_head *head;
>> + struct rcu_head **tail;
>> + struct prcu_version_head *version_head;
>> + struct prcu_version_head **version_tail;
>> + long len;
>> +};
>> +
>> +#define PRCU_CBLIST_INITIALIZER(n) { \
>> + .head = NULL, .tail = &n.head, \
>> + .version_head = NULL, .version_tail = &n.version_head, \
>> +}
>> +
>>  struct prcu_local_struct {
>>   unsigned int locked;
>>   unsigned int online;
>>   unsigned long long version;
>> + struct prcu_cblist cblist;
>>  };
>>
>>  struct prcu_struct {
>> @@ -24,6 +45,8 @@ struct prcu_struct {
>>  void prcu_read_lock(void);
>>  void prcu_read_unlock(void);
>>  void synchronize_prcu(void);
>> +void call_prcu(struct rcu_head *head, rcu_callback_t func);
>> +void prcu_init(void);
>>  void prcu_note_context_switch(void);
>>
>>  #else /* #ifdef CONFIG_PRCU */
>> @@ -31,6 +54,8 @@ void prcu_note_context_switch(void);
>>  #define prcu_read_lock() do {} while (0)
>>  #define prcu_read_unlock() do {} while (0)
>>  #define synchronize_prcu() do {} while (0)
>> +#define call_prcu() do {} while (0)
>> +#define prcu_init() do {} while (0)
>>  #define prcu_note_context_switch() do {} while (0)
>>
>>  #endif /* #ifdef CONFIG_PRCU */
>> diff --git a/init/main.c b/init/main.c
>> index f8665104..4925964e 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -38,6 +38,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -574,6 +575,7 @@ asmlinkage __visible void __init start_kernel(void)
>>   workqueue_init_early();
>>
>>   rcu_init();
>> + prcu_init();
>>
>>   /* Trace events are available after this */
>>   trace_init();
>> diff --git a/kernel/rcu/prcu.c b/kernel/rcu/prcu.c
>> index a00b9420..f198285c 100644
>> --- a/kernel/rcu/prcu.c
>> +++ b/kernel/rcu/prcu.c
>> @@ -1,11 +1,12 @@
>>  #include 
>> -#include 
>>  #include 
>> -#include 
>> +#include 
>>  #include 
>> -
>> +#include 
>>  #include 
>>
>> +#include "rcu.h"
>> +
>>  DEFINE_PER_CPU_SHARED_ALIGNED(struct prcu_local_struct, prcu_local);
>>
>>  struct prcu_struct global_prcu = {
>> @@ -16,6 +17,16 @@ struct prcu_struct global_prcu = {
>>  };
>>  struct prcu_struct *prcu = &global_prcu;
>>
>> +/* Initialize simple callback list. */
>> +static void prcu_cblist_init(struct prcu_cblist *rclp)
>> +{
>> + rclp->head = NULL;
>> + rclp->tail = &rclp->head;
>> + rclp->version_head = NULL;
>> + rclp->version_tail = &rclp->version_head;
>> + rclp->len = 0;
>> +}
>> +
>>  static inline void prcu_report(struct prcu_local_struct *local)
>>  {
>>   unsigned long long global_version;
>> @@ -123,3 +134,53 @@ void prcu_note_context_switch(void)
>>   prcu_report(local);
>>   put_cpu_ptr(&prcu_local);
>>  }
>> +
>> +void call_prcu(struct rcu_head *head, rcu_callback_t func)
>> +{
>> + unsigned long flags;
>> + struct prcu_local_struct *local;
>> + struct prcu_cblist *rclp;
>> + struct prcu_version_head *vhp;
>> +
>> + debug_rcu_head_queue(head);
>> +
>> + /* Use GFP_ATOMIC with IRQs disabled */
>> + vhp = kmalloc(sizeof(struct prcu_version_head), GFP_ATOMIC);
>> + if (!vhp)
>> + return;
>
> Silently failing to post the callback can cause system hangs.  I suggest
> finding some way to avoid allocating on the call_prcu() code path.
>

You're absolutely right. We were also thinking of changing the
function return type from void to int to indicate whether the memory
allocation is successful or not.

Best,
Lihao.
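
One way to read the suggestion to avoid allocating on the call_prcu() path
is to carry the version stamp inside the callback head itself, so that
enqueueing has no failure case at all. The sketch below is a user-space
illustration of that idea, not the approach the authors settled on:
struct cb_head, struct cblist and both helpers are hypothetical names, and
whether the real struct rcu_head could carry an extra field is a separate
question. It keeps the tail-pointer list shape of prcu_cblist.

```c
#include <stddef.h>

/* Hypothetical callback head that carries its own version stamp. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
	unsigned long long version;
};

/* Unsegmented callback list with a tail pointer, shaped like prcu_cblist. */
struct cblist {
	struct cb_head *head;
	struct cb_head **tail;
	long len;
};

static void cblist_init(struct cblist *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->len = 0;
}

/* Enqueue without allocating, so there is no error to report or drop. */
static void cblist_enqueue(struct cblist *l, struct cb_head *h,
			   void (*func)(struct cb_head *head),
			   unsigned long long version)
{
	h->func = func;
	h->next = NULL;
	h->version = version;
	*l->tail = h;		/* link after the current last element */
	l->tail = &h->next;	/* the new element is now the last one */
	l->len++;
}
```

Returning int from call_prcu(), as mentioned above, would instead push the
failure handling to every caller.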

> Thanx, Paul
>
>> +
>> + head->func = func;
>> + head->next = NULL;
>> + vhp->next = NULL;
>> +
>> + local_irq_save(flags);
>> + local = this_cpu_ptr(&prcu_local);
>> + vhp->version = local->version;
>> + rclp = &local->cblist;
>> + rclp->len++;
>> + *rclp->tail = head;
>> + rclp->tail = &head->next;
>> + *rclp->version_tail = vhp;
>> + rclp->versio

Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

2018-01-26 Thread David Woodhouse

On Thu, 2018-01-25 at 18:11 -0800, Liran Alon wrote:
> 
> P.S:
> It seems to me that all these issues could be resolved completely at
> hardware in future CPUs if BTB/BHB/RSB entries were tagged with
> prediction-mode (or similar metadata). It will be nice if Intel/AMD
> could share if that is the planned long-term solution instead of
> IBRS-all-the-time.

IBRS-all-the-time is tagging with the ring and VMX root/non-root mode,
it seems. That much they could slip into the upcoming generation of
CPUs. And it's supposed to be fast¹; none of the dirty hacks in
microcode that they needed to implement the first-generation IBRS.

But we still need to tag with ASID/VMID and do proper flushing for
those, before we can completely ditch the need to do IBPB at the right
times.

Reading between the lines, I don't think they could add *that* without
stopping the fabs for a year or so while they go back to the drawing
board. But yes, I sincerely hope they *are* planning to do it, and
expose a 'SPECTRE_NO' bit in IA32_ARCH_CAPABILITIES, as soon as is
humanly possible.

¹ Fast enough that we'll want to use it and ALTERNATIVE out the 
  retpolines.


[PATCH v2 BUGFIX] ARM: dts: imx6ull: fix the imx6ull-14x14-evk configuration

2018-01-26 Thread Lothar Waßmann

imx6ull-14x14-evk.dts currently includes the imx6ul.dtsi file for an
i.MX6ULL SoC which is plain wrong.

Rename the current imx6ul-14x14-evk.dts to .dtsi and include it from
imx6ul-14x14-evk.dts and imx6ull-14x14-evk.dts, so that both can
include the appropriate SoC specific (imx6ul.dtsi/imx6ull.dtsi) file.

Signed-off-by: Lothar Waßmann 
---
Changes vs v1:
 - The newly created .dtsi file was missing

 arch/arm/boot/dts/imx6ul-14x14-evk.dts  | 480 +--
 arch/arm/boot/dts/imx6ul-14x14-evk.dtsi | 488 
 arch/arm/boot/dts/imx6ull-14x14-evk.dts |   5 +-
 3 files changed, 493 insertions(+), 480 deletions(-)
 create mode 100644 arch/arm/boot/dts/imx6ul-14x14-evk.dtsi

diff --git a/arch/arm/boot/dts/imx6ul-14x14-evk.dts 
b/arch/arm/boot/dts/imx6ul-14x14-evk.dts
index 18fdb08..6d720b2 100644
--- a/arch/arm/boot/dts/imx6ul-14x14-evk.dts
+++ b/arch/arm/boot/dts/imx6ul-14x14-evk.dts
@@ -9,487 +9,9 @@
 /dts-v1/;
 
 #include "imx6ul.dtsi"
+#include "imx6ul-14x14-evk.dtsi"
 
 / {
model = "Freescale i.MX6 UltraLite 14x14 EVK Board";
compatible = "fsl,imx6ul-14x14-evk", "fsl,imx6ul";
-
-   chosen {
-   stdout-path = &uart1;
-   };
-
-   memory {
-   reg = <0x8000 0x2000>;
-   };
-
-   backlight_display: backlight-display {
-   compatible = "pwm-backlight";
-   pwms = <&pwm1 0 500>;
-   brightness-levels = <0 4 8 16 32 64 128 255>;
-   default-brightness-level = <6>;
-   status = "okay";
-   };
-
-
-   reg_sd1_vmmc: regulator-sd1-vmmc {
-   compatible = "regulator-fixed";
-   regulator-name = "VSD_3V3";
-   regulator-min-microvolt = <330>;
-   regulator-max-microvolt = <330>;
-   gpio = <&gpio1 9 GPIO_ACTIVE_HIGH>;
-   enable-active-high;
-   };
-
-   sound {
-   compatible = "simple-audio-card";
-   simple-audio-card,name = "mx6ul-wm8960";
-   simple-audio-card,format = "i2s";
-   simple-audio-card,bitclock-master = <&dailink_master>;
-   simple-audio-card,frame-master = <&dailink_master>;
-   simple-audio-card,widgets =
-   "Microphone", "Mic Jack",
-   "Line", "Line In",
-   "Line", "Line Out",
-   "Speaker", "Speaker",
-   "Headphone", "Headphone Jack";
-   simple-audio-card,routing =
-   "Headphone Jack", "HP_L",
-   "Headphone Jack", "HP_R",
-   "Speaker", "SPK_LP",
-   "Speaker", "SPK_LN",
-   "Speaker", "SPK_RP",
-   "Speaker", "SPK_RN",
-   "LINPUT1", "Mic Jack",
-   "LINPUT3", "Mic Jack",
-   "RINPUT1", "Mic Jack",
-   "RINPUT2", "Mic Jack";
-
-   simple-audio-card,cpu {
-   sound-dai = <&sai2>;
-   };
-
-   dailink_master: simple-audio-card,codec {
-   sound-dai = <&codec>;
-   clocks = <&clks IMX6UL_CLK_SAI2>;
-   };
-   };
-
-   panel {
-   compatible = "innolux,at043tn24";
-   backlight = <&backlight_display>;
-
-   port {
-   panel_in: endpoint {
-   remote-endpoint = <&display_out>;
-   };
-   };
-   };
-};
-
-&clks {
-   assigned-clocks = <&clks IMX6UL_CLK_PLL4_AUDIO_DIV>;
-   assigned-clock-rates = <786432000>;
-};
-
-&i2c2 {
-   clock_frequency = <10>;
-   pinctrl-names = "default";
-   pinctrl-0 = <&pinctrl_i2c2>;
-   status = "okay";
-
-   codec: wm8960@1a {
-   #sound-dai-cells = <0>;
-   compatible = "wlf,wm8960";
-   reg = <0x1a>;
-   wlf,shared-lrclk;
-   };
-};
-
-&fec1 {
-   pinctrl-names = "default";
-   pinctrl-0 = <&pinctrl_enet1>;
-   phy-mode = "rmii";
-   phy-handle = <&ethphy0>;
-   status = "okay";
-};
-
-&fec2 {
-   pinctrl-names = "default";
-   pinctrl-0 = <&pinctrl_enet2>;
-   phy-mode = "rmii";
-   phy-handle = <&ethphy1>;
-   status = "okay";
-
-   mdio {
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   ethphy0: ethernet-phy@2 {
-   reg = <2>;
-   micrel,led-mode = <1>;
-   clocks = <&clks IMX6UL_CLK_ENET_REF>;
-   clock-names = "rmii-ref";
-   };
-
-   ethphy1: ethernet-phy@1 {
-   reg = <1>;
-   micrel,led-mode = <1>;
-   clocks = <&clks IMX6UL_CLK_ENET2_REF>;
-

Re: [PATCH BUGFIX] ARM: dts: imx6ull: fix the imx6ull-14x14-evk configuration

2018-01-26 Thread Dong Aisheng

Hi Lothar,

On Fri, Jan 26, 2018 at 4:25 PM, Lothar Waßmann  
wrote:
> imx6ull-14x14-evk.dts currently includes the imx6ul.dtsi file for an
> i.MX6ULL SoC which is plain wrong.
>
> Rename the current imx6ul-14x14-evk.dts to .dtsi and include it from
> imx6ul-14x14-evk.dts and imx6ull-14x14-evk.dts, so that both can
> include the appropriate SoC specific (imx6ul.dtsi/imx6ull.dtsi) file.
>
> Signed-off-by: Lothar Waßmann 
> ---
>  arch/arm/boot/dts/imx6ul-14x14-evk.dts  | 480 
> +---
>  arch/arm/boot/dts/imx6ull-14x14-evk.dts |   5 +-
>  2 files changed, 5 insertions(+), 480 deletions(-)
>

I'm okay with this idea.
But where is the imx6ul-14x14-evk.dtsi file?

BTW, could you also CC all i.MX related patches to the linux-...@nxp.com
mailing list in the future?
Most NXP/FSL internal driver owners are on this list and can help with review.

Regards
Dong Aisheng

> diff --git a/arch/arm/boot/dts/imx6ul-14x14-evk.dts 
> b/arch/arm/boot/dts/imx6ul-14x14-evk.dts
> index 18fdb08..6d720b2 100644
> --- a/arch/arm/boot/dts/imx6ul-14x14-evk.dts
> +++ b/arch/arm/boot/dts/imx6ul-14x14-evk.dts
> @@ -9,487 +9,9 @@
>  /dts-v1/;
>
>  #include "imx6ul.dtsi"
> +#include "imx6ul-14x14-evk.dtsi"
>
>  / {
> model = "Freescale i.MX6 UltraLite 14x14 EVK Board";
> compatible = "fsl,imx6ul-14x14-evk", "fsl,imx6ul";
> -
> -   chosen {
> -   stdout-path = &uart1;
> -   };
> -
> -   memory {
> -   reg = <0x8000 0x2000>;
> -   };
> -
> -   backlight_display: backlight-display {
> -   compatible = "pwm-backlight";
> -   pwms = <&pwm1 0 500>;
> -   brightness-levels = <0 4 8 16 32 64 128 255>;
> -   default-brightness-level = <6>;
> -   status = "okay";
> -   };
> -
> -
> -   reg_sd1_vmmc: regulator-sd1-vmmc {
> -   compatible = "regulator-fixed";
> -   regulator-name = "VSD_3V3";
> -   regulator-min-microvolt = <330>;
> -   regulator-max-microvolt = <330>;
> -   gpio = <&gpio1 9 GPIO_ACTIVE_HIGH>;
> -   enable-active-high;
> -   };
> -
> -   sound {
> -   compatible = "simple-audio-card";
> -   simple-audio-card,name = "mx6ul-wm8960";
> -   simple-audio-card,format = "i2s";
> -   simple-audio-card,bitclock-master = <&dailink_master>;
> -   simple-audio-card,frame-master = <&dailink_master>;
> -   simple-audio-card,widgets =
> -   "Microphone", "Mic Jack",
> -   "Line", "Line In",
> -   "Line", "Line Out",
> -   "Speaker", "Speaker",
> -   "Headphone", "Headphone Jack";
> -   simple-audio-card,routing =
> -   "Headphone Jack", "HP_L",
> -   "Headphone Jack", "HP_R",
> -   "Speaker", "SPK_LP",
> -   "Speaker", "SPK_LN",
> -   "Speaker", "SPK_RP",
> -   "Speaker", "SPK_RN",
> -   "LINPUT1", "Mic Jack",
> -   "LINPUT3", "Mic Jack",
> -   "RINPUT1", "Mic Jack",
> -   "RINPUT2", "Mic Jack";
> -
> -   simple-audio-card,cpu {
> -   sound-dai = <&sai2>;
> -   };
> -
> -   dailink_master: simple-audio-card,codec {
> -   sound-dai = <&codec>;
> -   clocks = <&clks IMX6UL_CLK_SAI2>;
> -   };
> -   };
> -
> -   panel {
> -   compatible = "innolux,at043tn24";
> -   backlight = <&backlight_display>;
> -
> -   port {
> -   panel_in: endpoint {
> -   remote-endpoint = <&display_out>;
> -   };
> -   };
> -   };
> -};
> -
> -&clks {
> -   assigned-clocks = <&clks IMX6UL_CLK_PLL4_AUDIO_DIV>;
> -   assigned-clock-rates = <786432000>;
> -};
> -
> -&i2c2 {
> -   clock_frequency = <10>;
> -   pinctrl-names = "default";
> -   pinctrl-0 = <&pinctrl_i2c2>;
> -   status = "okay";
> -
> -   codec: wm8960@1a {
> -   #sound-dai-cells = <0>;
> -   compatible = "wlf,wm8960";
> -   reg = <0x1a>;
> -   wlf,shared-lrclk;
> -   };
> -};
> -
> -&fec1 {
> -   pinctrl-names = "default";
> -   pinctrl-0 = <&pinctrl_enet1>;
> -   phy-mode = "rmii";
> -   phy-handle = <&ethphy0>;
> -   status = "okay";
> -};
> -
> -&fec2 {
> -   pinctrl-names = "default";
> -   pinctrl-0 = <&pinctrl_enet2>;
> -   phy-mode = "rmii";
> -   phy-handle = <&ethphy1>;
> -   status = "okay";
> -
> -   mdio {
> -   #address-cells = <1>;
> -   #size-cells = <0>;
> -
> -   ethphy0: ethernet-phy@2 {
>

Re: [PATCH 1/3] ARM: dts: imx6ull: fix the i.MX6ULL UART8 configuration

2018-01-26 Thread Dong Aisheng

On Fri, Jan 26, 2018 at 09:23:50AM +0100, Lothar Waßmann wrote:
> UART8 on i.MX6ULL is not located on the SPBA bus like on i.MX6UL but
> on the (otherwise unused) AIPS-3 bus.
> Create the appropriate AIPS-3 bus configuration and move the uart8
> definition where it belongs.
> 
> Signed-off-by: Lothar Waßmann 

Stefan seems to have already fixed this.

See:
https://patchwork.kernel.org/patch/10156125/

Regards
Dong Aisheng

> ---
>  arch/arm/boot/dts/imx6ull.dtsi | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx6ull.dtsi b/arch/arm/boot/dts/imx6ull.dtsi
> index 0c18291..abc815f 100644
> --- a/arch/arm/boot/dts/imx6ull.dtsi
> +++ b/arch/arm/boot/dts/imx6ull.dtsi
> @@ -41,3 +41,32 @@
>  
>  #include "imx6ul.dtsi"
>  #include "imx6ull-pinfunc.h"
> +
> +/ {
> + soc {
> + aips-bus@200 {
> + spba-bus@200 {
> + /delete-node/ serial@2024000;
> + };
> + };
> +
> + aips3: aips-bus@220 {
> + compatible = "fsl,aips-bus", "simple-bus";
> + #address-cells = <1>;
> + #size-cells = <1>;
> + reg = <0x0220 0x10>;
> + ranges;
> +
> + uart8: serial@2288000 {
> + compatible = "fsl,imx6ul-uart",
> +  "fsl,imx6q-uart";
> + reg = <0x02288000 0x4000>;
> + interrupts = ;
> + clocks = <&clks IMX6UL_CLK_IPG>,
> +  <&clks IMX6UL_CLK_UART8_SERIAL>;
> + clock-names = "ipg", "per";
> + status = "disabled";
> + };
> + };
> + };
> +};
> -- 
> 2.1.4
>

Re: [PATCH v2 01/15] Documentation: add newcx initramfs format description

2018-01-26 Thread Arnd Bergmann

On Fri, Jan 26, 2018 at 3:39 AM, Rob Landley  wrote:

> The problem with 1 second timestamps was you honestly could confuse
> "make" about which file was newer once an exec() could complete in the
> same second having done real work. That was the motivating issue causing
> the change, going to nanoseconds was just the big hammer of "this is
> large enough it won't matter again in our lifetimes". But nanosecond
> time stamps are recording more jitter than useful information, and that
> seems unlikely to change this century?

Sure, the only thing we really need the nanosecond timestamp for is
to keep them identical. E.g. if you use cpio to make an exact copy
of a file system, using microsecond timestamps will round all mtime
values. If you then use 'rsync' to compare/update the two copies
without passing a --modify-window= or --size-only, it will have
to read all files in rather than skipping those with identical size and
mtime.
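
As a rough illustration of the rounding problem described above (a sketch
only; truncate_to_usec() and same_mtime() are hypothetical helpers, not code
from cpio or rsync), truncating a nanosecond mtime to microsecond resolution
makes an exact comparison fail even though the file itself is unchanged:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Drop the sub-microsecond part of a timestamp (hypothetical helper). */
static struct timespec truncate_to_usec(struct timespec ts)
{
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
	ts.tv_nsec -= ts.tv_nsec % 1000;
	return ts;
}

/* Exact comparison, as a tool without a modify window would do it. */
static bool same_mtime(struct timespec a, struct timespec b)
{
	return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}
```

For instance, a timestamp with tv_nsec = 811135084 becomes 811135000 after
truncation, so the original and the copy no longer compare equal.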

Side note: the default behavior for file systems is actually to only use
the coarse timestamps of the last timer tick, so you actually do get
identical timestamps in practice, plus six digits of nonsense:

(on tmpfs)
 $ for i in {000..999} ; do > $i ; done; stat --format="%y" *  | uniq -c
 86 2018-01-26 10:01:48.811135084 +0100
469 2018-01-26 10:01:48.815135143 +0100
445 2018-01-26 10:01:48.819135201 +0100

 Arnd

Re: [PATCH 2/3] ARM: dts: imx6ull: add support for the esai interface

2018-01-26 Thread Dong Aisheng

On Fri, Jan 26, 2018 at 09:23:51AM +0100, Lothar Waßmann wrote:
> The address space taken by the UART8 on the i.MX6UL is used for the
> ESAI interface on i.MX6ULL.
> 
> Since the ESAI unit on i.MX6ULL has two more bits in the TFCR register
> (TFIN, TAENB) it deserves to get its own compatible string, though the
> bits are currently not used by the driver.
> 
> Signed-off-by: Lothar Waßmann 
> ---
>  Documentation/devicetree/bindings/sound/fsl,esai.txt |  4 ++--
>  arch/arm/boot/dts/imx6ull.dtsi   | 17 +
>  sound/soc/fsl/fsl_esai.c |  1 +

Should they be separate patches?

Otherwise this patch looks OK to me.

Acked-by: Dong Aisheng 

Regards
Dong Aisheng

>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/sound/fsl,esai.txt 
> b/Documentation/devicetree/bindings/sound/fsl,esai.txt
> index cacd18b..4103f46 100644
> --- a/Documentation/devicetree/bindings/sound/fsl,esai.txt
> +++ b/Documentation/devicetree/bindings/sound/fsl,esai.txt
> @@ -7,8 +7,8 @@ other DSPs. It has up to six transmitters and four receivers.
>  
>  Required properties:
>  
> -  - compatible   : Compatible list, must contain 
> "fsl,imx35-esai" or
> -   "fsl,vf610-esai"
> +  - compatible   : Compatible list, must contain 
> "fsl,imx35-esai",
> +   "fsl,vf610-esai" or "fsl,imx6ull-esai"
>  
>- reg  : Offset and length of the register set for the 
> device.
>  
> diff --git a/arch/arm/boot/dts/imx6ull.dtsi b/arch/arm/boot/dts/imx6ull.dtsi
> index abc815f..8724fdb2 100644
> --- a/arch/arm/boot/dts/imx6ull.dtsi
> +++ b/arch/arm/boot/dts/imx6ull.dtsi
> @@ -47,6 +47,23 @@
>   aips-bus@200 {
>   spba-bus@200 {
>   /delete-node/ serial@2024000;
> +
> + esai: esai@2024000 {
> + compatible = "fsl,imx6ull-esai", 
> "fsl,imx35-esai";
> + reg = <0x02024000 0x4000>;
> + interrupts =  IRQ_TYPE_LEVEL_HIGH>;
> + clocks = <&clks IMX6ULL_CLK_ESAI_IPG>,
> +  <&clks IMX6ULL_CLK_ESAI_MEM>,
> +  <&clks IMX6ULL_CLK_ESAI_EXTAL>,
> +  <&clks IMX6ULL_CLK_ESAI_IPG>,
> +  <&clks IMX6UL_CLK_SPBA>;
> + clock-names = "core", "mem", "extal",
> +   "fsys", "spba";
> + dmas = <&sdma 0 21 0>,
> +<&sdma 47 21 0>;
> + dma-names = "rx", "tx";
> + status = "disabled";
> + };
>   };
>   };
>  
> diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
> index cef79a1..5b6a53f 100644
> --- a/sound/soc/fsl/fsl_esai.c
> +++ b/sound/soc/fsl/fsl_esai.c
> @@ -910,6 +910,7 @@ static int fsl_esai_probe(struct platform_device *pdev)
>  }
>  
>  static const struct of_device_id fsl_esai_dt_ids[] = {
> + { .compatible = "fsl,imx6ull-esai", },
>   { .compatible = "fsl,imx35-esai", },
>   { .compatible = "fsl,vf610-esai", },
>   {}
> -- 
> 2.1.4
>

Re: [PATCH 3/3] ARM: dts: imx6ull: address some more incompatibilities between i.MX6UL and i.MX6ULL

2018-01-26 Thread Dong Aisheng

On Fri, Jan 26, 2018 at 09:23:52AM +0100, Lothar Waßmann wrote:
> The i.MX6ULL doesn't have the CAAM engine nor any SIM interface.
> These are currently not implemented for i.MX6UL but it cannot hurt to
> delete the corresponding nodes from the i.MX6ULL DTB anyway.
> 
> Signed-off-by: Lothar Waßmann 

Acked-by: Dong Aisheng 

Regards
Dong Aisheng

> ---
>  arch/arm/boot/dts/imx6ull.dtsi | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx6ull.dtsi b/arch/arm/boot/dts/imx6ull.dtsi
> index 8724fdb2..da325cd 100644
> --- a/arch/arm/boot/dts/imx6ull.dtsi
> +++ b/arch/arm/boot/dts/imx6ull.dtsi
> @@ -67,6 +67,12 @@
>   };
>   };
>  
> + aips-bus@210 {
> + /delete-node/ caam@2140;
> + /delete-node/ sim@218c;
> + /delete-node/ sim@21b4;
> + };
> +
>   aips3: aips-bus@220 {
>   compatible = "fsl,aips-bus", "simple-bus";
>   #address-cells = <1>;
> -- 
> 2.1.4
>

Re: [PATCH RFC 0/3] API for 128-bit IO access

2018-01-26 Thread Yury Norov

On Wed, Jan 24, 2018 at 10:22:13AM +, Will Deacon wrote:
> On Wed, Jan 24, 2018 at 12:05:16PM +0300, Yury Norov wrote:
> > This series adds API for 128-bit memory IO access and enables it for ARM64.
> > The original motivation for 128-bit API came from new Cavium network device
> > driver. The hardware requires 128-bit access to make things work. See
> > description in patch 3 for details.
> > 
> > Also, starting from ARMv8.4, stp and ldp instructions become atomic, and
> > API for 128-bit access would be helpful in core arm64 code.
> 
> Only for normal, cacheable memory, so they're not suitable for IO accesses
> as you're proposing here.

Hi Will,

Thanks for the clarification.

Could you elaborate: do you find a 128-bit read/write API useless, or
are you just correcting my comment?

I think that an ordered, uniform 128-bit access API would be helpful, even
if not atomic.

Yury.

[PATCH v3 0/2] perf stat: Add interval-count and time support

2018-01-26 Thread ufo19890607

From: yuzhoujian 

Introduce two new options for perf stat and update perf-stat documentation
accordingly.

The interval-count option can be used to print counts a fixed number of
times, and it should be used together with the "-I" option.

Shown below is the output of the interval-count option for perf stat.

$ perf stat -I 1000 --interval-count 2 -e cycles -a
#   time counts unit events
 1.002827089 93,884,870  cycles
 2.004231506 56,573,446  cycles

The time option can be used to print counts after a period of time, and it
should not be used with the "-I" option.

Shown below is the output of the time option for perf stat.

$ perf stat --time 2000 -e cycles -a
Performance counter stats for 'system wide':

157,260,423  cycles

2.003060766 seconds time elapsed

yuzhoujian (2):
  perf stat: Add support to print counts for fixed times
  perf stat: Add support to print counts after a period of time

Changes since v2:
- modify the time check in __run_perf_stat func to keep some consistency
  with the workload case.
- add the warning when the time is set between 10ms to 100ms.
- add the pr_err when the time is set below 10ms.

Changes since v1:
- change the name of the new option "times-print" to "interval-count".
- keep the interval-count option interval specifically.

 tools/perf/Documentation/perf-stat.txt | 10 ++
 tools/perf/builtin-stat.c  | 59 --
 tools/perf/util/stat.h |  2 ++
 3 files changed, 68 insertions(+), 3 deletions(-)

-- 
2.14.1

[PATCH v3 2/2] perf stat: Add support to print counts after a period of time

2018-01-26 Thread ufo19890607

From: yuzhoujian 

Introduce a new option to print counts after N milliseconds
and update perf-stat documentation accordingly.

Shown below is the output of the new option for perf stat.

$ perf stat --time 2000 -e cycles -a
Performance counter stats for 'system wide':

157,260,423  cycles

2.003060766 seconds time elapsed

We can print the count deltas after N milliseconds with this newly
introduced option. This option is not supported with the "-I" option.
In addition, according to Kangliang's patch (19afd10410957), the
monitoring overhead for system-wide core events could be very high
if the interval-print parameter was below 100ms, and the lower limit
is 10ms. So the same warning will be displayed when the time
is set between 10ms and 100ms, and the minimal time is limited to
10ms. Users can make a decision according to their specific cases.
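
The limits described above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the code from the patch:
check_time_ms() and ms_to_timespec() are hypothetical names, and treating
exactly 100ms as "no warning" is an assumption, since the text only says
"between 10ms and 100ms".

```c
#include <assert.h>
#include <time.h>

enum time_check { TIME_OK, TIME_WARN, TIME_ERR };

/* Classify a --time value given in milliseconds (hypothetical helper). */
static enum time_check check_time_ms(unsigned int ms)
{
	if (ms < 10)
		return TIME_ERR;	/* below the 10ms minimum: reject */
	if (ms < 100)
		return TIME_WARN;	/* allowed, but overhead may be high */
	return TIME_OK;
}

/* Convert milliseconds into the timespec handed to nanosleep(). */
static struct timespec ms_to_timespec(unsigned int ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (long)(ms % 1000) * 1000000L;
	assert(ts.tv_nsec < 1000000000L);
	return ts;
}
```

With --time 2000 this yields a two second sleep (tv_sec = 2, tv_nsec = 0),
which is in line with the "2.003060766 seconds time elapsed" shown above.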

Changes since v2:
- modify the time check in __run_perf_stat func to keep some consistency
  with the workload case.
- add the warning when the time is set between 10ms to 100ms.
- add the pr_err when the time is set below 10ms.

Changes since v1:
- none.

Signed-off-by: yuzhoujian 
---
 tools/perf/Documentation/perf-stat.txt |  5 +
 tools/perf/builtin-stat.c  | 33 +++--
 tools/perf/util/stat.h |  1 +
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 47a21645f60c..c822f374c99a 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -151,6 +151,11 @@ Print count deltas for fixed number of times.
 This option should be used together with "-I" option.
example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
 
+--time msecs::
+Print count deltas after N milliseconds (minimum: 10 ms).
+This option is not supported with "-I" option.
+   example: 'perf stat --time 2000 -e cycles -a'
+
 --metric-only::
 Only print computed metrics. Print them in a single line.
 Don't show any raw values. Not supported with --per-thread.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 406f546ad74c..73c011acf92a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -573,6 +573,7 @@ static int __run_perf_stat(int argc, const char **argv)
 {
int interval = stat_config.interval;
int times = stat_config.times;
+   int time = stat_config.time;
char msg[BUFSIZ];
unsigned long long t0, t1;
struct perf_evsel *counter;
@@ -586,6 +587,9 @@ static int __run_perf_stat(int argc, const char **argv)
if (interval) {
ts.tv_sec  = interval / USEC_PER_MSEC;
ts.tv_nsec = (interval % USEC_PER_MSEC) * NSEC_PER_MSEC;
+   } else if (time) {
+   ts.tv_sec  = time / USEC_PER_MSEC;
+   ts.tv_nsec = (time % USEC_PER_MSEC) * NSEC_PER_MSEC;
} else {
ts.tv_sec  = 1;
ts.tv_nsec = 0;
@@ -698,9 +702,11 @@ static int __run_perf_stat(int argc, const char **argv)
perf_evlist__start_workload(evsel_list);
enable_counters();
 
-   if (interval) {
+   if (interval || time) {
while (!waitpid(child_pid, &status, WNOHANG)) {
nanosleep(&ts, NULL);
+   if (time)
+   break;
process_interval();
if (interval_count == true) {
if (--times == 0)
@@ -722,6 +728,8 @@ static int __run_perf_stat(int argc, const char **argv)
enable_counters();
while (!done) {
nanosleep(&ts, NULL);
+   if (time)
+   break;
if (interval) {
process_interval();
if (interval_count == true) {
@@ -1904,6 +1912,8 @@ static const struct option stat_options[] = {
"print counts at regular interval in ms (>= 10)"),
OPT_INTEGER(0, "interval-count", &stat_config.times,
"print counts for fixed number of times"),
+   OPT_UINTEGER(0, "time", &stat_config.time,
+   "print counts after a period of time in ms (>= 10)"),
OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
 "aggregate counts per processor socket", AGGR_SOCKET),
OPT_SET_UINT(0, "per-core", &stat_config.aggr_mode,
@@ -2701,7 +2711,7 @@ int cmd_stat(int argc, const char **argv)
int status = -EINVAL, run_idx;
const char *mode;
FILE *output = stderr;
-   unsigned int interval;
+   unsigned int interval, time;
int times;
const char * const stat_subcomman

[PATCH v3 1/2] perf stat: Add support to print counts for fixed times

2018-01-26 Thread ufo19890607

From: yuzhoujian 

Introduce a new option to print counts for a fixed number of times
and update perf-stat documentation accordingly.

Shown below is the output of the new option for perf stat.

$ perf stat -I 1000 --interval-count 2 -e cycles -a
#   time counts unit events
 1.002827089 93,884,870  cycles
 2.004231506 56,573,446  cycles

We can print the counts a fixed number of times with this newly introduced
option. Its usage is a little like vmstat, and it should be used
together with the "-I" option.

$ vmstat -n 1 2
procs ---memory-- ---swap-- -io -system-- --cpu-
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
 0  0  0 78270544 547484 5173207600 02011  1  0 99  
0  0
 0  0  0 78270512 547484 5173208000 016  477 1555  0  0 100 
 0  0
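
The countdown behaviour can be sketched in isolation like this. It is a
simplified model, not the perf code: print_counts() is a hypothetical
stand-in for process_interval(), max_iterations stands in for "until the
workload exits", and times == 0 is assumed to mean "no limit".

```c
#include <assert.h>

/* Hypothetical stand-in for process_interval(): just counts the calls. */
static int prints;

static void print_counts(void)
{
	prints++;
}

/*
 * Print once per interval. If times > 0, stop after that many prints,
 * mirroring the "if (--times == 0) break;" pattern.
 */
static int run_intervals(int times, int max_iterations)
{
	assert(max_iterations >= 0);
	prints = 0;
	for (int i = 0; i < max_iterations; i++) {
		print_counts();
		if (times > 0 && --times == 0)
			break;
	}
	return prints;
}
```

With times = 2 the loop prints twice and stops, like "vmstat -n 1 2".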

Changes since v2:
- none.

Changes since v1:
- change the name of the new option "times-print" to "interval-count".
- keep the new option interval specifically.

Signed-off-by: yuzhoujian 
---
 tools/perf/Documentation/perf-stat.txt |  5 +
 tools/perf/builtin-stat.c  | 26 +-
 tools/perf/util/stat.h |  1 +
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 823fce7674bb..47a21645f60c 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -146,6 +146,11 @@ Print count deltas every N milliseconds (minimum: 10ms)
 The overhead percentage could be high in some cases, for instance with small, 
sub 100ms intervals.  Use with caution.
example: 'perf stat -I 1000 -e cycles -a sleep 5'
 
+--interval-count times::
+Print count deltas for fixed number of times.
+This option should be used together with "-I" option.
+   example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
+
 --metric-only::
 Only print computed metrics. Print them in a single line.
 Don't show any raw values. Not supported with --per-thread.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 98bf9d32f222..406f546ad74c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -168,6 +168,7 @@ static struct timespec  ref_time;
 static struct cpu_map  *aggr_map;
 static aggr_get_id_t   aggr_get_id;
 static boolappend_file;
+static boolinterval_count;
 static const char  *output_name;
 static int output_fd;
 static int print_free_counters_hint;
@@ -571,6 +572,7 @@ static struct perf_evsel 
*perf_evsel__reset_weak_group(struct perf_evsel *evsel)
 static int __run_perf_stat(int argc, const char **argv)
 {
int interval = stat_config.interval;
+   int times = stat_config.times;
char msg[BUFSIZ];
unsigned long long t0, t1;
struct perf_evsel *counter;
@@ -700,6 +702,10 @@ static int __run_perf_stat(int argc, const char **argv)
while (!waitpid(child_pid, &status, WNOHANG)) {
nanosleep(&ts, NULL);
process_interval();
+   if (interval_count == true) {
+   if (--times == 0)
+   break;
+   }
}
}
waitpid(child_pid, &status, 0);
@@ -716,8 +722,13 @@ static int __run_perf_stat(int argc, const char **argv)
enable_counters();
while (!done) {
nanosleep(&ts, NULL);
-   if (interval)
+   if (interval) {
process_interval();
+   if (interval_count == true) {
+   if (--times == 0)
+   break;
+   }
+   }
}
}
 
@@ -1891,6 +1902,8 @@ static const struct option stat_options[] = {
"command to run after to the measured command"),
OPT_UINTEGER('I', "interval-print", &stat_config.interval,
"print counts at regular interval in ms (>= 10)"),
+   OPT_INTEGER(0, "interval-count", &stat_config.times,
+   "print counts for fixed number of times"),
OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
 "aggregate counts per processor socket", AGGR_SOCKET),
OPT_SET_UINT(0, "per-core", &stat_config.aggr_mode,
@@ -2689,6 +2702,7 @@ int cmd_stat(int argc, const char **argv)
const char *mode;
FILE *output = stderr;
unsigned int interval;
+   int

[PATCH] bcma: Replace mdelay with usleep_range in bcma_pmu_resources_init

2018-01-26 Thread Jia-Ju Bai

After checking all possible call chains to bcma_pmu_resources_init() here,
my tool finds that this function is never called in atomic context,
namely never in an interrupt handler or while holding a spinlock.
Thus mdelay can be replaced with usleep_range to avoid busy-waiting.

This was found by a static analysis tool named DCNS, written by myself.
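
To illustrate the difference between the two kinds of delay, here is a
userspace analogy only (a sketch, not kernel code): busy_delay_us() spins on
the clock, loosely like mdelay(), while sleep_delay_us() gives up the CPU,
loosely like usleep_range(). All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy-wait: polls the clock and keeps the CPU occupied the whole time. */
static void busy_delay_us(long us)
{
	int64_t end = now_ns() + (int64_t)us * 1000;

	while (now_ns() < end)
		;
}

/* Sleeping delay: yields the CPU, so only valid where sleeping is allowed. */
static void sleep_delay_us(long us)
{
	assert(us >= 0);
	struct timespec ts = {
		.tv_sec = us / 1000000,
		.tv_nsec = (us % 1000000) * 1000,
	};

	/* Retry with the remaining time if a signal interrupts the sleep. */
	while (nanosleep(&ts, &ts) != 0 && errno == EINTR)
		;
}
```

Both functions wait at least the requested time; the sleeping variant lets
other work run in the meantime, which is the motivation for the change.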

Signed-off-by: Jia-Ju Bai 
---
 drivers/bcma/driver_chipcommon_pmu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bcma/driver_chipcommon_pmu.c 
b/drivers/bcma/driver_chipcommon_pmu.c
index f1eb4d3..478948c 100644
--- a/drivers/bcma/driver_chipcommon_pmu.c
+++ b/drivers/bcma/driver_chipcommon_pmu.c
@@ -203,7 +203,7 @@ static void bcma_pmu_resources_init(struct bcma_drv_cc *cc)
 * Add some delay; allow resources to come up and settle.
 * Delay is required for SoC (early init).
 */
-   mdelay(2);
+   usleep_range(1500, 2000);
 }
 
 /* Disable to allow reading SPROM. Don't know the adventages of enabling it. */
-- 
1.7.9.5

Re: [PATCH v6 6/6] livepatch: Add atomic replace

2018-01-26 Thread Petr Mladek

On Thu 2018-01-25 23:27:57, Jason Baron wrote:
> On 01/25/2018 11:02 AM, Petr Mladek wrote:
> > From: Jason Baron 
> > A better solution would be to create cumulative patch and say that
> > it replaces all older ones.
> > 
> > Signed-off-by: Jason Baron 
> > [pmla...@suse.com: Split, reuse existing code, simplified]
> 
> Hi Petr,
> 
> Thanks for cleaning this up - it looks good.

Uff, I feel relief :-)

> > diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> > index 6ad3195d995a..c606b291dfcd 100644
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > +/*
> > + * This function removes replaced patches from both func_stack
> > + * and klp_patches stack.
> > + *
> > + * We could be pretty aggressive here. It is called in situation
> > + * when these structures are not longer accessible. All functions
> > + * are redirected using the klp_transition_patch. They use either
> > + * a new code or they in the original code because of the special
> > + * nop function patches.
> > + */
> > +void klp_throw_away_replaced_patches(struct klp_patch *new_patch,
> > +bool keep_module)
> > +{
> > +   struct klp_patch *old_patch, *tmp_patch;
> > +
> > +   list_for_each_entry_safe(old_patch, tmp_patch, &klp_patches, list) {
> > +   if (old_patch == new_patch)
> > +   return;
> > +
> > +   klp_unpatch_objects(old_patch, KLP_FUNC_ANY);
> > +   old_patch->enabled = false;
> > +
> > +   /*
> > +* Replaced patches could not get re-enabled to keep
> > +* the code sane.
> > +*/
> > +   list_del_init(&old_patch->list);
> 
> I'm wondering if this should be:
> 
> list_move(&old_patch->list, &klp_replaced_patches);

Yes, great catch!

The list_del() comes from one iteration where I got rid of the extra
list. I thought that it might be enough to check
patch->kobj.state_initialized. But then I realized that this
kobject state was modified outside klp_mutex.
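
The difference between the two list operations can be sketched with a
minimal userspace list. This is an illustration only: the helpers below are
simplified, hypothetical re-implementations, not the kernel's <linux/list.h>.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_unlink(struct list_head *e)
{
	assert(e->next && e->prev);
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

static void list_add_head(struct list_head *e, struct list_head *head)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

/* Like list_del_init(): the entry ends up on no list at all. */
static void list_del_init_sketch(struct list_head *e)
{
	list_unlink(e);
	list_init(e);
}

/* Like list_move(): the entry leaves its list and joins another one. */
static void list_move_sketch(struct list_head *e, struct list_head *head)
{
	list_unlink(e);
	list_add_head(e, head);
}
```

After the "del_init" variant the entry is tracked nowhere, while after the
"move" variant it can still be found by walking the second list.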

To be honest, I did only minimal testing with my changes.
Mirek promised to port a battery of his kGraft-based tests and
run it.

Thanks a lot for the review.

Best Regards,
Petr

Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

2018-01-26 Thread David Woodhouse

On Thu, 2018-01-25 at 18:23 -0800, Dave Hansen wrote:
> On 01/25/2018 06:11 PM, Liran Alon wrote:
> > 
> > It is true that attacker cannot speculate to a kernel-address, but it
> > doesn't mean it cannot use the leaked kernel-address together with
> > another unrelated vulnerability to build a reliable exploit.
>
> The address doesn't leak if you can't execute there.  It's the same
> reason that we don't worry about speculation to user addresses from the
> kernel when SMEP is in play.

If both tags and target in the BTB are only 31 bits, then surely a
user-learned prediction of a branch from

  0x01234567 → 0x07654321

would be equivalent to a kernel-mode branch from

 0x81234567 → 0x87654321

... and interpreted in kernel mode as the latter? So I'm not sure why
SMEP saves us there?
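
The aliasing in question can be checked with a few lines of C. This is a
sketch only: the 31-bit width is the hypothesis from this mail rather than a
documented value, the addresses are the shortened 32-bit examples used
above, and btb_truncate()/btb_aliases() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTB_BITS 31u	/* hypothesized width, not a documented value */

/* Keep only the low BTB_BITS bits of an address. */
static uint32_t btb_truncate(uint32_t addr)
{
	assert(BTB_BITS < 32);
	return addr & ((UINT32_C(1) << BTB_BITS) - 1);
}

/* Two addresses alias if they are indistinguishable after truncation. */
static bool btb_aliases(uint32_t a, uint32_t b)
{
	return btb_truncate(a) == btb_truncate(b);
}
```

Under that assumption, 0x01234567 and 0x81234567 alias, as do 0x07654321
and 0x87654321, because each pair differs only in the top bit.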

Likewise if the RSB only stores the low 31 bits of the target, SMEP
isn't much help there either.

Do we need to look again at the fact that we've disabled the RSB-
stuffing for SMEP?


[PATCH 1/3] Partial revert "e1000e: Avoid receiver overrun interrupt bursts"

2018-01-26 Thread Benjamin Poirier

This reverts commit 4aea7a5c5e940c1723add439f4088844cd26196d.

We keep the fix for the first part of the problem (1) described in the log
of that commit; however, we use the implementation of e1000_msix_other() from
before commit 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").
We remove the fix for the second part of the problem (2).

Bursts of "Other" interrupts may once again occur during rxo (receive
overflow) traffic conditions. This is deemed acceptable in the interest of
reverting driver code back to its previous state. The e1000e driver should
be in "maintenance" mode and we want to avoid unforeseen fallout from
changes that are not strictly necessary.

Link: https://www.spinics.net/lists/netdev/msg480675.html
Signed-off-by: Benjamin Poirier 
---
 drivers/net/ethernet/intel/e1000e/defines.h |  1 -
 drivers/net/ethernet/intel/e1000e/netdev.c  | 32 +++--
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/defines.h 
b/drivers/net/ethernet/intel/e1000e/defines.h
index afb7ebe20b24..0641c0098738 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -398,7 +398,6 @@
 #define E1000_ICR_LSC   0x0004 /* Link Status Change */
 #define E1000_ICR_RXSEQ 0x0008 /* Rx sequence error */
 #define E1000_ICR_RXDMT00x0010 /* Rx desc min. threshold (0) */
-#define E1000_ICR_RXO   0x0040 /* Receiver Overrun */
 #define E1000_ICR_RXT0  0x0080 /* Rx timer intr (ring 0) */
 #define E1000_ICR_ECCER 0x0040 /* Uncorrectable ECC Error */
 /* If this bit asserted, the driver should claim the interrupt */
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9f18d39bdc8f..398e940436f8 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1914,30 +1914,23 @@ static irqreturn_t e1000_msix_other(int __always_unused 
irq, void *data)
struct net_device *netdev = data;
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
-   u32 icr;
-   bool enable = true;
-
-   icr = er32(ICR);
-   if (icr & E1000_ICR_RXO) {
-   ew32(ICR, E1000_ICR_RXO);
-   enable = false;
-   /* napi poll will re-enable Other, make sure it runs */
-   if (napi_schedule_prep(&adapter->napi)) {
-   adapter->total_rx_bytes = 0;
-   adapter->total_rx_packets = 0;
-   __napi_schedule(&adapter->napi);
-   }
-   }
-   if (icr & E1000_ICR_LSC) {
-   ew32(ICR, E1000_ICR_LSC);
+   u32 icr = er32(ICR);
+
+   if (icr & adapter->eiac_mask)
+   ew32(ICS, (icr & adapter->eiac_mask));
+
+   if (icr & E1000_ICR_OTHER) {
+   if (!(icr & E1000_ICR_LSC))
+   goto no_link_interrupt;
hw->mac.get_link_status = true;
/* guard against interrupt when we're going down */
if (!test_bit(__E1000_DOWN, &adapter->state))
mod_timer(&adapter->watchdog_timer, jiffies + 1);
}
 
-   if (enable && !test_bit(__E1000_DOWN, &adapter->state))
-   ew32(IMS, E1000_IMS_OTHER);
+no_link_interrupt:
+   if (!test_bit(__E1000_DOWN, &adapter->state))
+   ew32(IMS, E1000_IMS_LSC | E1000_IMS_OTHER);
 
return IRQ_HANDLED;
 }
@@ -2707,8 +2700,7 @@ static int e1000e_poll(struct napi_struct *napi, int 
weight)
napi_complete_done(napi, work_done);
if (!test_bit(__E1000_DOWN, &adapter->state)) {
if (adapter->msix_entries)
-   ew32(IMS, adapter->rx_ring->ims_val |
-E1000_IMS_OTHER);
+   ew32(IMS, adapter->rx_ring->ims_val);
else
e1000_irq_enable(adapter);
}
-- 
2.15.1

[PATCH 3/3] Revert "e1000e: Do not read ICR in Other interrupt"

2018-01-26 Thread Benjamin Poirier

This reverts commit 16ecba59bc333d6282ee057fb02339f77a880beb.

It was reported that emulated e1000e devices in vmware esxi 6.5 Build
7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
overrun interrupt bursts"). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _LSC) in the same situation.
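
For illustration, the two ICR readings can be decoded as below. This is a
sketch under assumptions: the bit values (INT_ASSERTED as bit 31, LSC as
bit 2) are taken to match the e1000e headers and should be checked against
defines.h, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values; verify against the driver's defines.h. */
#define ICR_LSC          UINT32_C(0x00000004)	/* Link Status Change */
#define ICR_INT_ASSERTED UINT32_C(0x80000000)	/* interrupt asserted */

/* True if every bit in mask is set in icr. */
static bool icr_test(uint32_t icr, uint32_t mask)
{
	assert(mask != 0);
	return (icr & mask) == mask;
}

/* True for a reading that reports an asserted link status change. */
static bool icr_has_asserted_lsc(uint32_t icr)
{
	return icr_test(icr, ICR_INT_ASSERTED | ICR_LSC);
}
```

A reading with both bits set passes this check, while the 0x0 seen on the
emulated device carries neither bit.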

Some experimentation showed that this flaw in vmware e1000e emulation can
be worked around by not setting Other in EIAC. This is how it was before
commit 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").

Since the ICR read in the Other interrupt handler has already been
restored, this patch effectively reverts the remainder of commit
16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt").

Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index ed103b9a8d3a..fffc1f0e3895 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1916,6 +1916,13 @@ static irqreturn_t e1000_msix_other(int __always_unused 
irq, void *data)
struct e1000_hw *hw = &adapter->hw;
u32 icr = er32(ICR);
 
+   /* Certain events (such as RXO) which trigger Other do not set
+* INT_ASSERTED. In that case, read to clear of icr does not take
+* place.
+*/
+   if (!(icr & E1000_ICR_INT_ASSERTED))
+   ew32(ICR, E1000_ICR_OTHER);
+
if (icr & adapter->eiac_mask)
ew32(ICS, (icr & adapter->eiac_mask));
 
@@ -2033,7 +2040,6 @@ static void e1000_configure_msix(struct e1000_adapter 
*adapter)
   hw->hw_addr + E1000_EITR_82574(vector));
else
writel(1, hw->hw_addr + E1000_EITR_82574(vector));
-   adapter->eiac_mask |= E1000_IMS_OTHER;
 
/* Cause Tx interrupts on every write back */
ivar |= BIT(31);
@@ -2258,7 +2264,7 @@ static void e1000_irq_enable(struct e1000_adapter 
*adapter)
 
if (adapter->msix_entries) {
ew32(EIAC_82574, adapter->eiac_mask & E1000_EIAC_MASK_82574);
-   ew32(IMS, adapter->eiac_mask | E1000_IMS_LSC);
+   ew32(IMS, adapter->eiac_mask | E1000_IMS_OTHER | E1000_IMS_LSC);
} else if (hw->mac.type >= e1000_pch_lpt) {
ew32(IMS, IMS_ENABLE_MASK | E1000_IMS_ECCER);
} else {
-- 
2.15.1

[PATCH 2/3] Revert "e1000e: Separate signaling for link check/link up"

2018-01-26 Thread Benjamin Poirier

This reverts commit 19110cfbb34d4af0cdfe14cd243f3b09dc95b013.
This reverts commit 4110e02eb45ea447ec6f5459c9934de0a273fb91.

... because they cause an extra 2s delay for the link to come up when
autoneg is off.

After reverting, the race condition described in the log of commit
19110cfbb34d ("e1000e: Separate signaling for link check/link up") is
reintroduced. It may still be triggered by LSC events but this should not
result in link flap. It may no longer be triggered by RXO events because
commit 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
restored reading icr in the Other handler.

As discussed, the driver should be in "maintenance mode". In the interest
of stability, revert to the original code as much as possible instead of a
half-baked solution.

Link: https://www.spinics.net/lists/netdev/msg479923.html
Signed-off-by: Benjamin Poirier 
---
 drivers/net/ethernet/intel/e1000e/ich8lan.c | 11 +++
 drivers/net/ethernet/intel/e1000e/mac.c | 11 +++
 drivers/net/ethernet/intel/e1000e/netdev.c  |  2 +-
 3 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c 
b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index 31277d3bb7dc..d6d4ed7acf03 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1367,9 +1367,6 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, 
bool force)
  *  Checks to see of the link status of the hardware has changed.  If a
  *  change in link status has been detected, then we read the PHY registers
  *  to get the current speed/duplex if link exists.
- *
- *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
- *  up).
  **/
 static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
 {
@@ -1385,7 +1382,7 @@ static s32 e1000_check_for_copper_link_ich8lan(struct 
e1000_hw *hw)
 * Change or Rx Sequence Error interrupt.
 */
if (!mac->get_link_status)
-   return 1;
+   return 0;
 
/* First we want to see if the MII Status Register reports
 * link.  If so, then we want to get the current speed/duplex
@@ -1616,12 +1613,10 @@ static s32 e1000_check_for_copper_link_ich8lan(struct 
e1000_hw *hw)
 * different link partner.
 */
ret_val = e1000e_config_fc_after_link_up(hw);
-   if (ret_val) {
+   if (ret_val)
e_dbg("Error configuring flow control\n");
-   return ret_val;
-   }
 
-   return 1;
+   return ret_val;
 }
 
 static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)
diff --git a/drivers/net/ethernet/intel/e1000e/mac.c 
b/drivers/net/ethernet/intel/e1000e/mac.c
index f457c5703d0c..b322011ec282 100644
--- a/drivers/net/ethernet/intel/e1000e/mac.c
+++ b/drivers/net/ethernet/intel/e1000e/mac.c
@@ -410,9 +410,6 @@ void e1000e_clear_hw_cntrs_base(struct e1000_hw *hw)
  *  Checks to see of the link status of the hardware has changed.  If a
  *  change in link status has been detected, then we read the PHY registers
  *  to get the current speed/duplex if link exists.
- *
- *  Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link
- *  up).
  **/
 s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
 {
@@ -426,7 +423,7 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
 * Change or Rx Sequence Error interrupt.
 */
if (!mac->get_link_status)
-   return 1;
+   return 0;
 
/* First we want to see if the MII Status Register reports
 * link.  If so, then we want to get the current speed/duplex
@@ -464,12 +461,10 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw)
 * different link partner.
 */
ret_val = e1000e_config_fc_after_link_up(hw);
-   if (ret_val) {
+   if (ret_val)
e_dbg("Error configuring flow control\n");
-   return ret_val;
-   }
 
-   return 1;
+   return ret_val;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 398e940436f8..ed103b9a8d3a 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5091,7 +5091,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter)
case e1000_media_type_copper:
if (hw->mac.get_link_status) {
ret_val = hw->mac.ops.check_for_link(hw);
-   link_active = ret_val > 0;
+   link_active = !hw->mac.get_link_status;
} else {
link_active = true;
}
-- 
2.15.1

[PATCH 0/3] e1000e: Revert interrupt handling changes

2018-01-26 Thread Benjamin Poirier

As discussed in the thread "[RFC PATCH] e1000e: Remove Other from EIAC.",
https://www.spinics.net/lists/netdev/msg479311.html

The following list of commits was considered:
4d432f67ff00 e1000e: Remove unreachable code (v4.5-rc1)
16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1)
a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1)
0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
19110cfbb34d e1000e: Separate signaling for link check/link up (v4.15-rc1)
4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return value. 
(v4.15-rc8)

There have been a slew of regressions due to unforeseen consequences
(receive overflow triggers Other, vmware's emulated e1000e) and programming
mistakes (4110e02eb45e). Since the e1000e driver is supposed to be in
maintenance mode, this patch series revisits the above changes to prune
them down.

After this series, the remaining differences related to how interrupts were
handled at commit 4d432f67ff00 ("e1000e: Remove unreachable code",
v4.5-rc1) are:
* the changes in commit 0a8047ac68e5 ("e1000e: Fix msi-x interrupt
  automask", v4.5-rc1) are preserved.
* we manually clear Other from icr in e1000_msix_other().
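
The manual clear mentioned in the last item can be modelled outside the
driver roughly as follows. This is only a sketch: the bit values are assumed
to match E1000_ICR_INT_ASSERTED and E1000_ICR_OTHER in defines.h, and
icr_manual_clear_bits() is a hypothetical helper name, not a driver function.

```c
#include <stdint.h>

/* Assumed to mirror E1000_ICR_INT_ASSERTED and E1000_ICR_OTHER. */
#define ICR_INT_ASSERTED UINT32_C(0x80000000)
#define ICR_OTHER        UINT32_C(0x01000000)

/*
 * Hypothetical helper: given the value just read from ICR, return the
 * bits the Other handler would have to write back to clear the cause.
 * When INT_ASSERTED is set, the read is expected to have cleared ICR
 * already, so nothing needs to be written.
 */
static uint32_t icr_manual_clear_bits(uint32_t icr)
{
	if (!(icr & ICR_INT_ASSERTED))
		return ICR_OTHER;
	return 0;
}
```

In the driver, the non-zero case corresponds to the ew32(ICR, E1000_ICR_OTHER)
write in e1000_msix_other().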

We try to go back to a long lost time when things were simple and drivers
ran smoothly.


Benjamin Poirier (3):
  Partial revert "e1000e: Avoid receiver overrun interrupt bursts"
  Revert "e1000e: Separate signaling for link check/link up"
  Revert "e1000e: Do not read ICR in Other interrupt"

 drivers/net/ethernet/intel/e1000e/defines.h |  1 -
 drivers/net/ethernet/intel/e1000e/ich8lan.c | 11 ++--
 drivers/net/ethernet/intel/e1000e/mac.c | 11 ++--
 drivers/net/ethernet/intel/e1000e/netdev.c  | 44 ++---
 4 files changed, 27 insertions(+), 40 deletions(-)

-- 
2.15.1

[PATCH v8 2/2] nvme: add tracepoint for nvme_complete_rq

2018-01-26 Thread Johannes Thumshirn

Add a tracepoint in nvme_complete_rq() for completions of NVMe commands. An
example output of the tracepoint is as follows:

-0 [001] d.h. 3.505266: nvme_complete_rq: cmdid=989, qid=1, 
res=0, retries=0, flags=0x0, status=0

Signed-off-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Keith Busch 
Reviewed-by: Sagi Grimberg 

---
Changes to v6:
* Rebase onto nvme-4.16

Changes to v4:
* Print QID for completions (Christoph)

Changes to v2:
* Pass the whole struct request to the tracepoint
* Removed spaces after parenthesis (Christoph)
---
 drivers/nvme/host/core.c  |  2 ++
 drivers/nvme/host/trace.h | 26 ++
 2 files changed, 28 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 358dfdc1f6da..7cbd4a260d30 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -220,6 +220,8 @@ void nvme_complete_rq(struct request *req)
 {
blk_status_t status = nvme_error_status(req);
 
+   trace_nvme_complete_rq(req);
+
if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
if (nvme_req_needs_failover(req, status)) {
nvme_failover_req(req);
diff --git a/drivers/nvme/host/trace.h b/drivers/nvme/host/trace.h
index f17dbfbead5a..afd87235311b 100644
--- a/drivers/nvme/host/trace.h
+++ b/drivers/nvme/host/trace.h
@@ -130,6 +130,32 @@ TRACE_EVENT(nvme_setup_nvm_cmd,
  __parse_nvme_cmd(__entry->opcode, __entry->cdw10))
 );
 
+TRACE_EVENT(nvme_complete_rq,
+   TP_PROTO(struct request *req),
+   TP_ARGS(req),
+   TP_STRUCT__entry(
+   __field(int, qid)
+   __field(int, cid)
+   __field(__le64, result)
+   __field(u8, retries)
+   __field(u8, flags)
+   __field(u16, status)
+   ),
+   TP_fast_assign(
+   __entry->qid = req->q->id;
+   __entry->cid = req->tag;
+   __entry->result = nvme_req(req)->result.u64;
+   __entry->retries = nvme_req(req)->retries;
+   __entry->flags = nvme_req(req)->flags;
+   __entry->status = nvme_req(req)->status;
+   ),
+   TP_printk("cmdid=%u, qid=%d, res=%llu, retries=%u, flags=0x%x, 
status=%u",
+ __entry->cid, __entry->qid,
+ le64_to_cpu(__entry->result),
+ __entry->retries, __entry->flags, __entry->status)
+
+);
+
 #endif /* _TRACE_NVME_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.13.6

[PATCH v8 1/2] nvme: add tracepoint for nvme_setup_cmd

2018-01-26 Thread Johannes Thumshirn

Add tracepoints for nvme_setup_cmd() for tracing admin and/or nvm commands.

Examples of the two tracepoints are as follows for trace_nvme_setup_admin_cmd():
kworker/u8:0-5 [003]  2.998792: nvme_setup_admin_cmd: cmdid=14, 
flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=1, qsize=1023, 
cq_flags=0x3, irq_vector=0)

and trace_nvme_setup_nvm_cmd():
dd-205   [001]  3.503929: nvme_setup_nvm_cmd: qid=1, nsid=1, cmdid=989, 
flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=4096, len=2047, ctrl=0x0, 
dsmgmt=0, reftag=0)

Signed-off-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Keith Busch 
Reviewed-by: Sagi Grimberg 
---
Changes to v7:
* Fix endianness issues detected by kbuild robot/sparse
  make C=2 CF=-D__CHECK_ENDIAN__ is silent now

Changes to v5:
* Print QID for nvm commands (Christoph/Martin)

Changes to v4:
* Split trace function into two for admin and nvm cmds (Christoph)
* Remove structures for commands and decode as needed (Christoph)
* Add proper Changelog (Christoph)
* Don't decode NS ID for admin commands

Changes to v3:
* Only build trace.o when CONFIG_TRACE=y (Christoph)
* Only copy non-common command fields to trace decoder (Christoph)
* Merge write_zeros decoder into rw decoder
* Don't decode admin commands as I/O commands

Changes to v2:
* Don't cast le64_to_cpu() conversions to unsigned long long (Christoph)
* Add proper copyright header (Christoph)
* Move trace decoding into own file (Christoph)
* Include the src directory in the Makefile for trace (Christoph)
* Removed spaces before and after parenthesis (Christoph)
* Reduced print lines to fit the 80 char limit (Christoph)

Changes to v1:
* Fix typo (Hannes)
* move include/trace/events/nvme.h -> drivers/nvme/host/trace.h (Christoph)
---
 drivers/nvme/host/Makefile |   4 ++
 drivers/nvme/host/core.c   |   7 +++
 drivers/nvme/host/trace.c  | 127 
 drivers/nvme/host/trace.h  | 141 +
 4 files changed, 279 insertions(+)
 create mode 100644 drivers/nvme/host/trace.c
 create mode 100644 drivers/nvme/host/trace.h

diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index a25fd43650ad..441e67e3a9d7 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -1,4 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
+
+ccflags-y  += -I$(src)
+
 obj-$(CONFIG_NVME_CORE)+= nvme-core.o
 obj-$(CONFIG_BLK_DEV_NVME) += nvme.o
 obj-$(CONFIG_NVME_FABRICS) += nvme-fabrics.o
@@ -6,6 +9,7 @@ obj-$(CONFIG_NVME_RDMA) += nvme-rdma.o
 obj-$(CONFIG_NVME_FC)  += nvme-fc.o
 
 nvme-core-y:= core.o
+nvme-core-$(CONFIG_TRACING)+= trace.o
 nvme-core-$(CONFIG_NVME_MULTIPATH) += multipath.o
 nvme-core-$(CONFIG_NVM)+= lightnvm.o
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fde6fd2e7eef..358dfdc1f6da 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -29,6 +29,9 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 #include "nvme.h"
 #include "fabrics.h"
 
@@ -628,6 +631,10 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct 
request *req,
}
 
cmd->common.command_id = req->tag;
+   if (ns)
+   trace_nvme_setup_nvm_cmd(req->q->id, cmd);
+   else
+   trace_nvme_setup_admin_cmd(cmd);
return ret;
 }
 EXPORT_SYMBOL_GPL(nvme_setup_cmd);
diff --git a/drivers/nvme/host/trace.c b/drivers/nvme/host/trace.c
new file mode 100644
index ..67c83f56cc06
--- /dev/null
+++ b/drivers/nvme/host/trace.c
@@ -0,0 +1,127 @@
+/*
+ * NVM Express device driver tracepoints
+ * Copyright (c) 2018 Johannes Thumshirn, SUSE Linux GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include "trace.h"
+
+static const char *nvme_trace_create_sq(struct trace_seq *p, __le32 *cdw10)
+{
+   const char *ret = trace_seq_buffer_ptr(p);
+   u16 qsize = le32_to_cpu(cdw10[0]) >> 16;
+   u16 sqid = le32_to_cpu(cdw10[0]) & 0x;
+   u16 cqid = le32_to_cpu(cdw10[1]) >> 16;
+   u16 sq_flags = le32_to_cpu(cdw10[1]) & 0x;
+
+   trace_seq_printf(p, "sqid=%u, qsize=%u, sq_flags=0x%x, cqid=%u",
+sqid, qsize, sq_flags, cqid);
+   trace_seq_putc(p, 0);
+
+   return ret;
+}
+
+static const char *nvme_trace_create_cq(struct trace_seq *p, __le32 *cdw10)
+{
+   cons

[PATCH v8 0/2] add tracepoints for nvme command submission and completion

2018-01-26 Thread Johannes Thumshirn

Add tracepoints for nvme command submission and completion. The tracepoints
are modeled after SCSI's trace_scsi_dispatch_cmd_start() and
trace_scsi_dispatch_cmd_done() tracepoints and fulfil a similar purpose,
namely a fast way to check which command is going to be queued into the HW or
Fabric driver and which command is completed again.

Here's an example output using the qemu emulated pci nvme:
# tracer: nop
#
#  _-=> irqs-off
# / _=> need-resched
#| / _---=> hardirq/softirq
#|| / _--=> preempt-depth
#||| / delay
#   TASK-PID   CPU#  TIMESTAMP  FUNCTION
#  | |   |      | |
kworker/u8:0-5 [003]  9.158541: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=1, qsize=1023, 
cq_flags=0x3, irq_vector=0)
  -0 [003] d.h. 9.158705: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.158712: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_sq sqid=1, qsize=1023, 
sq_flags=0x1, cqid=1)
  -0 [003] d.h. 9.159214: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159236: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=2, qsize=1023, 
cq_flags=0x3, irq_vector=1)
  -0 [003] d.h. 9.159288: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159293: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_sq sqid=2, qsize=1023, 
sq_flags=0x1, cqid=2)
  -0 [003] d.h. 9.159479: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159525: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=3, qsize=1023, 
cq_flags=0x3, irq_vector=2)
  -0 [003] d.h. 9.159565: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159569: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_sq sqid=3, qsize=1023, 
sq_flags=0x1, cqid=3)
  -0 [003] d.h. 9.159726: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159769: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=4, qsize=1023, 
cq_flags=0x3, irq_vector=3)
  -0 [003] d.h. 9.159795: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159799: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_sq sqid=4, qsize=1023, 
sq_flags=0x1, cqid=4)
  -0 [003] d.h. 9.159957: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.160971: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=1)
  -0 [003] d.h. 9.161059: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.161101: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
  -0 [003] d.h. 9.161160: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.161305: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
  -0 [003] d.h. 9.161344: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.161390: nvme_setup_nvm_cmd: qid=1, 
nsid=1, cmdid=718, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=7, 
ctrl=0x0, dsmgmt=0, reftag=0)
  -0 [003] d.h. 9.161578: nvme_complete_rq: cmdid=718, 
qid=1, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.161608: nvme_setup_nvm_cmd: qid=1, 
nsid=1, cmdid=718, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=24, len=7, 
ctrl=0x0, dsmgmt=0, reftag=0)
  -0 [003] d.h. 9.161681: nvme_complete_rq: cmdid=718, 
qid=1, res=0, retries=0, flags=0x0, status=0
  dd-205   [001]  9.662909: nvme_setup_nvm_cmd: qid=1, 
nsid=1, cmdid=1011, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=2559, 
ctrl=0x0, dsmgmt=0, reftag=0)
  dd-205   [001]  9.662967: nvme_setup_nvm_cmd: qid=1, 
nsid=1, cmdid=1012, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=2560, 
len=1535, ctrl=0x0, dsmgmt=0, reftag=0)
  -0 [001] d.h. 9.664413: nvme_complete_rq: cmdid=1011, 
qid=1, res=0, retries=0, flags=0x0, status=0
  -0 [

Re: [RFC PATCH] vsprintf: add flag ZEROPAD handling before crng is ready

2018-01-26 Thread Andy Shevchenko

+Rasmus

On Fri, 2018-01-26 at 15:39 +0800, Yang Shunyong wrote:
> Before crng is ready, output of "%p" composes of "(ptrval)" and
> left padding spaces for alignment as no random address can be
> generated. This seems a little strange sometimes.
> For example, when irq domain names are built with "%p", the nodes
> under /sys/kernel/debug/irq/domains like this,
> 
> [root@y irq]# ls domains/
> default   irqchip@(ptrval)-2
> irqchip@(ptrval)-4  \_SB_.TCS0.QIC1  \_SB_.TCS0.QIC3
> irqchip@(ptrval)  irqchip@(ptrval)-3
> \_SB_.TCS0.QIC0 \_SB_.TCS0.QIC2
> 
> The name "irqchip@(ptrval)-2" is not so readable in console
> output.

Yes, this is not the best output.

> This patch adds ZEROPAD handling in widen_string() and move_right().
> When ZEROPAD is set in spec, it will use '0' for padding. If not
> set, it will use ' '.
> This patch also sets ZEROPAD in ptr_to_id() before crng is ready.

Did you run tests?

Have you added specific test cases to see what's going on for patterns
like

printf("%0s\n", "(my string)");

?

> Following is the output after applying the patch,
> 
> [root@y irq]# ls domains/
> default   irqchip@(ptrval)-2
> irqchip@(ptrval)-4  \_SB_.TCS0.QIC1  \_SB_.TCS0.QIC3
> irqchip@(ptrval)  irqchip@(ptrval)-3  \_SB_.TCS0.QIC0
> \_SB_.TCS0.QIC2
> 

So, to me this looks like curing the symptoms. After all, it's debugfs;
nothing prevents you from fixing the output of the specific nodes there.

> I am not sure whether crng can get enough random data at very early
> stage of system startup (eg. before irq system in the case above)
> and the effort to port current random driver to work at that time.
> So, this issue may exist some time.
> I use '0' for padding in this patch. This should be ok because output
> of "%pK" is all '0's when kptr_restrict = 2. I don't know whether
> other character, such as 'x', may be more preferable.
> 
> Signed-off-by: Yang Shunyong 
> ---
>  lib/vsprintf.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 01c3957b2de6..e0b6e1edae31 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -535,14 +535,18 @@ char *special_hex_number(char *buf, char *end,
> unsigned long long num, int size)
>   return number(buf, end, num, spec);
>  }
>  
> -static void move_right(char *buf, char *end, unsigned len, unsigned
> spaces)
> +static void move_right(char *buf, char *end, unsigned int len,
> +unsigned int spaces, struct printf_spec spec)
>  {
>   size_t size;
> + char pad;
> +
> + pad = (spec.flags & ZEROPAD) ? '0' : ' ';
>   if (buf >= end) /* nowhere to put anything */
>   return;
>   size = end - buf;
>   if (size <= spaces) {
> - memset(buf, ' ', size);
> + memset(buf, pad, size);
>   return;
>   }
>   if (len) {
> @@ -550,7 +554,7 @@ static void move_right(char *buf, char *end,
> unsigned len, unsigned spaces)
>   len = size - spaces;
>   memmove(buf + spaces, buf, len);
>   }
> - memset(buf, ' ', spaces);
> + memset(buf, pad, spaces);
>  }
>  
>  /*
> @@ -565,18 +569,21 @@ static void move_right(char *buf, char *end,
> unsigned len, unsigned spaces)
>  char *widen_string(char *buf, int n, char *end, struct printf_spec
> spec)
>  {
>   unsigned spaces;
> + char pad;
>  
>   if (likely(n >= spec.field_width))
>   return buf;
>   /* we want to pad the sucker */
>   spaces = spec.field_width - n;
>   if (!(spec.flags & LEFT)) {
> - move_right(buf - n, end, n, spaces);
> + move_right(buf - n, end, n, spaces, spec);
>   return buf + spaces;
>   }
> +
> + pad = (spec.flags & ZEROPAD) ? '0' : ' ';
>   while (spaces--) {
>   if (buf < end)
> - *buf = ' ';
> + *buf = pad;
>   ++buf;
>   }
>   return buf;
> @@ -1702,6 +1709,8 @@ static char *ptr_to_id(char *buf, char *end,
> void *ptr, struct printf_spec spec)
>  
>   if (unlikely(!have_filled_random_ptr_key)) {
>   spec.field_width = default_width;
> + spec.flags |= ZEROPAD;
> +
>   /* string length must be less than default_width */
>   return string(buf, end, "(ptrval)", spec);
>   }

-- 
Andy Shevchenko 
Intel Finland Oy
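
For reference, the padding behaviour proposed in the quoted patch can be
modelled in userspace along these lines. This is a minimal sketch, not the
lib/vsprintf.c code: pad_left() is a hypothetical helper, it refuses to
truncate instead of clipping at the end of the buffer, and the field width
of 16 used in the usage note assumes a 64-bit pointer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of right-aligning a string in a field:
 * fill the left side with '0' when zeropad is set, otherwise with ' '.
 * Returns the number of characters written (excluding the NUL), or 0
 * if the result does not fit in out.
 */
static size_t pad_left(char *out, size_t out_size, const char *s,
		       size_t width, int zeropad)
{
	size_t len = strlen(s);
	size_t fill = width > len ? width - len : 0;
	char pad = zeropad ? '0' : ' ';

	if (out_size == 0 || fill + len >= out_size)
		return 0;	/* this sketch refuses rather than truncates */

	memset(out, pad, fill);		/* padding first ... */
	memcpy(out + fill, s, len);	/* ... then the string itself */
	out[fill + len] = '\0';
	return fill + len;
}
```

With a width of 16, pad_left(out, sizeof(out), "(ptrval)", 16, 1) fills the
eight leading positions with '0', and passing 0 as the last argument fills
them with spaces.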

Re: [PATCH v8 0/2] add tracepoints for nvme command submission and completion

2018-01-26 Thread Christoph Hellwig

Still gives lots of warnings:

make -j4 C=2 SUBDIRS=drivers/nvme 2>err.log
drivers/nvme/host/./trace.h:78:1: warning: cast to restricted __le64
drivers/nvme/host/./trace.h:78:1: warning: cast to restricted __le64
drivers/nvme/host/./trace.h:78:1: warning: restricted __le64 degrades to integer
drivers/nvme/host/./trace.h:78:1: warning: restricted __le64 degrades to integer
drivers/nvme/host/./trace.h:78:1: warning: cast to restricted __le32
drivers/nvme/host/./trace.h:78:1: warning: cast to restricted __le32
drivers/nvme/host/./trace.h:78:1: warning: restricted __le32 degrades to integer
drivers/nvme/host/./trace.h:78:1: warning: restricted __le32 degrades to integer
drivers/nvme/host/./trace.h:104:1: warning: cast to restricted __le32
drivers/nvme/host/./trace.h:104:1: warning: cast to restricted __le32
drivers/nvme/host/./trace.h:104:1: warning: restricted __le32 degrades to 
integer
drivers/nvme/host/./trace.h:104:1: warning: restricted __le32 degrades to 
integer
drivers/nvme/host/./trace.h:104:1: warning: cast to restricted __le64
drivers/nvme/host/./trace.h:104:1: warning: cast to restricted __le64
drivers/nvme/host/./trace.h:104:1: warning: restricted __le64 degrades to 
integer
drivers/nvme/host/./trace.h:104:1: warning: restricted __le64 degrades to 
integer
drivers/nvme/host/./trace.h:104:1: warning: cast to restricted __le32
drivers/nvme/host/./trace.h:104:1: warning: cast to restricted __le32
drivers/nvme/host/./trace.h:104:1: warning: restricted __le32 degrades to 
integer
drivers/nvme/host/./trace.h:104:1: warning: restricted __le32 degrades to 
integer
drivers/nvme/host/./trace.h:133:1: warning: cast to restricted __le64
drivers/nvme/host/./trace.h:133:1: warning: cast to restricted __le64
drivers/nvme/host/./trace.h:133:1: warning: restricted __le64 degrades to 
integer
drivers/nvme/host/./trace.h:133:1: warning: restricted __le64 degrades to 
integer

Re: [PATCH v2 1/3] dt-bindings: pinctrl: Add a reserved-gpio-ranges property

2018-01-26 Thread Andy Shevchenko

On Thu, 2018-01-25 at 17:13 -0800, Stephen Boyd wrote:
> Some qcom platforms make some GPIOs or pins unavailable for use
> by non-secure operating systems, and thus reading or writing the
> registers for those pins will cause access control issues.
> Introduce a DT property to describe the set of GPIOs that are
> available for use so that higher level OSes are able to know what
> pins to avoid reading/writing.

>   gpio-controller;
>   #gpio-cells = <2>;
>   ngpios = <18>;
> + reserved-gpio-ranges = <0 4>, <12 2>;

What about preserving namespace, i.e.

gpio-reserved-ranges vs. your variant?

-- 
Andy Shevchenko 
Intel Finland Oy
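
Whichever name is chosen, a consumer of the property might turn it into a
bitmap roughly as sketched below. The sketch assumes each entry is a
<start count> pair, as the quoted example suggests, and that listed pins are
the ones to avoid; reserved_mask() is a hypothetical helper and is limited
to controllers with at most 32 lines.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a bitmask of reserved pins from an array of
 * <start count> pairs. Pins at or beyond ngpios (or beyond bit 31) are
 * ignored.
 */
static uint32_t reserved_mask(const uint32_t *ranges, size_t n_pairs,
			      uint32_t ngpios)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n_pairs; i++) {
		uint32_t start = ranges[2 * i];
		uint32_t count = ranges[2 * i + 1];

		if (start >= ngpios)
			continue;

		for (uint32_t k = 0; k < count; k++) {
			uint32_t pin = start + k;

			if (pin >= ngpios || pin >= 32)
				break;
			mask |= UINT32_C(1) << pin;
		}
	}
	return mask;
}
```

For the quoted example, <0 4>, <12 2> with ngpios = <18>, this marks pins
0-3 and 12-13.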

Re: [PATCH] kdb: use ktime_get_seconds() instead of ktime_get_ts()

2018-01-26 Thread Arnd Bergmann

On Fri, Jan 26, 2018 at 4:03 AM, Baolin Wang  wrote:
> The kdb code will print the monotonic time by ktime_get_ts(), but
> the ktime_get_ts() will be protected by a sequence lock, that will
> introduce one deadlock risk if the lock was already held in the
> context from which we entered the debugger.
>
> Since kdb is only interested in the second field, we can use the
> ktime_get_seconds() to get the monotonic time without a lock,
> moreover we can remove the 'struct timespec', which is not y2038
> safe.
>
> Signed-off-by: Baolin Wang 
> ---
>  kernel/debug/kdb/kdb_main.c |4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index 69e70f4..f0fc6f7 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -2486,10 +2486,8 @@ static int kdb_kill(int argc, const char **argv)
>   */
>  static void kdb_sysinfo(struct sysinfo *val)
>  {
> -   struct timespec uptime;
> -   ktime_get_ts(&uptime);
> memset(val, 0, sizeof(*val));
> -   val->uptime = uptime.tv_sec;
> +   val->uptime = ktime_get_seconds();
> val->loads[0] = avenrun[0];
> val->loads[1] = avenrun[1];
> val->loads[2] = avenrun[2];

Using ktime_get_seconds() avoids locking problems, but I wonder what
would happen if we trigger the 'WARN_ON(timekeeping_suspended)'
from kdb. Is that a problem? If it is, we have to use ktime_get_mono_fast_ns()
and div_u64() instead.

   Arnd
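
A rough userspace sketch of the fallback suggested above: take a monotonic
nanosecond reading and reduce it to whole seconds with a 64-by-32-bit
division. The div_u64() here is a local stand-in for the kernel helper of
the same name, and uptime_seconds_from_ns() is a hypothetical name; in
kdb_sysinfo() the equivalent would presumably be
val->uptime = div_u64(ktime_get_mono_fast_ns(), NSEC_PER_SEC), which is
untested here.

```c
#include <stdint.h>

#define NSEC_PER_SEC UINT32_C(1000000000)

/* Userspace stand-in for the kernel's div_u64(): u64 divided by u32. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Hypothetical helper: monotonic nanoseconds to whole seconds of uptime. */
static uint64_t uptime_seconds_from_ns(uint64_t mono_ns)
{
	return div_u64(mono_ns, NSEC_PER_SEC);
}
```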

[PATCH V2 1/2] ARM: dts: imx7s: add temperature monitor support

2018-01-26 Thread Anson Huang

Add i.MX7 temperature monitor support.

Signed-off-by: Anson Huang 
Acked-by: Dong Aisheng 
---
no changes since V1.
 .../devicetree/bindings/thermal/imx-thermal.txt  |  5 +++--
 arch/arm/boot/dts/imx7s.dtsi | 20 
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/thermal/imx-thermal.txt 
b/Documentation/devicetree/bindings/thermal/imx-thermal.txt
index 28be51a..9575d45 100644
--- a/Documentation/devicetree/bindings/thermal/imx-thermal.txt
+++ b/Documentation/devicetree/bindings/thermal/imx-thermal.txt
@@ -1,8 +1,9 @@
 * Temperature Monitor (TEMPMON) on Freescale i.MX SoCs
 
 Required properties:
-- compatible : "fsl,imx6q-tempmon" for i.MX6Q, "fsl,imx6sx-tempmon" for 
i.MX6SX.
-  i.MX6SX has two more IRQs than i.MX6Q, one is IRQ_LOW and the other is 
IRQ_PANIC,
+- compatible : "fsl,imx6q-tempmon" for i.MX6Q, "fsl,imx6sx-tempmon" for 
i.MX6SX,
+  "fsl,imx7-tempmon" for i.MX7S/D.
+  i.MX6SX and i.MX7S/D have two more IRQs than i.MX6Q, one is IRQ_LOW and the 
other is IRQ_PANIC,
   when temperature is below than low threshold, IRQ_LOW will be triggered, 
when temperature
   is higher than panic threshold, system will auto reboot by SRC module.
 - fsl,tempmon : phandle pointer to system controller that contains TEMPMON
diff --git a/arch/arm/boot/dts/imx7s.dtsi b/arch/arm/boot/dts/imx7s.dtsi
index 82ad26e..2e2eda53 100644
--- a/arch/arm/boot/dts/imx7s.dtsi
+++ b/arch/arm/boot/dts/imx7s.dtsi
@@ -497,9 +497,29 @@
};
 
ocotp: ocotp-ctrl@3035 {
+   #address-cells = <1>;
+   #size-cells = <1>;
compatible = "fsl,imx7d-ocotp", "syscon";
reg = <0x3035 0x1>;
clocks = <&clks IMX7D_OCOTP_CLK>;
+
+   tempmon_calib: calib@3c {
+   reg = <0x3c 0x4>;
+   };
+
+   tempmon_temp_grade: temp-grade@10 {
+   reg = <0x10 0x4>;
+   };
+   };
+
+   tempmon: tempmon {
+   compatible = "fsl,imx7-tempmon";
+   interrupts = ;
+   fsl,tempmon =<&anatop>;
+   nvmem-cells = <&tempmon_calib>,
+   <&tempmon_temp_grade>;
+   nvmem-cell-names = "calib", "temp_grade";
+   clocks = <&clks IMX7D_PLL_SYS_MAIN_CLK>;
};
 
anatop: anatop@3036 {
-- 
2.7.4

Re: [RESEND PATCH] rtc: Fix overflow when converting time64_t to rtc_time

2018-01-26 Thread Arnd Bergmann

On Fri, Jan 26, 2018 at 6:06 AM, Baolin Wang  wrote:
> If we convert a large time value to rtc_time, 'days * 86400' in the
> original formula can overflow the 'unsigned int' type, so the formula
> yields an incorrect remaining-seconds value. Use the div_s64_rem()
> function to avoid this.
>
> Signed-off-by: Baolin Wang 

Acked-by: Arnd Bergmann
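
The overflow described in the quoted changelog can be reproduced in
userspace. This is a sketch under stated assumptions: the product is
computed in 32 bits while the result is kept in 64 bits, only non-negative
times are considered, and both function names are hypothetical. The plain
/ and % operators stand in for div_s64_rem().

```c
#include <stdint.h>

/*
 * Model of the problematic formula: 'days * 86400' is evaluated in
 * 32 bits and wraps once days exceeds 49710.
 */
static int64_t remainder_wrapping(int64_t time)
{
	uint32_t days = (uint32_t)(time / 86400);
	uint32_t product = (uint32_t)(days * UINT32_C(86400));

	return time - (int64_t)product;
}

/*
 * Model of the fix: obtain quotient and remainder from 64-bit division,
 * so no intermediate product can wrap.
 */
static int64_t remainder_safe(int64_t time, int64_t *days)
{
	*days = time / 86400;
	return time % 86400;
}
```

For a time of 49711 days plus 5 seconds, the second function returns 5 while
the first returns a value larger than a day.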

Re: [PATCHv2] musb_host: fix lockup on rxcsr_h_error

2018-01-26 Thread Maxim Uvarov

Bin,

I looked at my local git tree and the code does not have this last line, "goto
finish". It was tested without it and everything worked. Right now I
cannot get access to that hardware to check with and without it. I can
only confirm that without "goto finish" the function works with a bunch
of drivers (USB Ethernet, HIDs, HDD).

Best regards,
Maxim.

2018-01-25 19:31 GMT+03:00 Bin Liu :
> Maxim,
>
> On Thu, Jan 25, 2018 at 07:24:02PM +0300, Maxim Uvarov wrote:
>> [1] says that issue is with back ported driver to 3.12.10. Can the
>> latest kernel be tested on the same hw?
>
> Agreed that it should be tested with the latest kernel. But my concern
> now is if stopping scheduling urbs on errors is a right thing to do,
> that is why I asked if you can re-test your usecase with reverting the
> commit. I am unable to reproduce the original issue you had.
>
> Thanks,
> -Bin.

-- 
Best regards,
Maxim Uvarov

[PATCH v2] arm: dts: ls1021a: Add configure-gfladj property to USB3 node

2018-01-26 Thread yinbo.zhu

From: yinbo.zhu 

Add "configure-gfladj" boolean property to USB3 node. This property
is used to determine whether frame length adjustment is required
or not.

Signed-off-by: Ramneek Mehresh 
Signed-off-by: Honghua Yin 
Signed-off-by: yinbo zhu 
---
Change in v2:
Remove the automatic generated tag from Gerrit.
Change "yinbo.zhu" to "yinbo zhu".
 arch/arm/boot/dts/ls1021a.dtsi |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index 9319e1f..ae8fc40 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -681,6 +681,7 @@
reg = <0x0 0x310 0x0 0x1>;
interrupts = ;
dr_mode = "host";
+   configure-gfladj;
snps,quirk-frame-length-adjustment = <0x20>;
snps,dis_rxdet_inp3_quirk;
};
-- 
1.7.1

[PATCH V2 2/2] thermal: imx: add i.MX7 thermal sensor support

2018-01-26 Thread Anson Huang

This patch adds i.MX7 thermal sensor support. Most of the i.MX7
thermal sensor functions are the same as on i.MX6 except for the
register offsets/layout, so move those register offset/layout
definitions into the SoC data structure.

i.MX7 uses single calibration data @25C, the calibration
data is located at OCOTP offset 0x4F0, bit[17:9], the
formula is as below:

Tmeas = (Nmeas - n1) + 25; n1 is the fuse value for 25C.
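
As a worked illustration of the formula above, a minimal sketch: the
calibration count n1 is taken from bits [17:9] of the fuse word, and the
measured count is converted to degrees Celsius. Both helper names are
hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: extract n1 (the count at 25C) from bits [17:9]. */
static uint32_t imx7_calib_n1(uint32_t fuse_word)
{
	return (fuse_word >> 9) & 0x1ff;
}

/* Hypothetical helper: Tmeas = (Nmeas - n1) + 25, in degrees Celsius. */
static int imx7_temp_celsius(uint32_t nmeas, uint32_t n1)
{
	return (int)nmeas - (int)n1 + 25;
}
```

With n1 = 300, a measured count of 310 corresponds to 35C and a count of 290
to 15C.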

Signed-off-by: Anson Huang 
Signed-off-by: Bai Ping 
---
changes since V1:
1. remove MX7 operation in imx_set_panic_temp since MX7 does NOT 
support it;
2. add const for struct imx_thermal_data in imx_init_from_tempmon_data;
3. add .low_alarm_ctrl for MX6SX and use it in .probe function.
 drivers/thermal/imx_thermal.c | 314 +-
 1 file changed, 247 insertions(+), 67 deletions(-)

diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c
index e7d4ffc..c95fa82 100644
--- a/drivers/thermal/imx_thermal.c
+++ b/drivers/thermal/imx_thermal.c
@@ -31,34 +31,58 @@
 #define REG_CLR0x8
 #define REG_TOG0xc
 
-#define MISC0  0x0150
-#define MISC0_REFTOP_SELBIASOFF(1 << 3)
-#define MISC1  0x0160
-#define MISC1_IRQ_TEMPHIGH (1 << 29)
+/* i.MX6 specific */
+#define IMX6_MISC0 0x0150
+#define IMX6_MISC0_REFTOP_SELBIASOFF   (1 << 3)
+#define IMX6_MISC1 0x0160
+#define IMX6_MISC1_IRQ_TEMPHIGH(1 << 29)
 /* Below LOW and PANIC bits are only for TEMPMON_IMX6SX */
-#define MISC1_IRQ_TEMPLOW  (1 << 28)
-#define MISC1_IRQ_TEMPPANIC(1 << 27)
-
-#define TEMPSENSE0 0x0180
-#define TEMPSENSE0_ALARM_VALUE_SHIFT   20
-#define TEMPSENSE0_ALARM_VALUE_MASK(0xfff << TEMPSENSE0_ALARM_VALUE_SHIFT)
-#define TEMPSENSE0_TEMP_CNT_SHIFT  8
-#define TEMPSENSE0_TEMP_CNT_MASK   (0xfff << TEMPSENSE0_TEMP_CNT_SHIFT)
-#define TEMPSENSE0_FINISHED(1 << 2)
-#define TEMPSENSE0_MEASURE_TEMP(1 << 1)
-#define TEMPSENSE0_POWER_DOWN  (1 << 0)
-
-#define TEMPSENSE1 0x0190
-#define TEMPSENSE1_MEASURE_FREQ0x
-/* Below TEMPSENSE2 is only for TEMPMON_IMX6SX */
-#define TEMPSENSE2 0x0290
-#define TEMPSENSE2_LOW_VALUE_SHIFT 0
-#define TEMPSENSE2_LOW_VALUE_MASK  0xfff
-#define TEMPSENSE2_PANIC_VALUE_SHIFT   16
-#define TEMPSENSE2_PANIC_VALUE_MASK0xfff
+#define IMX6_MISC1_IRQ_TEMPLOW (1 << 28)
+#define IMX6_MISC1_IRQ_TEMPPANIC   (1 << 27)
+
+#define IMX6_TEMPSENSE00x0180
+#define IMX6_TEMPSENSE0_ALARM_VALUE_SHIFT  20
+#define IMX6_TEMPSENSE0_ALARM_VALUE_MASK   (0xfff << 20)
+#define IMX6_TEMPSENSE0_TEMP_CNT_SHIFT 8
+#define IMX6_TEMPSENSE0_TEMP_CNT_MASK  (0xfff << 8)
+#define IMX6_TEMPSENSE0_FINISHED   (1 << 2)
+#define IMX6_TEMPSENSE0_MEASURE_TEMP   (1 << 1)
+#define IMX6_TEMPSENSE0_POWER_DOWN (1 << 0)
+
+#define IMX6_TEMPSENSE10x0190
+#define IMX6_TEMPSENSE1_MEASURE_FREQ   0x
+#define IMX6_TEMPSENSE1_MEASURE_FREQ_SHIFT 0
 
-#define OCOTP_MEM0 0x0480
-#define OCOTP_ANA1 0x04e0
+/* Below TEMPSENSE2 is only for TEMPMON_IMX6SX */
+#define IMX6_TEMPSENSE20x0290
+#define IMX6_TEMPSENSE2_LOW_VALUE_SHIFT0
+#define IMX6_TEMPSENSE2_LOW_VALUE_MASK 0xfff
+#define IMX6_TEMPSENSE2_PANIC_VALUE_SHIFT  16
+#define IMX6_TEMPSENSE2_PANIC_VALUE_MASK   0xfff
+
+/* i.MX7 specific */
+#define IMX7_ANADIG_DIGPROG0x800
+#define IMX7_TEMPSENSE00x300
+#define IMX7_TEMPSENSE0_PANIC_ALARM_SHIFT  18
+#define IMX7_TEMPSENSE0_PANIC_ALARM_MASK   (0x1ff << 18)
+#define IMX7_TEMPSENSE0_HIGH_ALARM_SHIFT   9
+#define IMX7_TEMPSENSE0_HIGH_ALARM_MASK(0x1ff << 9)
+#define IMX7_TEMPSENSE0_LOW_ALARM_SHIFT0
+#define IMX7_TEMPSENSE0_LOW_ALARM_MASK 0x1ff
+
+#define IMX7_TEMPSENSE10x310
+#define IMX7_TEMPSENSE1_MEASURE_FREQ_SHIFT 16
+#define IMX7_TEMPSENSE1_MEASURE_FREQ_MASK  (0x << 16)
+#define IMX7_TEMPSENSE1_FINISHED   (1 << 11)
+#define IMX7_TEMPSENSE1_MEASURE_TEMP   (1 << 10)
+#define IMX7_TEMPSENSE1_POWER_DOWN (1 << 9)
+#define IMX7_TEMPSENSE1_TEMP_VALUE_SHIFT   0
+#define IMX7_TEMPSENSE1_TEMP_VALUE_MASK0x1ff
+
+#define IMX6_OCOTP_MEM00x0480
+#define IMX6_OCOTP_ANA10x04e0
+#define IMX7_OCOTP_TESTER3 0x0440
+#define IMX7_OCOTP_ANA10x04f0
 
 /* The driver supports 1 passive trip point and 1 c

Re: [RFC PATCH 00/16] PTI support for x86-32

2018-01-26 Thread Krzysztof Mazur

On Thu, Jan 25, 2018 at 02:09:40PM -0800, Nadav Amit wrote:
> The PoC apparently does not work with 3GB of memory or more on 32-bit. Does
> you setup has more? Can you try the attack while setting max_addr=1G ?

No, I tested on:

Pentium M (Dothan): 1.5 GB RAM, PAE for NX, 2GB/2GB split

CONFIG_NOHIGHMEM=y
CONFIG_VMSPLIT_2G=y
CONFIG_PAGE_OFFSET=0x80000000
CONFIG_X86_PAE=y

and

Xeon (Pentium 4): 2 GB RAM, no PAE, 1.75GB/2.25GB split
CONFIG_NOHIGHMEM=y
CONFIG_VMSPLIT_2G_OPT=y
CONFIG_PAGE_OFFSET=0x78000000

Now I'm testing with standard settings on
Pentium M: 1.5 GB RAM, no PAE, 3GB/1GB split, ~890 MB RAM available

CONFIG_NOHIGHMEM=y
CONFIG_PAGE_OFFSET=0xc0000000
CONFIG_X86_PAE=n

and it still does not work.

The reliability tool from https://github.com/IAIK/meltdown reports 0.38%
(1/256 = 0.39%, i.e. "true" random), and the other libkdump tools do not work.

https://github.com/paboldin/meltdown-exploit (on linux_proc_banner
symbol) reports:
cached = 46, uncached = 515, threshold 153
read c0897020 = ff   (score=0/1000)
read c0897021 = ff   (score=0/1000)
read c0897022 = ff   (score=0/1000)
read c0897023 = ff   (score=0/1000)
read c0897024 = ff   (score=0/1000)
NOT VULNERABLE

and my exploit with:

for (i = 0; i < 256; i++) {
	unsigned char *px = p + (i << 12);

	t = rdtsc();
	readb(px);
	t = rdtsc() - t;
	if (t < 100)
		printf("%02x %lld\n", i, t);
}

the loop returns only "00 45". When I change the exploit code (now based
on paboldin's code, to be sure) to:

movzx (%[addr]), %%eax
movl $0xaa, %%eax
shl $12, %%eax
movzx (%[target], %%eax), %%eax

I always get "0xaa 51", so the CPU is speculatively executing the second
load with (0xaa << 12) in eax, and without the movl instruction, eax seems
to be always 0. I even tried to remove the shift:

movzx (%[addr]), %%eax
movzx (%[target], %%eax), %%eax

and read a known value (known from /dev/mem, for instance 0x20). I've
also modified the target array offset, and the CPU still touches the "wrong"
cacheline, as if eax == 0 instead of 0x20. I've also tested movl instead
of movzx (with an "and 0xff").
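
To make the decoding step of the probe loops above concrete, here is a
small user-space sketch in C. It only models the bookkeeping: given 256
access timings (one per candidate byte value), it picks the value whose
probe line was fastest and under the threshold. All names are hypothetical.
The threshold rule (integer geometric mean of the cached and uncached
timings) is an assumption on my side; it happens to reproduce the
"threshold 153" printed above for cached = 46 and uncached = 515, but it
is not a statement about how either tool is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Floor of the square root, by simple iteration (fine for small inputs). */
static unsigned long example_isqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Assumed rule: threshold = geometric mean of cached and uncached timings. */
static unsigned long example_threshold(unsigned long cached, unsigned long uncached)
{
	assert(cached > 0 && uncached >= cached);
	return example_isqrt(cached * uncached);
}

/*
 * Return the byte value (0..255) whose probe line had the smallest timing
 * below the threshold, or -1 if no line looked cached at all.
 */
static int example_pick_byte(const unsigned long timings[256], unsigned long threshold)
{
	int best = -1;
	size_t i;

	for (i = 0; i < 256; i++) {
		if (timings[i] >= threshold)
			continue;
		if (best < 0 || timings[i] < timings[best])
			best = (int)i;
	}
	return best;
}
```

In this model, the "0xaa 51" result corresponds to timings[0xaa] = 51 with
every other entry near the uncached figure, and "NOT VULNERABLE"
corresponds to no entry falling under the threshold.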

On Core 2 Quad in 64-bit mode everything works as expected, vulnerable
to Meltdown (I did not test it in 32-bit mode). I don't have any Core
"1" to test.

On that Pentium M the syscall slowdown caused by PTI is huge: 7.5 times slower
(7 times compared to a patched kernel with PTI disabled). On Skylake with
PCID the same trivial benchmark is "only" 3.5 times slower (and 5.2
times slower without PCID).

Krzysiek

Re: [PATCH net-next 0/3 V1] rtnetlink: enable IFLA_IF_NETNSID for RTM_{DEL,SET}LINK

2018-01-26 Thread Nicolas Dichtel

On 26/01/2018 at 09:36, Jiri Benc wrote:
> On Fri, 26 Jan 2018 00:34:51 +0100, Nicolas Dichtel wrote:
>> Why meaningful? The user knows that the answer is as if it was done in another
>> netns. It allows having only one netlink socket instead of one per netns. But
>> the code using it will be the same.
> 
> Because you can't use it to query the linked interface. You can't even
> use it as an opaque value to track interfaces (netnsid+ifindex) because
> netnsids are not unique across net name spaces. You can easily have two
> interfaces that have all the ifindex, ifname, netnsid (and basically
> everything else) identical but being completely different interfaces.
Yes, the user has to map that information correctly, and this complicates the
(user) code a lot.

> That's really not helpful.
> 
>> I fear that with your approach, it will results to a lot of complexity in the
>> kernel.  
> 
> The complexity is (at least partly) already there. It's an inevitable
> result of the design decision to have relative identifiers.
Yes, you're right. My approach moves the complexity to the user, which makes this
feature hard to use.

> 
> I agree that we should think about how to make this easy to implement.
> I like your idea of doing this somehow generically. Perhaps it's
> possible to do while keeping the netnsids valid in the caller's netns?
Yes. I agree that it will be a lot easier to use if the conversion is done in
the kernel. And having a generic mechanism will also help a lot.

> 
>> What is really missing for me, is a way to get a fd from an nsid. The user
>> should be able to call RTM_GETNSID with an fd and a nsid and the kernel performs
>> the needed operations so that the fd points to the corresponding netns.
> 
> That's what I was missing, too. I even looked into implementing it. But
> opening a fd on behalf of the process and returning it over netlink is a
> wrong thing to do. Netlink messages can get lost. Then you have a fd
> leak you can do nothing about.
Yes, I also looked at this ;-)

> 
> Given that we have netnsids used for so much stuff already (like
> NETLINK_LISTEN_ALL_NSID) you need to track them anyway. And if you need
> to track them, why bother with another identifier? It would be better
> if netnsid can be used universally for anything. Then there will be no
> need for the conversion.
I like this idea a lot. So the missing part is a setns() using the nsid ;-)


Regards,
Nicolas

Re: [PATCH v2 1/2] x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems

2018-01-26 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> What I'd really like to see is an entirely different API.  Maybe:
> 
> typedef struct {
>   opaque, but probably includes:
>   int depth;  /* 0 is root */
>   void *table;
> } ptbl_ptr;
> 
> ptbl_ptr root_table = mm_root_ptbl(mm);
> 
> set_ptbl_entry(root_table, pa, prot);
> 
> /* walk tables */
> ptbl_ptr pt = ...;
> ptentry_ptr entry;
> while (ptbl_has_children(pt)) {
>   pt = pt_next(pt, addr);
> }
> entry = pt_entry_at(pt, addr);
> /* do something with entry */
> 
> etc.
> 
> Now someone can add a sixth level without changing every code path in
> the kernel that touches page tables.

Iteration based page table lookups would be neat.
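
As a rough user-space illustration of what an iteration-based walk buys
(a toy model only, not a proposal for the real API), the sketch below
stores the remaining depth in each table, so the same loop works for 4, 5
or 6 levels. All names, the 9-bits-per-level layout and the 12-bit page
shift are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_BITS_PER_LEVEL	9
#define TOY_ENTRIES		(1u << TOY_BITS_PER_LEVEL)
#define TOY_PAGE_SHIFT		12

struct toy_table {
	int levels_below;		/* 0: slots hold leaf entries */
	void *slot[TOY_ENTRIES];
};

/* Index into table t for address addr, derived from t's depth. */
static unsigned int toy_index(const struct toy_table *t, uint64_t addr)
{
	unsigned int shift = TOY_PAGE_SHIFT + TOY_BITS_PER_LEVEL * t->levels_below;

	return (unsigned int)(addr >> shift) & (TOY_ENTRIES - 1);
}

/* Allocate a root for a tree with the given number of levels (1..6). */
static struct toy_table *toy_new_root(int levels)
{
	struct toy_table *root;

	assert(levels >= 1 && levels <= 6);
	root = calloc(1, sizeof(*root));
	if (root)
		root->levels_below = levels - 1;
	return root;
}

/*
 * Walk down to the leaf slot for addr. The loop never names a level;
 * it just descends until no levels remain. Returns NULL if a table is
 * missing and alloc is 0, or on allocation failure.
 */
static void **toy_walk(struct toy_table *root, uint64_t addr, int alloc)
{
	struct toy_table *t = root;

	while (t->levels_below > 0) {
		void **slot = &t->slot[toy_index(t, addr)];

		if (!*slot) {
			struct toy_table *child;

			if (!alloc)
				return NULL;
			child = calloc(1, sizeof(*child));
			if (!child)
				return NULL;
			child->levels_below = t->levels_below - 1;
			*slot = child;
		}
		t = *slot;
	}
	return &t->slot[toy_index(t, addr)];
}
```

Adding a level in this model only changes the argument passed to
toy_new_root(); the walker itself stays untouched.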

A sixth level is unavoidable on x86-64 I think - we'll get there in a decade or
so? The sixth level will also use up the last ~8 bits of virtual memory available
on 64-bit.

Thanks,

Ingo

Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU

2018-01-26 Thread Ming Lei

Hi Jianchao,

On Fri, Jan 19, 2018 at 11:05:35AM +0800, jianchao.wang wrote:
> Hi ming
> 
> Sorry for delayed report this.
> 
> On 01/17/2018 05:57 PM, Ming Lei wrote:
> > 2) hctx->next_cpu can become offline from online before 
> > __blk_mq_run_hw_queue
> > is run, there isn't warning, but once the IO is submitted to hardware,
> > after it is completed, how does the HBA/hw queue notify CPU since CPUs
> > assigned to this hw queue(irq vector) are offline? blk-mq's timeout
> > handler may cover that, but looks too tricky.
> 
> In theory, the irq affinity will be migrated to other cpu. This is done by

Yes, but the other CPU should belong to this irq's affinity, and if all
CPUs in the irq's affinity are DEAD, this irq vector will be shut down.
If there is in-flight IO (or there will be), the completions for these
IOs won't be delivered to any CPU. For now it seems we depend on the queue's
timeout handler to handle them.

> fixup_irqs() in the context of stop_machine.

> However, in my test, I found this log:
> 
> [  267.161043] do_IRQ: 7.33 No irq handler for vector
> 
> The 33 is the vector used by nvme cq.
> The irq seems to be missed and sometimes IO hang occurred.

As I mentioned above, this shouldn't be surprising to see in a CPU offline/online
stress test.

-- 
Ming

[PATCH] block: aoenet: Replace GFP_ATOMIC with GFP_KERNEL in aoenet_rcv

2018-01-26 Thread Jia-Ju Bai

After checking all possible call chains to aoenet_rcv(),
my tool finds that aoenet_rcv() is never called in atomic context,
namely never in an interrupt handler or while holding a spinlock.
Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.

This was found by a static analysis tool named DCNS, written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/block/aoe/aoenet.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
index 63773a9..d5fff7a 100644
--- a/drivers/block/aoe/aoenet.c
+++ b/drivers/block/aoe/aoenet.c
@@ -138,7 +138,7 @@ static int __init aoe_iflist_setup(char *str)
if (dev_net(ifp) != &init_net)
goto exit;
 
-   skb = skb_share_check(skb, GFP_ATOMIC);
+   skb = skb_share_check(skb, GFP_KERNEL);
if (skb == NULL)
return 0;
if (!is_aoe_netif(ifp))
-- 
1.7.9.5

Re: [PATCH 4.4 0/4] Backport missing sccurity and deadlock fix

2018-01-26 Thread Greg KH

On Thu, Jan 25, 2018 at 11:37:40AM -0700, Shuah Khan wrote:
> As I started backporting security fixes, I found a deadlock bug that was
> fixed in a later release. This patch series contains backports for all
> these problems.

All now queued up, thanks for the backports.

greg k-h

[PATCH] kvm: x86: remove efer_reload entry in kvm_vcpu_stat

2018-01-26 Thread Longpeng(Mike)

The efer_reload counter has not been used since
commit 26bb0981b3ff ("KVM: VMX: Use shared msr infrastructure"),
so remove it.

Signed-off-by: Longpeng(Mike) 
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/x86.c  | 1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5167984..b24b34d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -883,7 +883,6 @@ struct kvm_vcpu_stat {
u64 request_irq_exits;
u64 irq_exits;
u64 host_state_reload;
-   u64 efer_reload;
u64 fpu_reload;
u64 insn_emulation;
u64 insn_emulation_fail;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..6573526 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -177,7 +177,6 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "request_irq", VCPU_STAT(request_irq_exits) },
{ "irq_exits", VCPU_STAT(irq_exits) },
{ "host_state_reload", VCPU_STAT(host_state_reload) },
-   { "efer_reload", VCPU_STAT(efer_reload) },
{ "fpu_reload", VCPU_STAT(fpu_reload) },
{ "insn_emulation", VCPU_STAT(insn_emulation) },
{ "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
-- 
1.8.3.1

Re: [PATCH AUTOSEL for 4.14 006/100] KVM: nVMX/nSVM: Don't intercept #UD when running L2

2018-01-26 Thread Greg KH

On Thu, Jan 25, 2018 at 11:35:11AM -0500, Paolo Bonzini wrote:
> 
> > > Just wanted stable maintainers to note that Jim, Paolo & myself decided
> > > eventually to revert this commit along with commit ae1f57670703 on
> > > upstream KVM. However, it is true that this commit makes commit
> > > ae1f57670703 more complete. Therefore we have 2 options here:
> > > 1) Apply this backport and sometime in the future also apply the reverts of
> > > both these commits with Paolo's commit which reverts them.
> > 
> > Being "bug compatible" is good, I like that option :)
> 
> It's not even a bug, just different behavior and in the end it turns out to be
> less surprising if we revert.  So even better. :)
> 
> > When is the revert patch going to hit Linus's tree?  During the 4.16-rc1
> > merge window?
> 
> It's already there, commit ac9b305caa.  But since this one was not marked
> for stable, ac9b305caa wasn't either.

Ok, Sasha can you pick up both of these patches as well?  That way we
end up in sync with what is in Linus's tree.

thanks,

greg k-h

Re: [PATCH v2 2/3] gpiolib-of: Support 'reserved-gpio-ranges' property

2018-01-26 Thread Andy Shevchenko

On Thu, 2018-01-25 at 17:13 -0800, Stephen Boyd wrote:
> Some qcom platforms make some GPIOs or pins unavailable for use
> by non-secure operating systems, and thus reading or writing the
> registers for those pins will cause access control issues.  Add
> support for a DT property to describe the set of GPIOs that are
> available for use so that higher level OSes are able to know what
> pins to avoid reading/writing.
> 
> For now, we plumb this into the gpiochip irq APIs so that
> GPIO/pinctrl drivers can use the gpiochip_irqchip_irq_valid() to
> test validity of GPIOs.


> +static void of_gpiochip_init_irq_valid_mask(struct gpio_chip *chip)
> +{

> + int len, i;
> + u32 start, count;
> + struct device_node *np = chip->of_node;

Perhaps reverse Christmas tree ordering? (In the following function as well)

> + len = of_property_count_u32_elems(np, "reserved-gpio-ranges");
> 

> + for (i = 0; i < len; i += 2) {
> + of_property_read_u32_index(np, "reserved-gpio-ranges",
> +i, &start);
> + of_property_read_u32_index(np, "reserved-gpio-ranges",
> +i + 1, &count);

of_find_property() + of_prop_next_u32() ?

> + if (size > 0 && size % 2 == 0)
> + gpiochip->irq.need_valid_mask = true;

 ffs(size) >= 2 ?
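
For reference, a quick user-space check (using the POSIX ffs() from
<strings.h>, not the kernel helper) of how the two conditions compare.
The function names are hypothetical. The two checks agree for all
non-negative sizes; they differ only for negative even values, where
ffs() still reports a low bit position >= 2.

```c
#include <strings.h>	/* POSIX ffs(): 1-based index of the lowest set bit, 0 for 0 */

/* The condition as written in the patch. */
static int example_even_positive(int size)
{
	return size > 0 && size % 2 == 0;
}

/* The suggested shorter form. */
static int example_ffs_check(int size)
{
	return ffs(size) >= 2;
}
```

So the replacement is equivalent as long as size cannot be negative (for
example, if the error case has already been filtered out).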


-- 
Andy Shevchenko 
Intel Finland Oy

[PATCH] perf tools: fix spelling mistake: "successfull"-> "successful"

2018-01-26 Thread Colin King

From: Colin Ian King 

Trivial fix to a spelling mistake in pr_debug message text.

Signed-off-by: Colin Ian King 
---
 tools/perf/util/bpf-loader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 72c107fcbc5a..8062e1db0eb2 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -99,7 +99,7 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
if (err)
return ERR_PTR(-BPF_LOADER_ERRNO__COMPILE);
} else
-   pr_debug("bpf: successfull builtin compilation\n");
+   pr_debug("bpf: successful builtin compilation\n");
obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
 
if (!IS_ERR(obj) && llvm_param.dump_obj)
-- 
2.15.1

Re: [PATCH v4 6/7] x86/cpufeature: Blacklist SPEC_CTRL on early Spectre v2 microcodes

2018-01-26 Thread Ingo Molnar


* David Woodhouse  wrote:

> On Thu, 2018-01-25 at 12:34 +0100, Thomas Gleixner wrote:
> > 
> > This stuff is really a master piece of trainwreck engineering.
> > 
> > So yeah, whatever we do we end up with a proper mess. Lets go for a
> > blacklist and hope that we'll have something which holds at some
> > foreseeable day in the future.
> > 
> > The other concern I have is IBRS vs. IBPB. Are we sufficiently sure that
> > IBPB is working on those IBRS blacklisted ucode revisions? Or should we
> > just play safe and not touch any of this at all when we detect a
> > blacklisted one?
> 
> That isn't sufficiently clear to me. I've changed it back to blacklist
> *everything* for now, to be safe. If at any point Intel want to get
> their act together and give us coherent information to the contrary, we
> can change to separate IBPB/IBRS blacklists.

Yes.

I also agree that blacklists are the fundamentally correct approach here: a
bit-rotting blacklist is far better for users than a bit-rotting whitelist,
assuming that the number of CPU and microcode bugs goes down with time.

Thanks,

Ingo

Re: [RFC PATCH] vsprintf: add flag ZEROPAD handling before crng is ready

2018-01-26 Thread Rasmus Villemoes

On 26 January 2018 at 10:17, Andy Shevchenko
 wrote:
> +Rasmus

Thanks.

> On Fri, 2018-01-26 at 15:39 +0800, Yang Shunyong wrote:
>> Before crng is ready, output of "%p" composes of "(ptrval)" and
>> left padding spaces for alignment as no random address can be
>> generated. This seems a little strange sometimes.
>> For example, when irq domain names are built with "%p", the nodes
>> under /sys/kernel/debug/irq/domains like this,
>>
>> [root@y irq]# ls domains/
>> default   irqchip@(ptrval)-2
>> irqchip@(ptrval)-4  \_SB_.TCS0.QIC1  \_SB_.TCS0.QIC3
>> irqchip@(ptrval)  irqchip@(ptrval)-3
>> \_SB_.TCS0.QIC0 \_SB_.TCS0.QIC2
>>
>> The name "irqchip@(ptrval)-2" is not so readable in console
>> output.
>
> Yes, this is not best output.
>
>> This patch adds ZEROPAD handling in widen_string() and move_right().
>> When ZEROPAD is set in spec, it will use '0' for padding. If not
>> set, it will use ' '.
>> This patch also sets ZEROPAD in ptr_to_id() before crgn is ready.

Yew.

> Have you added specific test cases to see what's going on for patterns
> like
>
> printf("%0s\n", "(my string)");

[That's not really relevant, since we'll never have those (gcc says
"warning: '0' flag used with ‘%s’").]

>> @@ -1702,6 +1709,8 @@ static char *ptr_to_id(char *buf, char *end,
>> void *ptr, struct printf_spec spec)
>>
>>   if (unlikely(!have_filled_random_ptr_key)) {
>>   spec.field_width = default_width;
>> + spec.flags |= ZEROPAD;
>> +
>>   /* string length must be less than default_width */
>>   return string(buf, end, "(ptrval)", spec);
>>   }

So why not just use a string literal with the right width to begin
with, e.g. =(ptrval)= or whatever manual padding left or right
seems appropriate. Space-padding is not nice, but 0-padding isn't much
better. That way you only affect the uncommon case of %p before
have_filled_random_ptr_key instead of adding a few instructions to all
%s users.
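
A user-space sketch of the "literal with the right width" idea, purely
for illustration. The helper name is hypothetical and the fill character
is arbitrary. It assumes a 16-character field, as for a 64-bit pointer
printed in hex. In the kernel the result could simply be written as a
string literal, so no runtime padding would be needed at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Right-align text in a field of width characters, filling the left
 * side with fill. If text is already wider, it is copied unchanged.
 */
static void example_pad_left(char *buf, size_t bufsz, const char *text,
			     size_t width, char fill)
{
	size_t len = strlen(text);
	size_t pad = width > len ? width - len : 0;

	assert(bufsz > pad + len);	/* room for padding, text and NUL */
	memset(buf, fill, pad);
	memcpy(buf + pad, text, len + 1);
}
```

For a 16-character field and '=' as fill this yields "========(ptrval)",
which contains no spaces and would therefore read more cleanly in names
such as the irq domain entries quoted above.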

While at it, it may be worth looking into whether the irqdomain output
actually needs the @%p thing or if one could improve that instead.

Rasmus

Re: [PATCH v5 08/13] iommu/rockchip: Control clocks needed to access the IOMMU

2018-01-26 Thread JeffyChen


Hi Robin,

Thanks for your reply.

On 01/24/2018 09:49 PM, Robin Murphy wrote:


+Optional properties:
+- clocks : A list of master clocks requires for the IOMMU to be
accessible


s/requires/required/

ok



+   by the host CPU. The number of clocks depends on the master
+   block and might as well be zero. See [1] for generic clock


Oops, some subtleties of English here :)

To say "the number of clocks ... might as well be zero" effectively
implies "there's no point ever specifying any clocks". I guess what you
really mean here is "...might well be...", i.e. it is both valid and
reasonably likely to require zero clocks.

ok



+   bindings description.
+
+[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
  Optional properties:
  - rockchip,disable-mmu-reset : Don't use the mmu reset operation.
@@ -27,5 +34,6 @@ Example:
  reg = <0xff940300 0x100>;
  interrupts = ;
  interrupt-names = "vopl_mmu";
+clocks = <&cru ACLK_VOP1>, <&cru DCLK_VOP1>, <&cru HCLK_VOP1>;
  #iommu-cells = <0>;
  };
diff --git a/drivers/iommu/rockchip-iommu.c
b/drivers/iommu/rockchip-iommu.c
index c4131ca792e0..8a5e2a659b67 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -4,6 +4,7 @@
   * published by the Free Software Foundation.
   */
+#include 
  #include 
  #include 
  #include 
@@ -91,6 +92,8 @@ struct rk_iommu {
  struct device *dev;
  void __iomem **bases;
  int num_mmu;
+struct clk_bulk_data *clocks;
+int num_clocks;
  bool reset_disabled;
  struct iommu_device iommu;
  struct list_head node; /* entry in rk_iommu_domain.iommus */
@@ -450,6 +453,38 @@ static int rk_iommu_force_reset(struct rk_iommu
*iommu)
  return 0;
  }
+static int rk_iommu_of_get_clocks(struct rk_iommu *iommu)
+{
+struct device_node *np = iommu->dev->of_node;
+int ret;
+int i;
+
+ret = of_count_phandle_with_args(np, "clocks", "#clock-cells");
+if (ret == -ENOENT)
+return 0;
+else if (ret < 0)
+return ret;
+
+iommu->num_clocks = ret;
+iommu->clocks = devm_kcalloc(iommu->dev, iommu->num_clocks,
+ sizeof(*iommu->clocks), GFP_KERNEL);
+if (!iommu->clocks)
+return -ENOMEM;
+
+for (i = 0; i < iommu->num_clocks; ++i) {
+iommu->clocks[i].clk = of_clk_get(np, i);
+if (IS_ERR(iommu->clocks[i].clk)) {
+ret = PTR_ERR(iommu->clocks[i].clk);
+goto err_clk_put;
+}
+}


Just to confirm my understanding from a quick scan through the code, the
reason we can't use clk_bulk_get() here is that currently, clocks[i].id
being NULL means we'd end up just getting the first clock multiple
times, right?

right, without a valid name, it would return the first clock.

/* Walk up the tree of devices looking for a clock that matches */
while (np) {
int index = 0;

/*
 * For named clocks, first look up the name in the
 * "clock-names" property.  If it cannot be found, then
 * index will be an error code, and of_clk_get() will fail.
 */
if (name)
index = of_property_match_string(np, "clock-names", name);
clk = __of_clk_get(np, index, dev_id, name);




I guess there could be other users who also want "just get whatever
clocks I have" functionality, so it might be worth proposing that for
the core API as a separate/follow-up patch, but it definitely doesn't
need to be part of this series.

right, i can try to do it later :)


I really don't know enough about correct clk API usage, but modulo the
binding comments it certainly looks nice and tidy now;

Acked-by: Robin Murphy 

thanks.


Thanks,
Robin.

Re: [PATCH v4] Support intel-vbtn based tablet mode switch

2018-01-26 Thread Marco Martin

On Tuesday 23 January 2018 16:18:24 CET Marco Martin wrote:
> Some laptops such as Dell Inspiron 7000 series have the
> tablet mode switch implemented in Intel ACPI,
> the events to enter and exit the tablet mode are 0xCC and 0xCD
> 
> CC: platform-driver-...@vger.kernel.org
> CC: Matthew Garrett 
> CC: "Pali Rohár" 
> CC: Darren Hart 
> CC: Mario Limonciello 
> CC: Andy Shevchenko 
> 
> Signed-off-by: Marco Martin 
> ---
>  drivers/platform/x86/intel-vbtn.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/platform/x86/intel-vbtn.c
> b/drivers/platform/x86/intel-vbtn.c index 58c5ff3..64b4b34 100644
> --- a/drivers/platform/x86/intel-vbtn.c
> +++ b/drivers/platform/x86/intel-vbtn.c
> @@ -26,6 +26,9 @@
>  #include 
>  #include 
> 
> +/* When NOT in tablet mode, VBDS has the flag 0x40 */
> +#define TABLET_MODE_FLAG 0x40
> +
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("AceLan Kao");
> 
> @@ -42,6 +45,8 @@ static const struct key_entry intel_vbtn_keymap[] = {
>   { KE_IGNORE, 0xC5, { KEY_VOLUMEUP } },  /* volume-up key 
> release */
>   { KE_KEY, 0xC6, { KEY_VOLUMEDOWN } },   /* volume-down key 
> press */
>   { KE_IGNORE, 0xC7, { KEY_VOLUMEDOWN } },/* volume-down key 
> release */
> + { KE_SW,  0xCC, { .sw = { SW_TABLET_MODE, 1 } } }, /* Tablet mode in */
> + { KE_SW,  0xCD, { .sw = { SW_TABLET_MODE, 0 } } }, /* Tablet mode out */
>   { KE_END },
>  };
> 
> @@ -88,6 +93,7 @@ static void notify_handler(acpi_handle handle, u32 event,
> void *context)
> 
>  static int intel_vbtn_probe(struct platform_device *device)
>  {
> + struct acpi_buffer vgbs_output = { ACPI_ALLOCATE_BUFFER, NULL };
>   acpi_handle handle = ACPI_HANDLE(&device->dev);
>   struct intel_vbtn_priv *priv;
>   acpi_status status;
> @@ -110,6 +116,21 @@ static int intel_vbtn_probe(struct platform_device
> *device) return err;
>   }
> 
> + status = acpi_evaluate_object(handle, "VGBS", NULL, &vgbs_output);
> + /* VGBS being present and returning something means
> +  * we have a tablet mode switch
> +  */
> + if (ACPI_SUCCESS(status)) {
> + union acpi_object *obj = vgbs_output.pointer;
> +
> + if (obj && obj->type == ACPI_TYPE_INTEGER) {
> + input_set_capability(priv->input_dev, EV_SW, 
> SW_TABLET_MODE);
> + input_report_switch(priv->input_dev,
> + SW_TABLET_MODE,
> + 
> !(obj->integer.value & TABLET_MODE_FLAG));
> + }
> + }
> +
>   status = acpi_install_notify_handler(handle,
>ACPI_DEVICE_NOTIFY,
>notify_handler,

Is there still something to change in this version of the patch?
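
For anyone reviewing the probe-time logic in isolation, here is a
stand-alone sketch of the flag decoding used in the patch above. The
function and macro names are hypothetical; the 0x40 flag and its inverted
meaning (set when NOT in tablet mode) are taken from the patch.

```c
#include <stdint.h>

/* Set in the VGBS result when the device is NOT in tablet mode. */
#define EXAMPLE_TABLET_MODE_FLAG	0x40

/* Return 1 if the VGBS value indicates tablet mode, 0 otherwise. */
static int example_vgbs_to_tablet_mode(uint64_t vgbs)
{
	return !(vgbs & EXAMPLE_TABLET_MODE_FLAG);
}
```

This mirrors the value passed to input_report_switch() for
SW_TABLET_MODE in intel_vbtn_probe().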

-- 
Marco Martin

Re: [PATCH v19 03/10] video: backlight: Add of_find_backlight helper in backlight.c

2018-01-26 Thread Lee Jones

On Wed, 24 Jan 2018, Meghana Madhyastha wrote:

> Add of_find_backlight, a helper function which is a generic version
> of tinydrm_of_find_backlight that can be used by other drivers to avoid
> repetition of code and simplify things.
> 
> Acked-by: Daniel Thompson 
> Reviewed-by: Noralf Trønnes 
> Reviewed-by: Sean Paul
> Signed-off-by: Meghana Madhyastha 

Nit: These should be in chronological order.

> ---
>  drivers/video/backlight/backlight.c | 43 
> +
>  include/linux/backlight.h   | 19 
>  2 files changed, 62 insertions(+)

-- 
Lee Jones
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [Patch v1 3/8] ACPI / LPIT: Export lpit_read_residency_count_address()

2018-01-26 Thread Andy Shevchenko

On Fri, Jan 19, 2018 at 10:58 AM, Rajneesh Bhardwaj
 wrote:
> From: Srinivas Pandruvada 
>
> Export lpit_read_residency_count_address(), so that it can be used from
> drivers built as module. With the recent changes, the builtin_pci
> functionality of the intel_pmc_core driver is removed and now it can be
> built as a module to read this exported interface to calculate the PMC base
> address.
>

This needs Ack from ACPI maintainer(s).

Rafael, are you OK with exporting this method?

> Cc: Rafael J. Wysocki 
> Cc: Len Brown 
> Cc: linux-a...@vger.kernel.org
>
> Tested-by: Rajneesh Bhardwaj 
> Signed-off-by: Srinivas Pandruvada 
> ---
>
>  drivers/acpi/acpi_lpit.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/acpi/acpi_lpit.c b/drivers/acpi/acpi_lpit.c
> index e94e478dd18b..cf4fc0161164 100644
> --- a/drivers/acpi/acpi_lpit.c
> +++ b/drivers/acpi/acpi_lpit.c
> @@ -100,6 +100,7 @@ int lpit_read_residency_count_address(u64 *address)
>
> return 0;
>  }
> +EXPORT_SYMBOL_GPL(lpit_read_residency_count_address);
>
>  static void lpit_update_residency(struct lpit_residency_info *info,
>  struct acpi_lpit_native *lpit_native)
> --
> 2.7.4
>



-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH 02/24] objtool: Add retpoline validation

2018-01-26 Thread David Woodhouse

On Tue, 2018-01-23 at 16:25 +0100, Peter Zijlstra wrote:
> 
> +   if (insn->type != INSN_JUMP_DYNAMIC &&
> +   insn->type != INSN_CALL_DYNAMIC) {
> +   WARN_FUNC("retpoline_safe hint not a indirect jump/call",
> + insn->sec, insn->offset);
> +   return -1;

...

	case 0xff:
		if (modrm_reg == 2 || modrm_reg == 3)
			*type = INSN_CALL_DYNAMIC;
		else if (modrm_reg == 4)
			*type = INSN_JUMP_DYNAMIC;
		else if (modrm_reg == 5)
			/* jmpf */
			*type = INSN_CONTEXT_SWITCH;

I *think* your check includes far calls (FF/3), although not far jumps?
It shouldn't, because I don't believe far calls are subject to the same
speculation?
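
To spell out the encodings being discussed, here is a small stand-alone
sketch of how the ModRM reg field selects the operation for opcode 0xff.
The enum and function names are hypothetical and this is not objtool
code; the /2../5 assignments are the standard x86 encodings.

```c
#include <stdint.h>

enum example_ff_kind {
	EXAMPLE_FF_OTHER,	/* inc/dec/push and reserved encodings */
	EXAMPLE_FF_CALL_NEAR,	/* FF /2: near indirect call */
	EXAMPLE_FF_CALL_FAR,	/* FF /3: far indirect call */
	EXAMPLE_FF_JMP_NEAR,	/* FF /4: near indirect jump */
	EXAMPLE_FF_JMP_FAR,	/* FF /5: far indirect jump */
};

/* The reg field is bits 5:3 of the ModRM byte. */
static unsigned int example_modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 7;
}

static enum example_ff_kind example_classify_ff(uint8_t modrm)
{
	switch (example_modrm_reg(modrm)) {
	case 2:
		return EXAMPLE_FF_CALL_NEAR;
	case 3:
		return EXAMPLE_FF_CALL_FAR;
	case 4:
		return EXAMPLE_FF_JMP_NEAR;
	case 5:
		return EXAMPLE_FF_JMP_FAR;
	default:
		return EXAMPLE_FF_OTHER;
	}
}
```

In these terms, the decoder quoted above maps both CALL_NEAR and CALL_FAR
to INSN_CALL_DYNAMIC, which is why the far call ends up covered by the
retpoline_safe check while the far jump does not.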

Other than that, which you can probably ignore if you didn't have to
explicitly annotate [m]any safe far calls anyway,

Reviewed-by: David Woodhouse 

Thanks for doing this.


Re: [PATCH] drm/bridge/synopsys: dsi: use adjusted_mode in mode_set

2018-01-26 Thread Philippe CORNU

Hi Brian,
And a big thanks for your Tested-by

On 01/25/2018 11:47 PM, Brian Norris wrote:
> On Thu, Jan 25, 2018 at 7:55 AM, Philippe Cornu  wrote:
>> The "adjusted_mode" clock value (ie the real pixel clock) is more
>> accurate than "mode" clock value (ie the panel/bridge requested
>> clock value). It offers a better preciseness for timing
>> computations and allows to reduce the extra dsi bandwidth in
>> burst mode (from ~20% to ~10-12%, hw platform dependant).
>>
>> Signed-off-by: Philippe Cornu 
>> ---
>> Note: This patch replaces "drm/bridge/synopsys: dsi: add optional pixel 
>> clock"
> 
> These two appear to be the same for my cases, but at least nothing breaks:
> 

In drivers/gpu/drm/rockchip/rockchip_drm_vop.c, function
vop_crtc_mode_fixup(), the adjusted_mode->clock (i.e. vop px clk output =
dw dsi px clk input) is updated according to the rockchip hw pll/dividers...

So you "may" have a different value in adjusted_mode->clock compared to
mode->clock. Maybe there is no difference for the panel you are using
because its px clock matches the rockchip hw pll/dividers perfectly...
or has been set to match them ;-)

I did a similar patch (see [1]) and it works "fine" on stm, the only 
difference with the rockchip vop is that clk_round_rate() returns odd 
values on stm so I used set/get_rate instead.

So now, both rockchip & stm crtc have an "adjusted_mode->clock" so it 
makes sense to use it in dw dsi :)

Philippe :-)

[1] https://patchwork.freedesktop.org/patch/200720/
"[PATCH] drm/stm: ltdc: use crtc_mode_fixup to update adjusted_mode clock"

> Tested-by: Brian Norris 
>

Re: [PATCH 03/24] x86/paravirt: Annotate indirect calls

2018-01-26 Thread David Woodhouse

On Thu, 2018-01-25 at 12:35 +0100, Peter Zijlstra wrote:
> On Thu, Jan 25, 2018 at 10:52:53AM +, David Woodhouse wrote:
> > 
> > OK, my brain hurts a bit but I'm happy now. Thank you.
> OK, I've updated the Changelog thusly. Is this satisfactory?
> 
> ---
> Subject: x86/paravirt: Annotate indirect calls
> From: Peter Zijlstra 
> Date: Wed Jan 17 16:58:11 CET 2018
> 
> Paravirt emits indirect calls which get flagged by objtool retpoline
> checks, annotate it away because all these indirect calls will be
> patched out before we start userspace.
> 
> This patching happens through alternative_instructions() ->
> apply_paravirt() -> pv_init_ops.patch() which will eventually end up
> in paravirt_patch_default(). This function _will_ write direct
> alternatives.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: David Woodhouse 

I love you, Peter.


Re: [PATCH 04/16] arm64: capabilities: Prepare for fine grained capabilities

2018-01-26 Thread Dave Martin

On Thu, Jan 25, 2018 at 05:56:02PM +, Suzuki K Poulose wrote:
> On 25/01/18 17:33, Dave Martin wrote:
> >On Tue, Jan 23, 2018 at 12:27:57PM +, Suzuki K Poulose wrote:
> >>We use arm64_cpu_capabilities to represent CPU ELF HWCAPs exposed
> >>to the userspace and the CPU hwcaps used by the kernel, which
> >>include cpu features and CPU errata work arounds.
> >>
> >>At the moment we have the following restrictions:
> >>
> >>  a) CPU feature hwcaps (arm64_features) and ELF HWCAPs (arm64_elf_hwcap)
> >>- Detected mostly on system wide CPU feature register. But
> >>  there are some which really uses a local CPU's value to
> >>  decide the availability (e.g, availability of hardware
> >>  prefetch). So, we run the check only once, after all the
> >>  boot-time active CPUs are turned on.
> >
> >[ARM64_HAS_NO_HW_PREFETCH is kinda broken, but we also get away with it
> >presumably because the only systems to which it applies are homogeneous,
> >and anyway it's only an optimisation IIUC.
> >
> >This could be a separate category, but as a one-off that may be a bit
> >pointless.
> I understand and was planning to fix this back when it was introduced.
> But then it was pointless at that time, given that it was always
> guaranteed to be a homogeneous system. We do something about it in
> Patch 9.

This was more an observation than something that needs to be fixed,
but if it's been cleaned up then so much the better :)

I'll take a look.

> >.def_scope == SCOPE_SYSTEM appears anomalous there, but it's also
> >unused in that case.]
> >
> >>- Any late CPU which doesn't possess all the established features
> >>  is killed.
> >
> >Does "established feature" above ...
> >
> >>- Any late CPU which possess a feature *not* already available
> >>  is allowed to boot.
> >
> >mean the same as "feature already available" here?
> 
> Yes, it's the same. I should have been more consistent.
> 
> >
> >>
> >>  b) CPU Errata work arounds (arm64_errata)
> >>- Detected mostly based on a local CPU's feature register.
> >>  The checks are run on each boot time activated CPUs.
> >>- Any late CPU which doesn't have any of the established errata
> >>  work around capabilities is ignored and is allowed to boot.
> >>- Any late CPU which has an errata work around not already available
> >>  is killed.
> >>
> >>However there are some exceptions to the cases above.
> >>
> >>1) KPTI is a feature that we need to enable when at least one CPU needs it.
> >>And any late CPU that has this "feature" should be killed.
> >
> >Should that be "If KPTI is not enabled during system boot, then any late
> >CPU that has this "feature" should be killed."
> 
> Yes.
> 
> >
> >>2) Hardware DBM feature is a non-conflicting capability which could be
> >>enabled on CPUs which has it without any issues, even if the CPU is
> >
> >have
> >
> 
> >>brought up late.
> >>
> >>So this calls for a lot more fine grained behavior for each capability.
> >>And if we define all the attributes to control their behavior properly,
> >>we may be able to use a single table for the CPU hwcaps (not the
> >>ELF HWCAPs, which cover errata and features). This is a preparatory step
> >>to get there. We define type for a capability, which for now encodes the
> >>scope of the check. i.e, whether it should be checked system wide or on
> >>each local CPU. We define two types :
> >>
> >>   1) ARM64_CPUCAP_BOOT_SYSTEM_FEATURE - Implies (a) as described above.
> >>   1) ARM64_CPUCAP_STRICT_CPU_LOCAL_ERRATUM - Implies (b) as described 
> >> above.
> >
> >2)

Meaning you've got 1) twice above (in case you didn't spot it).

> >
> >>As such there is no change in how the capabilities are treated.
> >
> >OK, I think I finally have my head around this, more or less.
> >
> >Mechanism (operations on architectural feature regs) and policy (kernel
> >runtime configuration) seem to be rather mixed together.  This works
> >fairly naturally for things like deriving the sanitised feature regs
> >seen by userspace and determining the ELF hwcaps; but not so naturally
> >for errata workarounds and other anomalous things like
> >ARM64_HAS_NO_HW_PREFETCH.
> 
> Right. We are stuck with "cpu_hwcaps" for both erratum and features,
> based on which we make some decisions to change the kernel behavior,
> as it is tied to alternative patching.
> 
> >
> >I'm not sure that there is a better approach though -- anyway, that
> >would be out of scope for this series.
> >
> >>Signed-off-by: Suzuki K Poulose 
> >>---
> >>  arch/arm64/include/asm/cpufeature.h | 24 +--
> >>  arch/arm64/kernel/cpu_errata.c  |  8 
> >>  arch/arm64/kernel/cpufeature.c  | 38 
> >> ++---
> >>  3 files changed, 41 insertions(+), 29 deletions(-)
> >>
> >>diff --git a/arch/arm64/include/asm/cpufeature.h 
> >>b/arch/arm64/include/asm/cpufeature.h
> >>index a23c0d4f27e9..4fd5de8ef33e 100644
> >>--- a/arch/arm64/include/asm/cpu

Re: [patch -mm 3/4] mm, memcg: replace memory.oom_group with policy tunable

2018-01-26 Thread Michal Hocko

On Thu 25-01-18 15:27:29, David Rientjes wrote:
> On Thu, 25 Jan 2018, Michal Hocko wrote:
> 
> > > As a result, this would remove patch 3/4 from the series.  Do you have any
> > > other feedback regarding the remainder of this patch series before I
> > > rebase it?
> > 
> > Yes, and I have provided it already. What you are proposing is
> > incomplete at best and needs much better consideration and much more
> > time to settle.
> > 
> 
> Could you elaborate on why specifying the oom policy for the entire 
> hierarchy as part of the root mem cgroup and also for individual subtrees 
> is incomplete?  It allows admins to specify and delegate policy decisions 
> to subtrees owners as appropriate.  It addresses your concern in the 
> /admins and /students example.  It addresses my concern about evading the 
> selection criteria simply by creating child cgroups.  It appears to be a 
> win-win.  What is incomplete or are you concerned about?

I will get back to this later. I am really busy these days. This is not
a trivial thing at all.

> > > I will address the unfair root mem cgroup vs leaf mem cgroup comparison in
> > > a separate patchset to fix an issue where any user of oom_score_adj on a
> > > system that is not fully containerized gets very unusual, unexpected, and
> > > undocumented results.
> > 
> > I will not oppose but as it has been mentioned several times, this is by
> > no means a blocker issue. It can be added on top.
> > 
> 
> The current implementation is only useful for fully containerized systems 
> where no processes are attached to the root mem cgroup.  Anything in the 
> root mem cgroup is judged by different criteria and if they use 
> /proc/pid/oom_score_adj the entire heuristic breaks down.

Most usecases I've ever seen usually use oom_score_adj only to disable
the oom killer for a particular service. In those cases the current
heuristic works reasonably well.

I am not _aware_ of any usecase which actively uses oom_score_adj to
actively control the oom selection decisions and it would _require_ the
memcg aware oom killer. Maybe there are some but then we need to do much
more than to "fix" the root memcg comparison. We would need a complete
memcg aware prioritization as well. It simply doesn't make much sense
to tune oom selection only on a subset of tasks ignoring the rest of the
system workload which is likely to comprise the majority of the resource
consumers.

We have already discussed that something like that will emerge sooner or
later but I am not convinced we need it _now_. It is perfectly natural
to start with a simple model without any priorities at all.

> That's because per-process usage and oom_score_adj are only relevant  
> for the root mem cgroup and irrelevant when attached to a leaf.   

This is the simplest implementation. You could go and ignore
oom_score_adj on root tasks. Would it be much better? Should you ignore
oom disabled tasks? Should you consider kernel memory footprint of those
tasks? Maybe we will realize that we simply have to account root memcg
like any other memcg.  We used to do that but it has been reverted due
to performance footprint. There are more questions to answer I believe
but the most important one is whether actually any _real_ user cares.

I can see your arguments and they are true. You can construct setups
where the current memcg oom heuristic works sub-optimally. The same has
been the case for the OOM killer in general. The OOM killer has always
been just a heuristic and there will always be somebody complaining. This
doesn't mean we should just remove it because it works reasonably well
for most users.

> Because of that, users are 
> affected by the design decision and will organize their hierarchies as 
> appropriate to avoid it.  Users who only want to use cgroups for a subset 
> of processes but still treat those processes as indivisible logical units 
> when attached to cgroups find that it is simply not possible.

Nobody enforces the memcg oom selection as presented here for those
users. They have to explicitly _opt-in_. If the new heuristic doesn't
work for them we will hear about that most likely. I am really skeptical
that oom_score_adj can be reused for memcg aware oom selection.

> I'm focused solely on fixing the three main issues that this 
> implementation causes.  One of them, userspace influence to protect 
> important cgroups, can be added on top.  The other two, evading the 
> selection criteria and unfair comparison of root vs leaf, are shortcomings 
> in the design that I believe should be addressed before it's merged to 
> avoid changing the API later.

I believe I have explained why the root memcg comparison is an
implementation detail. The subtree delegation is something that we will
have to care about eventually. But I do not see it as an immediate threat.
Same as I do not see the current OOM killer as flawed because there are
ways to evade it. Moreover the delegation is much less of a problem
because cr

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2018-01-26 Thread Michal Hocko

On Fri 26-01-18 12:12:00, Tetsuo Handa wrote:
> Would you answer to Michal's questions
> 
>   Is this a permanent state or does the holder eventually releases the lock?
> 
>   Do you remember the last good kernel?
> 
> and my guess
> 
>   Since commit 0bcac06f27d75285 was not backported to 4.14-stable kernel,
>   this is unlikely the bug introduced by 0bcac06f27d75285 unless Eric
>   explicitly backported 0bcac06f27d75285.

Can we do that in the original email thread, please? Conflating these two
things while we have no idea about the culprit is just a mess.

-- 
Michal Hocko
SUSE Labs

Re: [PATCH 05/16] arm64: Add flags to check the safety of a capability for late CPU

2018-01-26 Thread Dave Martin

On Tue, Jan 23, 2018 at 12:27:58PM +, Suzuki K Poulose wrote:
> Add two different flags to indicate if the conflict of a capability
> on a late CPU with the current system state
> 
>  1) Can a CPU have a capability when the system doesn't have it ?
> 
> Most arm64 features could have this set. While erratum work arounds
> cannot have this, as we may miss work arounds.
> 
>  2) Can a CPU miss a capability when the system has it ?
> This could be set for arm64 erratum work arounds as we don't
> care if a CPU doesn't need the work around. However it should
> be clear for features.
> 
> These flags could be added to certain entries based on their nature.
> 
> Signed-off-by: Suzuki K Poulose 
> ---
>  arch/arm64/include/asm/cpufeature.h | 35 +++
>  1 file changed, 31 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h 
> b/arch/arm64/include/asm/cpufeature.h
> index 4fd5de8ef33e..27d037bb0451 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -94,10 +94,25 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
>  #define SCOPE_SYSTEM ARM64_CPUCAP_SCOPE_SYSTEM
>  #define SCOPE_LOCAL_CPU  ARM64_CPUCAP_SCOPE_LOCAL_CPU
>  
> -/* CPU errata detected at boot time based on feature of one or more CPUs */
> -#define ARM64_CPUCAP_STRICT_CPU_LOCAL_ERRATUM (ARM64_CPUCAP_SCOPE_LOCAL_CPU)
> -/* CPU feature detected at boot time based on system-wide value of a feature */
> -#define ARM64_CPUCAP_BOOT_SYSTEM_FEATURE (ARM64_CPUCAP_SCOPE_SYSTEM)
> +/* Is it safe for a late CPU to have this capability when system doesn't already have */
> +#define ARM64_CPUCAP_LATE_CPU_SAFE_TO_HAVE   BIT(2)
> +/* Is it safe for a late CPU to miss this capability when system has it */
> +#define ARM64_CPUCAP_LATE_CPU_SAFE_TO_MISS   BIT(3)

Maybe _OPTIONAL and _PERMITTED would be a bit less verbose?

Alternatively,
ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU
ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU

might be easier to understand.
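
To make the intended semantics of the two bits concrete, here is a small
userspace sketch. The flag names and bit positions are taken from the
quoted hunk; BIT() is redefined locally, and late_cpu_conflicts() is a
hypothetical helper written for illustration, not something from the
series.

```c
#include <stdbool.h>

#define BIT(n)	(1UL << (n))

/* Names and bit positions as in the quoted hunk. */
#define ARM64_CPUCAP_LATE_CPU_SAFE_TO_HAVE	BIT(2)
#define ARM64_CPUCAP_LATE_CPU_SAFE_TO_MISS	BIT(3)

/*
 * Hypothetical helper: does a late CPU's state for one capability conflict
 * with the state the system settled on at boot?
 */
static bool late_cpu_conflicts(unsigned long type, bool system_has,
			       bool cpu_has)
{
	/* The CPU brings a capability the system did not enable. */
	if (cpu_has && !system_has)
		return !(type & ARM64_CPUCAP_LATE_CPU_SAFE_TO_HAVE);

	/* The CPU lacks a capability the system already relies on. */
	if (!cpu_has && system_has)
		return !(type & ARM64_CPUCAP_LATE_CPU_SAFE_TO_MISS);

	/* CPU and system agree, nothing to check. */
	return false;
}
```

Under this reading a typical feature would carry only SAFE_TO_HAVE (an
extra feature on a late CPU is harmless, a missing one is fatal), while an
erratum workaround would carry only SAFE_TO_MISS, which matches cases 1)
and 2) in the commit message.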

[...]

Cheers
---Dave

Re: [PATCH v2 1/2] Input: edt-ft5x06 - Add support for regulator

2018-01-26 Thread Mylene Josserand

Hello Dmitry,

On Tue, 23 Jan 2018 09:58:29 -0800,
Dmitry Torokhov wrote:

> On Tue, Jan 23, 2018 at 10:10:55AM +0100, Mylene Josserand wrote:
> > Hello Dmitry,
> > 
> > Thank you for the review!
> > 
> > On Mon, 22 Jan 2018 09:42:08 -0800,
> > Dmitry Torokhov wrote:
> >   
> > > Hi Mylène,
> > > 
> > > On Thu, Dec 28, 2017 at 8:33 AM, Mylène Josserand
> > >  wrote:  
> > > > Add the support of regulator to use it as VCC source.
> > > >
> > > > Signed-off-by: Mylène Josserand 
> > > > ---
> > > >  .../bindings/input/touchscreen/edt-ft5x06.txt  |  1 +
> > > >  drivers/input/touchscreen/edt-ft5x06.c | 33 
> > > > ++
> > > >  2 files changed, 34 insertions(+)
> > > >
> > > > diff --git 
> > > > a/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt 
> > > > b/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt
> > > > index 025cf8c9324a..48e975b9c1aa 100644
> > > > --- a/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt
> > > > +++ b/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt
> > > > @@ -30,6 +30,7 @@ Required properties:
> > > >  Optional properties:
> > > >   - reset-gpios: GPIO specification for the RESET input
> > > >   - wake-gpios:  GPIO specification for the WAKE input
> > > > + - vcc-supply:  Regulator that supplies the touchscreen
> > > >
> > > >   - pinctrl-names: should be "default"
> > > >   - pinctrl-0:   a phandle pointing to the pin settings for the
> > > > diff --git a/drivers/input/touchscreen/edt-ft5x06.c 
> > > > b/drivers/input/touchscreen/edt-ft5x06.c
> > > > index c53a3d7239e7..5ee14a25a382 100644
> > > > --- a/drivers/input/touchscreen/edt-ft5x06.c
> > > > +++ b/drivers/input/touchscreen/edt-ft5x06.c
> > > > @@ -39,6 +39,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >
> > > >  #define WORK_REGISTER_THRESHOLD0x00
> > > >  #define WORK_REGISTER_REPORT_RATE  0x08
> > > > @@ -91,6 +92,7 @@ struct edt_ft5x06_ts_data {
> > > > struct touchscreen_properties prop;
> > > > u16 num_x;
> > > > u16 num_y;
> > > > +   struct regulator *vcc;
> > > >
> > > > struct gpio_desc *reset_gpio;
> > > > struct gpio_desc *wake_gpio;
> > > > @@ -993,6 +995,23 @@ static int edt_ft5x06_ts_probe(struct i2c_client 
> > > > *client,
> > > >
> > > > tsdata->max_support_points = chip_data->max_support_points;
> > > >
> > > > +   tsdata->vcc = devm_regulator_get(&client->dev, "vcc");
> > > > +   if (IS_ERR(tsdata->vcc)) {
> > > > +   error = PTR_ERR(tsdata->vcc);
> > > > +   dev_err(&client->dev, "failed to request regulator: 
> > > > %d\n",
> > > > +   error);
> > > > +   return error;
> > > > +   };
> > > 
> > > > As 0-day pointed out, this semicolon is not needed.  
> > 
> > Yes, thanks, I will fix that in next version.
> >   
> > >   
> > > > +
> > > > +   if (tsdata->vcc) {
> > > 
> > > > You do not need to check for non-NULL here, devm_regulator_get() will
> > > > never give you a NULL. If regulator is not defined in DT/board
> > > > mappings, then dummy regulator will be provided. You can call
> > > > regulator_enable() and regulator_disable() and other regulator APIs
> > > > with dummy regulator.  
> > 
> > Okay, thanks for the explanation, I will remove that.
> >   
> > >   
> > > > +   error = regulator_enable(tsdata->vcc);
> > > > +   if (error < 0) {
> > > > +   dev_err(&client->dev, "failed to enable vcc: 
> > > > %d\n",
> > > > +   error);
> > > > +   return error;
> > > > +   }
> > > > +   }
> > > > +
> > > > tsdata->reset_gpio = devm_gpiod_get_optional(&client->dev,
> > > >  "reset", 
> > > > GPIOD_OUT_HIGH);
> > > > if (IS_ERR(tsdata->reset_gpio)) {
> > > > @@ -1122,20 +1141,34 @@ static int edt_ft5x06_ts_remove(struct 
> > > > i2c_client *client)
> > > >  static int __maybe_unused edt_ft5x06_ts_suspend(struct device *dev)
> > > >  {
> > > > struct i2c_client *client = to_i2c_client(dev);
> > > > +   struct edt_ft5x06_ts_data *tsdata = i2c_get_clientdata(client);
> > > >
> > > > if (device_may_wakeup(dev))
> > > > enable_irq_wake(client->irq);
> > > >
> > > > +   if (tsdata->vcc)
> > > 
> > > Same here.  
> > 
> > yep
> >   
> > >   
> > > > +   regulator_disable(tsdata->vcc);
> > > > +
> > > > return 0;
> > > >  }
> > > >
> > > >  static int __maybe_unused edt_ft5x06_ts_resume(struct device *dev)
> > > >  {
> > > > struct i2c_client *client = to_i2c_client(dev);
> > > > +   struct edt_ft5x06_ts_data *tsdata = i2c_get_clientdata(client);
> > > > +   int ret;
> > > >
> > > > if (device_may_wakeup(dev))
> > > > disable_irq_wake(client->irq

Re: [PATCH 04/24] x86,nospec: Annotate indirect calls/jumps

2018-01-26 Thread David Woodhouse

On Tue, 2018-01-23 at 16:25 +0100, Peter Zijlstra wrote:
> Annotate the indirect calls/jumps in the CALL_NOSPEC/JUMP_NOSPEC
> alternatives.
> 
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: David Woodhouse 

However...


>  /*
> + * This should be used immediately before an indirect jump/call. It tells
> + * objtool the subsequent indirect jump/call is vouched safe for retpoline
> + * builds.
> + */
> +.macro ANNOTATE_RETPOLINE_SAFE
> + .Lannotate_\@:
> + .pushsection .discard.retpoline_safe
> + _ASM_PTR .Lannotate_\@
> + .popsection
> +.endm

Didn't I just see one of those in patch 3? So this makes two...



> @@ -143,6 +155,12 @@
>   ".long 999b - .\n\t"\
>   ".popsection\n\t"
>  
> +#define ANNOTATE_RETPOLINE_SAFE  \
> + "999:\n\t"  \
> + ".pushsection .discard.retpoline_safe\n\t"  \
> + _ASM_PTR " 999b\n\t"\
> + ".popsection\n\t"
> +
>  #if defined(CONFIG_X86_64) && defined(RETPOLINE)

... three.

Now, I did briefly toy with the idea of using a .macro from both
__ASSEMBLY__ and inline asm, making the latter work by means of 
asm(".include \"asm/nospec-branch.h\"");

In the end I just ended up with the __FILL_RETURN_BUFFER CPP macro
which is used from both by other tricks.

Can we look at doing something like that, please?
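
For illustration only, a userspace sketch of the general idea: write the
body once as a CPP macro holding bare assembler text, then stringify it
for the inline asm side. The macro names are made up here, ".quad" stands
in for _ASM_PTR (so this assumes a 64-bit build), and it is not claimed to
be how __FILL_RETURN_BUFFER is wired up in the tree.

```c
#include <string.h>

/* Two-level stringification so that macro arguments are expanded first. */
#define STRINGIFY_1(...)	#__VA_ARGS__
#define STRINGIFY(...)		STRINGIFY_1(__VA_ARGS__)

/*
 * The annotation body, written once as bare assembler text. A .S file could
 * use this macro directly after preprocessing.
 */
#define RETPOLINE_SAFE_BODY				\
	999: ;						\
	.pushsection .discard.retpoline_safe ;		\
	.quad 999b ;					\
	.popsection

/* The same body as a C string, in the form inline asm would consume. */
#define RETPOLINE_SAFE_STRING	STRINGIFY(RETPOLINE_SAFE_BODY)

/* Small helper so the stringified body can be inspected from C. */
static int body_mentions(const char *needle)
{
	return strstr(RETPOLINE_SAFE_STRING, needle) != NULL;
}
```

That leaves a single definition to maintain instead of three, at the cost
of the body having to be valid both as raw assembler input and as
stringified text.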




[PATCH] block: DAC960: Replace GFP_ATOMIC with GFP_KERNEL in DAC960_DetectController

2018-01-26 Thread Jia-Ju Bai

After checking all possible call chains to DAC960_DetectController(), 
my tool finds that this function is never called in atomic context, 
namely never in an interrupt handler or while holding a spinlock.
DAC960_DetectController() also calls pci_enable_device(), which can sleep, 
so the function is already allowed to call functions that may sleep.
Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/block/DAC960.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c
index 442e777..25ee9af 100644
--- a/drivers/block/DAC960.c
+++ b/drivers/block/DAC960.c
@@ -2725,7 +2725,7 @@ static void DAC960_DetectCleanup(DAC960_Controller_T 
*Controller)
   void __iomem *BaseAddress;
   int i;
 
-  Controller = kzalloc(sizeof(DAC960_Controller_T), GFP_ATOMIC);
+  Controller = kzalloc(sizeof(DAC960_Controller_T), GFP_KERNEL);
   if (Controller == NULL) {
DAC960_Error("Unable to allocate Controller structure for "
"Controller at\n", NULL);
-- 
1.7.9.5

[PATCH 2/2] block: DAC960: Replace GFP_ATOMIC with GFP_KERNEL in DAC960_CreateAuxiliaryStructures

2018-01-26 Thread Jia-Ju Bai

After checking all possible call chains to 
DAC960_CreateAuxiliaryStructures(), my tool finds that this function is
never called in atomic context, namely never in an interrupt handler or
while holding a spinlock.
DAC960_CreateAuxiliaryStructures() also calls pci_pool_create() and
pci_pool_destroy(), which can sleep, so the function is already allowed
to call functions that may sleep.
Thus GFP_ATOMIC is not necessary, and it can be replaced with GFP_KERNEL.

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai 
---
 drivers/block/DAC960.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c
index 442e777..39fc016 100644
--- a/drivers/block/DAC960.c
+++ b/drivers/block/DAC960.c
@@ -323,7 +323,7 @@ static bool 
DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
CommandsRemaining = CommandAllocationGroupSize;
  CommandGroupByteCount =
CommandsRemaining * CommandAllocationLength;
- AllocationPointer = kzalloc(CommandGroupByteCount, GFP_ATOMIC);
+ AllocationPointer = kzalloc(CommandGroupByteCount, GFP_KERNEL);
  if (AllocationPointer == NULL)
return DAC960_Failure(Controller,
"AUXILIARY STRUCTURE CREATION");
@@ -335,13 +335,13 @@ static bool 
DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
   Command->Next = Controller->FreeCommands;
   Controller->FreeCommands = Command;
   Controller->Commands[CommandIdentifier-1] = Command;
-  ScatterGatherCPU = pci_pool_alloc(ScatterGatherPool, GFP_ATOMIC,
+  ScatterGatherCPU = pci_pool_alloc(ScatterGatherPool, GFP_KERNEL,
&ScatterGatherDMA);
   if (ScatterGatherCPU == NULL)
  return DAC960_Failure(Controller, "AUXILIARY STRUCTURE CREATION");
 
   if (RequestSensePool != NULL) {
- RequestSenseCPU = pci_pool_alloc(RequestSensePool, GFP_ATOMIC,
+ RequestSenseCPU = pci_pool_alloc(RequestSensePool, GFP_KERNEL,
&RequestSenseDMA);
  if (RequestSenseCPU == NULL) {
 pci_pool_free(ScatterGatherPool, ScatterGatherCPU,
-- 
1.7.9.5

[PATCH v9 0/2] add tracepoints for nvme command submission and completion

2018-01-26 Thread Johannes Thumshirn

Add tracepoints for nvme command submission and completion. The tracepoints
are modeled after SCSI's trace_scsi_dispatch_cmd_start() and
trace_scsi_dispatch_cmd_done() tracepoints and fulfil a similar purpose,
namely a fast way to check which command is going to be queued into the HW or
Fabric driver and which command is completed again.

Here's an example output using the qemu emulated pci nvme:
# tracer: nop
#
#  _-=> irqs-off
# / _=> need-resched
#| / _---=> hardirq/softirq
#|| / _--=> preempt-depth
#||| / delay
#   TASK-PID   CPU#  TIMESTAMP  FUNCTION
#  | |   |      | |
kworker/u8:0-5 [003]  9.158541: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=1, qsize=1023, 
cq_flags=0x3, irq_vector=0)
  -0 [003] d.h. 9.158705: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.158712: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_sq sqid=1, qsize=1023, 
sq_flags=0x1, cqid=1)
  -0 [003] d.h. 9.159214: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159236: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=2, qsize=1023, 
cq_flags=0x3, irq_vector=1)
  -0 [003] d.h. 9.159288: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159293: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_sq sqid=2, qsize=1023, 
sq_flags=0x1, cqid=2)
  -0 [003] d.h. 9.159479: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159525: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=3, qsize=1023, 
cq_flags=0x3, irq_vector=2)
  -0 [003] d.h. 9.159565: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159569: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_sq sqid=3, qsize=1023, 
sq_flags=0x1, cqid=3)
  -0 [003] d.h. 9.159726: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159769: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=4, qsize=1023, 
cq_flags=0x3, irq_vector=3)
  -0 [003] d.h. 9.159795: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.159799: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_create_sq sqid=4, qsize=1023, 
sq_flags=0x1, cqid=4)
  -0 [003] d.h. 9.159957: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.160971: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=1)
  -0 [003] d.h. 9.161059: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.161101: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
  -0 [003] d.h. 9.161160: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.161305: nvme_setup_admin_cmd:  
cmdid=14, flags=0x0, meta=0x0, cmd=(nvme_admin_identify cns=0, ctrlid=0)
  -0 [003] d.h. 9.161344: nvme_complete_rq: cmdid=14, 
qid=0, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.161390: nvme_setup_nvm_cmd: qid=1, 
nsid=1, cmdid=718, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=7, 
ctrl=0x0, dsmgmt=0, reftag=0)
  -0 [003] d.h. 9.161578: nvme_complete_rq: cmdid=718, 
qid=1, res=0, retries=0, flags=0x0, status=0
kworker/u8:0-5 [003]  9.161608: nvme_setup_nvm_cmd: qid=1, 
nsid=1, cmdid=718, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=24, len=7, 
ctrl=0x0, dsmgmt=0, reftag=0)
  -0 [003] d.h. 9.161681: nvme_complete_rq: cmdid=718, 
qid=1, res=0, retries=0, flags=0x0, status=0
  dd-205   [001]  9.662909: nvme_setup_nvm_cmd: qid=1, 
nsid=1, cmdid=1011, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=2559, 
ctrl=0x0, dsmgmt=0, reftag=0)
  dd-205   [001]  9.662967: nvme_setup_nvm_cmd: qid=1, 
nsid=1, cmdid=1012, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=2560, 
len=1535, ctrl=0x0, dsmgmt=0, reftag=0)
  -0 [001] d.h. 9.664413: nvme_complete_rq: cmdid=1011, 
qid=1, res=0, retries=0, flags=0x0, status=0
  -0 [

[PATCH v9 2/2] nvme: add tracepoint for nvme_complete_rq

2018-01-26 Thread Johannes Thumshirn

Add a tracepoint in nvme_complete_rq() for completions of NVMe commands. An
example output of the tracepoint is as follows:

-0 [001] d.h. 3.505266: nvme_complete_rq: cmdid=989, qid=1, 
res=0, retries=0, flags=0x0, status=0

Signed-off-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Keith Busch 
Reviewed-by: Sagi Grimberg 

---
Changes to v6:
* Rebase onto nvme-4.16

Changes to v4:
* Print QID for completions (Christoph)

Changes to v2:
* Pass the whole struct request to the tracepoint
* Removed spaces after parenthesis (Christoph)
---
 drivers/nvme/host/core.c  |  2 ++
 drivers/nvme/host/trace.h | 25 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 358dfdc1f6da..7cbd4a260d30 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -220,6 +220,8 @@ void nvme_complete_rq(struct request *req)
 {
blk_status_t status = nvme_error_status(req);
 
+   trace_nvme_complete_rq(req);
+
if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
if (nvme_req_needs_failover(req, status)) {
nvme_failover_req(req);
diff --git a/drivers/nvme/host/trace.h b/drivers/nvme/host/trace.h
index feadf0b57d17..ea91fccd1bc0 100644
--- a/drivers/nvme/host/trace.h
+++ b/drivers/nvme/host/trace.h
@@ -129,6 +129,31 @@ TRACE_EVENT(nvme_setup_nvm_cmd,
  __parse_nvme_cmd(__entry->opcode, __entry->cdw10))
 );
 
+TRACE_EVENT(nvme_complete_rq,
+   TP_PROTO(struct request *req),
+   TP_ARGS(req),
+   TP_STRUCT__entry(
+   __field(int, qid)
+   __field(int, cid)
+   __field(u64, result)
+   __field(u8, retries)
+   __field(u8, flags)
+   __field(u16, status)
+   ),
+   TP_fast_assign(
+   __entry->qid = req->q->id;
+   __entry->cid = req->tag;
+   __entry->result = le64_to_cpu(nvme_req(req)->result.u64);
+   __entry->retries = nvme_req(req)->retries;
+   __entry->flags = nvme_req(req)->flags;
+   __entry->status = nvme_req(req)->status;
+   ),
+   TP_printk("cmdid=%u, qid=%d, res=%llu, retries=%u, flags=0x%x, status=%u",
+ __entry->cid, __entry->qid, __entry->result,
+ __entry->retries, __entry->flags, __entry->status)
+
+);
+
 #endif /* _TRACE_NVME_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.13.6
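
As a usage illustration for the nvme_complete_rq output shown in the
commit message, a minimal userspace sketch that picks the fields out of
one trace line again. The struct and the parse_nvme_complete_rq() helper
are hypothetical and only mirror the TP_printk() format string of this
patch.

```c
#include <stdio.h>
#include <string.h>

/* Fields in the order they are printed by the tracepoint. */
struct nvme_complete_event {
	unsigned int cmdid;
	int qid;
	unsigned long long res;
	unsigned int retries;
	unsigned int flags;
	unsigned int status;
};

/* Returns 0 on success, -1 if the line is not a complete nvme_complete_rq event. */
static int parse_nvme_complete_rq(const char *line,
				  struct nvme_complete_event *ev)
{
	static const char tag[] = "nvme_complete_rq: ";
	const char *p = strstr(line, tag);
	int n;

	if (!p)
		return -1;
	p += sizeof(tag) - 1;

	/* Whitespace in the format also matches a wrapped line. */
	n = sscanf(p,
		   "cmdid=%u, qid=%d, res=%llu, retries=%u, flags=0x%x, status=%u",
		   &ev->cmdid, &ev->qid, &ev->res, &ev->retries,
		   &ev->flags, &ev->status);

	return n == 6 ? 0 : -1;
}
```

Lines from other events, for example nvme_setup_admin_cmd, are rejected
because the event name does not match.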

[PATCH v9 1/2] nvme: add tracepoint for nvme_setup_cmd

2018-01-26 Thread Johannes Thumshirn

Add tracepoints for nvme_setup_cmd() for tracing admin and/or nvm commands.

Examples of the two tracepoints are as follows for trace_nvme_setup_admin_cmd():
kworker/u8:0-5 [003]  2.998792: nvme_setup_admin_cmd: cmdid=14, 
flags=0x0, meta=0x0, cmd=(nvme_admin_create_cq cqid=1, qsize=1023, 
cq_flags=0x3, irq_vector=0)

and trace_nvme_setup_nvm_cmd():
dd-205   [001]  3.503929: nvme_setup_nvm_cmd: qid=1, nsid=1, cmdid=989, 
flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=4096, len=2047, ctrl=0x0, 
dsmgmt=0, reftag=0)

Signed-off-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Keith Busch 
Reviewed-by: Sagi Grimberg 
---
Changes to v8:
* Really fix sparse issues

Changes to v7:
* Fix endianness issues detected by kbuild robot/sparse

Changes to v5:
* Print QID for nvm commands (Christoph/Martin)

Changes to v4:
* Split trace function into two for admin and nvm cmds (Christoph)
* Remove structures for commands and decode as needed (Christoph)
* Add proper Changelog (Christoph)
* Don't decode NS ID for admin commands

Changes to v3:
* Only build trace.o when CONFIG_TRACE=y (Christoph)
* Only copy non-common command fields to trace decoder (Christoph)
* Merge write_zeros decoder into rw decoder
* Don't decode admin commands as I/O commands

Changes to v2:
* Don't cast le64_to_cpu() conversions to unsigned long long (Christoph)
* Add proper copyright header (Christoph)
* Move trace decoding into own file (Christoph)
* Include the src directory in the Makefile for trace (Christoph)
* Removed spaces before and after parenthesis (Christoph)
* Reduced print lines to fit the 80 char limit (Christoph)

Changes to v1:
* Fix typo (Hannes)
* move include/trace/events/nvme.h -> drivers/nvme/host/trace.h (Christoph)
---
 drivers/nvme/host/Makefile |   4 ++
 drivers/nvme/host/core.c   |   7 +++
 drivers/nvme/host/trace.c  | 130 +
 drivers/nvme/host/trace.h  | 140 +
 4 files changed, 281 insertions(+)
 create mode 100644 drivers/nvme/host/trace.c
 create mode 100644 drivers/nvme/host/trace.h

diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index a25fd43650ad..441e67e3a9d7 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -1,4 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
+
+ccflags-y  += -I$(src)
+
 obj-$(CONFIG_NVME_CORE)+= nvme-core.o
 obj-$(CONFIG_BLK_DEV_NVME) += nvme.o
 obj-$(CONFIG_NVME_FABRICS) += nvme-fabrics.o
@@ -6,6 +9,7 @@ obj-$(CONFIG_NVME_RDMA) += nvme-rdma.o
 obj-$(CONFIG_NVME_FC)  += nvme-fc.o
 
 nvme-core-y:= core.o
+nvme-core-$(CONFIG_TRACING)+= trace.o
 nvme-core-$(CONFIG_NVME_MULTIPATH) += multipath.o
 nvme-core-$(CONFIG_NVM)+= lightnvm.o
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fde6fd2e7eef..358dfdc1f6da 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -29,6 +29,9 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 #include "nvme.h"
 #include "fabrics.h"
 
@@ -628,6 +631,10 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
}
 
cmd->common.command_id = req->tag;
+   if (ns)
+   trace_nvme_setup_nvm_cmd(req->q->id, cmd);
+   else
+   trace_nvme_setup_admin_cmd(cmd);
return ret;
 }
 EXPORT_SYMBOL_GPL(nvme_setup_cmd);
diff --git a/drivers/nvme/host/trace.c b/drivers/nvme/host/trace.c
new file mode 100644
index ..41944bbef835
--- /dev/null
+++ b/drivers/nvme/host/trace.c
@@ -0,0 +1,130 @@
+/*
+ * NVM Express device driver tracepoints
+ * Copyright (c) 2018 Johannes Thumshirn, SUSE Linux GmbH
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+#include "trace.h"
+
+static const char *nvme_trace_create_sq(struct trace_seq *p, u8 *cdw10)
+{
+   const char *ret = trace_seq_buffer_ptr(p);
+   u16 sqid = get_unaligned_le16(cdw10);
+   u16 qsize = get_unaligned_le16(cdw10 + 2);
+   u16 sq_flags = get_unaligned_le16(cdw10 + 4);
+   u16 cqid = get_unaligned_le16(cdw10 + 6);
+
+
+   trace_seq_printf(p, "sqid=%u, qsize=%u, sq_flags=0x%x, cqid=%u",
+sqid, qsize, sq_flags, cqid);
+   trace_seq_putc(p, 0);
+
+   return ret;
+}
+
+static const char *nvme_trace_create_cq(struct trace_seq *p, u8 *cdw10)
+{
+   cons
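
The Create SQ decoder above reads four little-endian 16-bit fields starting at cdw10. A minimal userspace sketch of the same decoding follows. It assumes a hypothetical helper le16_at() standing in for the kernel's get_unaligned_le16(), and snprintf() standing in for trace_seq_printf(); format_create_sq() is an illustrative name, not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's get_unaligned_le16(). */
static uint16_t le16_at(const uint8_t *p)
{
	return (uint16_t)(p[0] | ((unsigned int)p[1] << 8));
}

/*
 * Decode the four 16-bit fields that start at cdw10 and format them
 * with the same format string the tracepoint decoder uses.
 */
static int format_create_sq(char *buf, size_t len, const uint8_t *cdw10)
{
	unsigned int sqid = le16_at(cdw10);
	unsigned int qsize = le16_at(cdw10 + 2);
	unsigned int sq_flags = le16_at(cdw10 + 4);
	unsigned int cqid = le16_at(cdw10 + 6);

	return snprintf(buf, len, "sqid=%u, qsize=%u, sq_flags=0x%x, cqid=%u",
			sqid, qsize, sq_flags, cqid);
}
```

Reading byte by byte keeps the sketch independent of host endianness and of the alignment of the command buffer, which is the reason the patch uses the unaligned accessors.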

Re: [PATCH v5 0/3] livepatch: introduce atomic replace

2018-01-26 Thread Petr Mladek

On Fri 2018-01-19 16:10:42, Jason Baron wrote:
> 
> 
> On 01/19/2018 02:20 PM, Evgenii Shatokhin wrote:
> > On 12.01.2018 22:55, Jason Baron wrote:
> > There is one more thing that might need attention here. In my
> > experiments with this patch series, I saw that unpatch callbacks are not
> > called for the older binary patch (the one being replaced).
> 
> So I think the pre_unpatch() can be called for any prior livepatch
> modules from __klp_enable_patch(). Perhaps in reverse order of loading
> (if there is more than one), and *before* the pre_patch() for the
> livepatch module being loaded. Then, if it successfully patches in
> klp_complete_transition() the post_unpatch() can be called for any prior
> livepatch modules as well. I think again it makes sense to call the
> post_unpatch() for prior modules *before* the post_patch() for the
> current livepatch modules.

I played with this when working on v6. And I am not sure if it is
worth it.

The main reason is that we are talking about cumulative patches.
They are supposed to preserve most of the existing changes and
just remove and/or add a few changes. The older patches might or
might not expect to be replaced this way.

If we decided to run callbacks from the replaced patches,
then it would make sense to run the ones from the new patch
first. It is because we might need to do some hacks to preserve
the already existing changes.

We might need something like this for __klp_enable_patch():

static int klp_run_pre_patch_callbacks(struct klp_patch *patch)
{
	struct klp_patch *old_patch;
	struct klp_object *old_obj;
	int ret;

	list_for_each_entry_reverse(old_patch, &klp_patches, list) {
		if (!old_patch->enabled && old_patch != patch)
			continue;

		klp_for_each_object(old_patch, old_obj) {
			if (!klp_is_object_loaded(old_obj))
				continue;

			if (old_patch == patch) {
				/* pre_patch from new patch */
				ret = klp_pre_patch_callback(old_obj);
				if (ret)
					return ret;
			} else {
				/* pre_unpatch from replaced patches */
				klp_pre_unpatch_callback(old_obj);
			}
		}

		/* Without replace, only the new patch's callbacks run. */
		if (old_patch == patch && !patch->replace)
			return 0;
	}

	return 0;
}

This is quite hairy. An alternative would be:

static void klp_run_pre_unpatch_callbacks_when_replacing(struct klp_patch *patch)
{
	struct klp_patch *old_patch;
	struct klp_object *old_obj;

	if (WARN_ON(!patch->replace))
		return;

	list_for_each_entry_reverse(old_patch, &klp_patches, list) {
		if (!old_patch->enabled || old_patch == patch)
			continue;

		klp_for_each_object(old_patch, old_obj) {
			if (!klp_is_object_loaded(old_obj))
				continue;

			klp_pre_unpatch_callback(old_obj);
		}
	}
}

static int klp_run_pre_patch_callbacks(struct klp_patch *patch)
{
	struct klp_object *obj;
	int ret;

	klp_for_each_object(patch, obj) {
		if (!klp_is_object_loaded(obj))
			continue;

		ret = klp_pre_patch_callback(obj);
		if (ret)
			return ret;
	}

	if (patch->replace)
		klp_run_pre_unpatch_callbacks_when_replacing(patch);

	return 0;
}

The 2nd variant is easier to read but is a lot of code. And this is only
what we would need for __klp_enable_patch(). We would also need
a solution for:

klp_cancel_transition();
klp_try_transition();   (2 variants for patching and unpatching)
klp_module_coming();
klp_module_going();

So, we are talking about a lot of rather non-trivial code.
IMHO, it might be easier to run just the callbacks from
the new patch. In reality, the author should always know
what it might be replacing and what needs to be done.

In other words, it might be much easier to handle all
situations in a single script in the new patch. The alternative
would be doing crazy hacks to prevent the older scripts from
destroying what we would like to keep. We would need to
keep in mind interactions between the scripts and
the order in which they are called.

Or do you plan to use cumulative patches to simply
get rid of any other "random" livepatches with something
completely different? In this case, it might be much
safer to disable the older patches the normal way.

I would suggest just documenting the current behavior.
We should create Documentation/livepatch/cumulative-patches.txt
anyway.

Best Regards,
Petr

Re: [PATCH 05/24] x86: Annotate indirect jump in head_64.S

2018-01-26 Thread David Woodhouse

On Tue, 2018-01-23 at 16:25 +0100, Peter Zijlstra wrote:
> The objtool retpoline validation found this indirect jump. Seeing how
> its on CPU bringup before we run userspace it should be safe, annotate

http://angryflower.com/itsits.gif

> it.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: David Woodhouse 


> ---
>  arch/x86/kernel/head_64.S |2 ++
>  1 file changed, 2 insertions(+)
> 
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -23,6 +23,7 @@
>  #include 
>  #include "../entry/calling.h"
>  #include 
> +#include 
>  
>  #ifdef CONFIG_PARAVIRT
>  #include 
> @@ -134,6 +135,7 @@ ENTRY(secondary_startup_64)
>  
>   /* Ensure I am executing from virtual addresses */
>   movq$1f, %rax
> + ANNOTATE_RETPOLINE_SAFE
>   jmp *%rax
>  1:
>   UNWIND_HINT_EMPTY
> 
> 


Re: [PATCH] base: power: domain: Replace mdelay with msleep

2018-01-26 Thread Pavel Machek

On Fri 2018-01-26 16:38:19, Jia-Ju Bai wrote:
> After checking all possible call chains to genpd_dev_pm_detach() and
> genpd_dev_pm_attach() here,
> my tool finds that these functions are never called in atomic context,
> namely never in an interrupt handler or holding a spinlock.
> Thus mdelay can be replaced with msleep to avoid busy wait.
> 
> This is found by a static analysis tool named DCNS written by myself.

Well, cond_resched() just after msleep certainly looks like that.

Did the patch receive any testing?

> Signed-off-by: Jia-Ju Bai 
> ---
>  drivers/base/power/domain.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 0c80bea..f84ac72 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -2144,7 +2144,7 @@ static void genpd_dev_pm_detach(struct device *dev, bool power_off)
>   if (ret != -EAGAIN)
>   break;
>  
> - mdelay(i);
> + msleep(i);
>   cond_resched();
>   }
>  
> @@ -2231,7 +2231,7 @@ int genpd_dev_pm_attach(struct device *dev)
>   if (ret != -EAGAIN)
>   break;
>  
> - mdelay(i);
> + msleep(i);
>   cond_resched();
>   }
>   mutex_unlock(&gpd_list_lock);

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [PATCH] base: power: domain: Replace mdelay with msleep

2018-01-26 Thread Jia-Ju Bai




On 2018/1/26 18:26, Pavel Machek wrote:
> On Fri 2018-01-26 16:38:19, Jia-Ju Bai wrote:
> > After checking all possible call chains to genpd_dev_pm_detach() and
> > genpd_dev_pm_attach() here,
> > my tool finds that these functions are never called in atomic context,
> > namely never in an interrupt handler or holding a spinlock.
> > Thus mdelay can be replaced with msleep to avoid busy wait.
> >
> > This is found by a static analysis tool named DCNS written by myself.
>
> Well, cond_resched() just after msleep certainly looks like that.
>
> Did the patch receive any testing?

Thanks for the reply :)

I only performed compile testing; I did not actually run the patched code.


Thanks,
Jia-Ju Bai

Re: [PATCH v2 01/15] Documentation: add newcx initramfs format description

2018-01-26 Thread Henrique de Moraes Holschuh

On Thu, 25 Jan 2018, Rob Landley wrote:
> That said, I don't think -h newcx should emit (or recognize) the
> "TRAILER!!!1!" entry. That's kinda silly in-band signaling for 2018:
> files have a length, pipes provide EOF, and each cpiox entry starts with
> 6 bytes of c_magic anyway. (I stopped toybox from producing the TRAILER
> entry back in june, toybox commit 32550751997d, and the kernel consumes
> the resulting cpio just fine. All the trailer does is prevent you from
> concatenating cpio files, which is a feature multiple people asked me for.)

Not in the kernel.  What TRAILER does in the kernel is to act as a
barrier for the hardlink creation state, which IS a good thing.  You
could just specify it as such for "newcx".

The kernel will continue reading for more entries after TRAILER, so
concatenation is not broken by TRAILER.  It is also insensitive to
NUL-padding length (as long as it is 4-byte aligned), which is another
nice feature you could specify for "newcx".

Also, the kernel does something nothing in userspace ever tried to,
AFAIK: it detects compression signatures along with the CPIO header
signatures, and thus it can take several compressed and uncompressed
archives concatenated together (and the compressor doesn't need to be
the same, either).
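
The detection described above can be sketched as a classifier over the leading bytes of each chunk. This is only an illustration under stated assumptions: classify_chunk() and the enum are hypothetical names, only the newc magic "070701" and three well-known compressor signatures (gzip, bzip2, xz) are checked, and it is not the kernel's actual unpacking code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of what starts at the current offset. */
enum chunk_kind {
	CHUNK_UNKNOWN,
	CHUNK_PADDING,		/* NUL padding between archives */
	CHUNK_CPIO_NEWC,	/* "070701" header magic */
	CHUNK_GZIP,
	CHUNK_BZIP2,
	CHUNK_XZ
};

static enum chunk_kind classify_chunk(const unsigned char *buf, size_t len)
{
	if (len >= 1 && buf[0] == 0)
		return CHUNK_PADDING;
	if (len >= 6 && memcmp(buf, "070701", 6) == 0)
		return CHUNK_CPIO_NEWC;
	if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
		return CHUNK_GZIP;
	if (len >= 2 && buf[0] == 'B' && buf[1] == 'Z')
		return CHUNK_BZIP2;
	/* xz magic is FD 37 7A 58 5A 00; the literal's NUL is the 6th byte. */
	if (len >= 6 && memcmp(buf, "\xfd" "7zXZ", 6) == 0)
		return CHUNK_XZ;
	return CHUNK_UNKNOWN;
}
```

A reader built this way can loop over the input, skip padding, and hand each chunk either to the cpio parser or to the matching decompressor, which is what makes mixed concatenated archives workable.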

-- 
  Henrique Holschuh

Re: [PATCH v6 6/6] livepatch: Add atomic replace

2018-01-26 Thread Evgenii Shatokhin


On 25.01.2018 19:02, Petr Mladek wrote:
> From: Jason Baron
>
> Sometimes we would like to revert a particular fix. Currently, this
> is not easy because we want to keep all other fixes active and we
> could revert only the last applied patch.
>
> One solution would be to apply new patch that implemented all
> the reverted functions like in the original code. It would work
> as expected but there will be unnecessary redirections. In addition,
> it would also require knowing which functions need to be reverted at
> build time.
>
> Another problem is when there are many patches that touch the same
> functions. There might be dependencies between patches that are
> not enforced on the kernel side. Also it might be pretty hard to
> actually prepare the patch and ensure compatibility with
> the other patches.
>
> A better solution would be to create cumulative patch and say that
> it replaces all older ones.
>
> This patch adds a new "replace" flag to struct klp_patch. When it is
> enabled, a set of 'nop' klp_func will be dynamically created for all
> functions that are already being patched but that will no longer be
> modified by the new patch. They are temporarily used as a new target
> during the patch transition.
>
> Several simplifications are used:
>
>   + nops' structures are generated already when the patch is registered.
>     All registered patches are taken into account, even the disabled ones.
>     As a result, there might be more nops than are really needed when
>     the patch is enabled and some disabled patches were removed before.
>     But we are on the safe side and it simplifies the implementation.
>     Especially we could reuse the existing init() functions. Also freeing
>     is easier because the same nops are created and removed only once.
>
>     Alternative solution would be to create nops when the patch is enabled.
>     But then it would be complicated to reuse the init() functions
>     and error paths. It would increase the risk of errors because of
>     late kobject initialization. It would need tricky waiting for
>     freed kobjects when finalizing a reverted enable transaction.
>
>   + The replaced patches are removed from the stack and can no longer
>     be enabled directly. Otherwise, we would need to implement a more
>     complex logic of handling the stack of patches. It might be hard
>     to come up with reasonable semantics.
>
>     A fallback is to remove (rmmod) the replaced patches and register
>     (insmod) them again.
>
>   + Nops are handled like normal function patches. It reduces changes
>     in the existing code.
>
>     It would be possible to copy internal values when they are allocated
>     and make short cuts in init() functions. It would be possible to use
>     the fact that old_func and new_func point to the same function and
>     do not init new_func and new_size at all. It would be possible to
>     detect nop func in ftrace handler and just leave. But all these would
>     just complicate the code and maintenance.
>
>   + The callbacks from the replaced patches are not called. It would be
>     pretty hard to define reasonable semantics and implement them.

At least, it surely simplifies error handling if these callbacks are
not called.

Anyway, I guess this restriction should be mentioned explicitly in the
docs. I think this is not obvious for patch developers (esp. those
familiar with RPM spec files and such ;-) ).

What concerns me is that downgrading cumulative patches with
callbacks becomes much more difficult this way.

I mean, suppose a user has v1 of a cumulative patch installed. Then a
newer version, v2, is released. They install it and find that it is
buggy (very unfortunate, but it might still happen). Now they cannot
atomically replace v2 with v1 again, because the callbacks from v1 cannot
clean up after v2.

One would need to unload v2 explicitly and then load v1 back, which
is more fragile. Loading failures are much less likely with
livepatch than with the old kpatch, but they are still possible.

I have no good solution to this though.



>     It might even be counter-productive. The new patch is cumulative.
>     It is supposed to include most of the changes from older patches.
>     In most cases, it will not want to call pre_unpatch() and post_unpatch()
>     callbacks from the replaced patches. It would disable/break things
>     for no good reasons. Also it should be easier to handle various
>     scenarios in a single script in the new patch than think about
>     interactions caused by running many scripts from older patches.
>     Not to mention that the old scripts would not even expect to be called
>     in this situation.
>
> Signed-off-by: Jason Baron
> [pmla...@suse.com: Split, reuse existing code, simplified]
> Signed-off-by: Petr Mladek
> Cc: Josh Poimboeuf
> Cc: Jessica Yu
> Cc: Jiri Kosina
> Cc: Miroslav Benes
> Cc: Petr Mladek
> ---
>   include/linux/livepatch.h |   3 +
>   kernel/livepatch/core.c   | 162 +-
>   kernel/livepatch/core.h

[PATCH] x86/microcode/intel: print previous microcode revision during early update

2018-01-26 Thread Petr Oros

  When the kernel does an early microcode update, it prints only the
  new microcode revision, so there is no chance to check which
  revision the BIOS vendor had loaded into the CPU.

  This patch adds that information to the output message.

Signed-off-by: Petr Oros 
---
 arch/x86/kernel/cpu/microcode/intel.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index d9e460fc7a3b..78330d29cd4c 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -515,9 +515,11 @@ static bool load_builtin_intel_microcode(struct cpio_data *cp)
  * Print ucode update info.
  */
 static void
-print_ucode_info(struct ucode_cpu_info *uci, unsigned int date)
+print_ucode_info(struct ucode_cpu_info *uci, unsigned int date,
+   unsigned int prev_rev)
 {
-   pr_info_once("microcode updated early to revision 0x%x, date = %04x-%02x-%02x\n",
+   pr_info_once("microcode updated early from revision 0x%x to 0x%x, date = %04x-%02x-%02x\n",
+                prev_rev,
                 uci->cpu_sig.rev,
                 date & 0xffff,
 date >> 24,
@@ -528,6 +530,7 @@ print_ucode_info(struct ucode_cpu_info *uci, unsigned int date)
 
 static int delay_ucode_info;
 static int current_mc_date;
+static int prev_revision;
 
 /*
  * Print early updated ucode info after printk works. This is delayed info dump.
@@ -538,7 +541,7 @@ void show_ucode_info_early(void)
 
if (delay_ucode_info) {
collect_cpu_info_early(&uci);
-   print_ucode_info(&uci, current_mc_date);
+   print_ucode_info(&uci, current_mc_date, prev_revision);
delay_ucode_info = 0;
}
 }
@@ -547,11 +550,12 @@ void show_ucode_info_early(void)
  * At this point, we can not call printk() yet. Delay printing microcode info in
  * show_ucode_info_early() until printk() works.
  */
-static void print_ucode(struct ucode_cpu_info *uci)
+static void print_ucode(struct ucode_cpu_info *uci, unsigned int prev_rev)
 {
struct microcode_intel *mc;
int *delay_ucode_info_p;
int *current_mc_date_p;
+   int *prev_revision_p;
 
mc = uci->mc;
if (!mc)
@@ -559,13 +563,16 @@ static void print_ucode(struct ucode_cpu_info *uci)
 
delay_ucode_info_p = (int *)__pa_nodebug(&delay_ucode_info);
    current_mc_date_p = (int *)__pa_nodebug(&current_mc_date);
+   prev_revision_p = (int *)__pa_nodebug(&prev_revision);
 
*delay_ucode_info_p = 1;
*current_mc_date_p = mc->hdr.date;
+   *prev_revision_p = prev_rev;
 }
 #else
 
-static inline void print_ucode(struct ucode_cpu_info *uci)
+static inline void print_ucode(struct ucode_cpu_info *uci,
+   unsigned int prev_rev)
 {
struct microcode_intel *mc;
 
@@ -573,19 +580,21 @@ static inline void print_ucode(struct ucode_cpu_info *uci)
if (!mc)
return;
 
-   print_ucode_info(uci, mc->hdr.date);
+   print_ucode_info(uci, mc->hdr.date, prev_rev);
 }
 #endif
 
 static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
 {
struct microcode_intel *mc;
-   u32 rev;
+   u32 rev, prev_rev;
 
mc = uci->mc;
if (!mc)
return 0;
 
+   prev_rev = intel_get_microcode_revision();
+
/* write microcode via MSR 0x79 */
native_wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
 
@@ -596,9 +605,9 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early)
uci->cpu_sig.rev = rev;
 
if (early)
-   print_ucode(uci);
+   print_ucode(uci, prev_rev);
else
-   print_ucode_info(uci, mc->hdr.date);
+   print_ucode_info(uci, mc->hdr.date, prev_rev);
 
return 0;
 }
-- 
2.16.1
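
For illustration, the message produced by the patched print_ucode_info() can be reproduced in userspace. This is a sketch, assuming the header date is packed as hex digits in the form mmddyyyy (month in the top byte, day in the next byte, year in the low 16 bits) and that unsigned int is 32 bits wide; format_ucode_update() is a hypothetical name, not a kernel function.

```c
#include <stdio.h>

/*
 * Hypothetical userspace rendering of the early-update message.
 * 'date' is assumed to hold hex digits laid out as mmddyyyy.
 */
static int format_ucode_update(char *buf, size_t len,
			       unsigned int prev_rev, unsigned int new_rev,
			       unsigned int date)
{
	return snprintf(buf, len,
			"microcode updated early from revision 0x%x to 0x%x, date = %04x-%02x-%02x",
			prev_rev, new_rev,
			date & 0xffff,		/* year */
			date >> 24,		/* month */
			(date >> 16) & 0xff);	/* day */
}
```

For example, revisions 0x22 and 0x23 with a date word of 0x01082018 would render as "from revision 0x22 to 0x23, date = 2018-01-08".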

Re: [PATCH 08/24] x86,sme: Annotate indirect call

2018-01-26 Thread David Woodhouse

On Tue, 2018-01-23 at 16:25 +0100, Peter Zijlstra wrote:
> This is boot code, we run this _way_ before userspace comes along to
> poison our branch predictor.

Hm, objtool knows about sections, doesn't it? Why is it whining about
indirect jumps in inittext anyway?

In fact, why are we even *doing* retpolines in inittext? Not that we
are; since we flipped the ALTERNATIVE logic around, at that point we
still have the 'oldinstr' which is a bare jmp anyway. We might as well
do this:

--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -37,10 +37,15 @@
  * as gcc otherwise puts the data into the bss section and not into the init
  * section.
  */
+#if defined(RETPOLINE) && !defined(MODULE)
+#define __noretpoline __attribute__((indirect_branch("keep")))
+#else
+#define __noretpoline
+#endif

 /* These are for everybody (although not all archs will actually
    discard it in modules) */
-#define __init __section(.init.text) __cold __inittrace __latent_entropy
+#define __init __section(.init.text) __cold __inittrace __latent_entropy __noretpoline
 #define __initdata __section(.init.data)
 #define __initconst__section(.init.rodata)
 #define __exitdata __section(.exit.data)

I had that once and dropped it because of concerns about VM guests
being "vulnerable" at boot time. But really, do they even have any
interesting data to purloin at that point? And shouldn't the hypervisor
be protecting them with STIBP if they have nasty HT siblings? 

(And if hypervisors do start doing that, it might be nice for a guest
to have a way to say "you can stop now; I'm safe")


Re: [PATCH] x86/microcode/intel: print previous microcode revision during early update

2018-01-26 Thread Borislav Petkov

On Fri, Jan 26, 2018 at 11:34:50AM +0100, Petr Oros wrote:
>   When the kernel does an early microcode update, it prints only the
>   new microcode revision, so there is no chance to check which
>   revision the BIOS vendor had loaded into the CPU.

And we need that because?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 13/24] objtool: Implement base jump_assert support

2018-01-26 Thread David Woodhouse

On Tue, 2018-01-23 at 16:25 +0100, Peter Zijlstra wrote:
> Implement a jump_label assertion that asserts that the code location
> is indeed only reachable through a static_branch. Because if GCC is
> absolutely retaded it could generate code like:
> 
> xor rax,rax
> NOP/JMP 1f
> mov $1, rax
> 1:
> test rax,rax
> jz 2f
> 
> 2:
> 
> instead of the sensible:
> 
> NOP/JMP 1f
> 
> 1:
> 
> This implements objtool infrastructure for ensuring the code ends up
> sane, since we'll rely on that for correctness and security.
> 
> We tag the instructions after the static branch with static_jump_dest=true;
> that is the instruction after the NOP and the instruction at the
> JMP+disp site.
> 
> Then, when we read the .discard.jump_assert section, we assert that
> each entry points to an instruction that has static_jump_dest set.
> 
> With this we can assert that the code emitted for the if statement
> ends up at the static jump location and nothing untoward happened.
> 
> Cc: Thomas Gleixner 
> Cc: Borislav Petkov 
> Cc: Josh Poimboeuf 
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Thank you for pandering to my paranoia. I suspect that misspelling the
word 'retarded' isn't going to be sufficient to stop people from
objecting to the use of that word, but other than that,

Reviewed-by: David Woodhouse 



[PATCH v2] net: macb: Handle HRESP error

2018-01-26 Thread Harini Katakam

From: Harini Katakam 

Handle HRESP error by doing a SW reset of RX and TX and
re-initializing the descriptors, RX and TX queue pointers.

Signed-off-by: Harini Katakam 
Signed-off-by: Michal Simek 
---
v2:
Rebased on top of latest net-next and reinitialized
all rx queues.

 drivers/net/ethernet/cadence/macb.h  |  3 ++
 drivers/net/ethernet/cadence/macb_main.c | 59 +---
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index c50c5ec..8665982 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 

 #if defined(CONFIG_ARCH_DMA_ADDR_T_64BIT) || defined(CONFIG_MACB_USE_HWSTAMP)
 #define MACB_EXT_DESC
@@ -1200,6 +1201,8 @@ struct macb {
struct ethtool_rx_fs_list rx_fs_list;
spinlock_t rx_fs_lock;
unsigned int max_tuples;
+
+   struct tasklet_struct   hresp_err_tasklet;
 };

 #ifdef CONFIG_MACB_USE_HWSTAMP
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 234667e..e84afcf 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1258,6 +1258,57 @@ static int macb_poll(struct napi_struct *napi, int budget)
return work_done;
 }

+static void macb_hresp_error_task(unsigned long data)
+{
+   struct macb *bp = (struct macb *)data;
+   struct net_device *dev = bp->dev;
+   struct macb_queue *queue = bp->queues;
+   unsigned int q;
+   u32 ctrl;
+
+   for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
+   queue_writel(queue, IDR, MACB_RX_INT_FLAGS |
+MACB_TX_INT_FLAGS |
+MACB_BIT(HRESP));
+   }
+   ctrl = macb_readl(bp, NCR);
+   ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE));
+   macb_writel(bp, NCR, ctrl);
+
+   netif_tx_stop_all_queues(dev);
+   netif_carrier_off(dev);
+
+   bp->macbgem_ops.mog_init_rings(bp);
+
+   /* Initialize TX and RX buffers */
+   for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
+   queue_writel(queue, RBQP, lower_32_bits(queue->rx_ring_dma));
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+   if (bp->hw_dma_cap & HW_DMA_CAP_64B)
+   queue_writel(queue, RBQPH,
+upper_32_bits(queue->rx_ring_dma));
+#endif
+   queue_writel(queue, TBQP, lower_32_bits(queue->tx_ring_dma));
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+   if (bp->hw_dma_cap & HW_DMA_CAP_64B)
+   queue_writel(queue, TBQPH,
+upper_32_bits(queue->tx_ring_dma));
+#endif
+
+   /* Enable interrupts */
+   queue_writel(queue, IER,
+MACB_RX_INT_FLAGS |
+MACB_TX_INT_FLAGS |
+MACB_BIT(HRESP));
+   }
+
+   ctrl |= MACB_BIT(RE) | MACB_BIT(TE);
+   macb_writel(bp, NCR, ctrl);
+
+   netif_carrier_on(dev);
+   netif_tx_start_all_queues(dev);
+}
+
 static irqreturn_t macb_interrupt(int irq, void *dev_id)
 {
struct macb_queue *queue = dev_id;
@@ -1347,10 +1398,7 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
}

if (status & MACB_BIT(HRESP)) {
-   /* TODO: Reset the hardware, and maybe move the
-* netdev_err to a lower-priority context as well
-* (work queue?)
-*/
+   tasklet_schedule(&bp->hresp_err_tasklet);
netdev_err(dev, "DMA bus error: HRESP not OK\n");

if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
@@ -3937,6 +3985,9 @@ static int macb_probe(struct platform_device *pdev)
goto err_out_unregister_mdio;
}

+   tasklet_init(&bp->hresp_err_tasklet, macb_hresp_error_task,
+(unsigned long)bp);
+
phy_attached_info(phydev);

netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n",
--
2.7.4
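
The ring re-initialization in macb_hresp_error_task() programs each ring's DMA base address as a low 32-bit register write, plus a high 32-bit write when the hardware supports 64-bit addressing. A small userspace sketch of that split follows; struct ring_regs, program_ring_base() and the has_64bit_dma flag are hypothetical stand-ins, not driver code.

```c
#include <stdint.h>

/* Userspace equivalents of the kernel's lower_32_bits()/upper_32_bits(). */
static uint32_t lower_32(uint64_t addr)
{
	return (uint32_t)(addr & 0xffffffffu);
}

static uint32_t upper_32(uint64_t addr)
{
	return (uint32_t)(addr >> 32);
}

/* Hypothetical pair of base-address registers for one ring. */
struct ring_regs {
	uint32_t lo;	/* e.g. RBQP or TBQP */
	uint32_t hi;	/* e.g. RBQPH or TBQPH, only with 64-bit DMA */
};

static struct ring_regs program_ring_base(uint64_t ring_dma, int has_64bit_dma)
{
	struct ring_regs regs = { lower_32(ring_dma), 0 };

	if (has_64bit_dma)
		regs.hi = upper_32(ring_dma);
	return regs;
}
```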


Re: [PATCH v4] Support intel-vbtn based tablet mode switch

2018-01-26 Thread Pali Rohár

On Friday 26 January 2018 10:45:55 Marco Martin wrote:
> On martedì 23 gennaio 2018 16:18:24 CET Marco Martin wrote:
> > Some laptops such as Dell Inspiron 7000 series have the
> > tablet mode switch implemented in Intel ACPI,
> > the events to enter and exit the tablet mode are 0xCC and 0xCD
> > 
> > CC: platform-driver-...@vger.kernel.org
> > CC: Matthew Garrett 
> > CC: "Pali Rohár" 
> > CC: Darren Hart 
> > CC: Mario Limonciello 
> > CC: Andy Shevchenko 
> > 
> > Signed-off-by: Marco Martin 
> > ---
> >  drivers/platform/x86/intel-vbtn.c | 21 +
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/drivers/platform/x86/intel-vbtn.c
> > b/drivers/platform/x86/intel-vbtn.c index 58c5ff3..64b4b34 100644
> > --- a/drivers/platform/x86/intel-vbtn.c
> > +++ b/drivers/platform/x86/intel-vbtn.c
> > @@ -26,6 +26,9 @@
> >  #include 
> >  #include 
> > 
> > +/* When NOT in tablet mode, VBDS has the flag 0x40 */
> > +#define TABLET_MODE_FLAG 0x40
> > +
> >  MODULE_LICENSE("GPL");
> >  MODULE_AUTHOR("AceLan Kao");
> > 
> > @@ -42,6 +45,8 @@ static const struct key_entry intel_vbtn_keymap[] = {
> > { KE_IGNORE, 0xC5, { KEY_VOLUMEUP } },  /* volume-up key 
> > release */
> > { KE_KEY, 0xC6, { KEY_VOLUMEDOWN } },   /* volume-down key 
> > press */
> > { KE_IGNORE, 0xC7, { KEY_VOLUMEDOWN } },/* volume-down key 
> > release */
> > +   { KE_SW,  0xCC, { .sw = { SW_TABLET_MODE, 1 } } }, /* Tablet mode in */
> > +   { KE_SW,  0xCD, { .sw = { SW_TABLET_MODE, 0 } } }, /* Tablet mode out */
> > { KE_END },
> >  };
> > 
> > @@ -88,6 +93,7 @@ static void notify_handler(acpi_handle handle, u32 event,
> > void *context)
> > 
> >  static int intel_vbtn_probe(struct platform_device *device)
> >  {
> > +   struct acpi_buffer vgbs_output = { ACPI_ALLOCATE_BUFFER, NULL };
> > acpi_handle handle = ACPI_HANDLE(&device->dev);
> > struct intel_vbtn_priv *priv;
> > acpi_status status;
> > @@ -110,6 +116,21 @@ static int intel_vbtn_probe(struct platform_device
> > *device) return err;
> > }
> > 
> > +   status = acpi_evaluate_object(handle, "VGBS", NULL, &vgbs_output);
> > +   /* VGBS being present and returning something means
> > +* we have a tablet mode switch
> > +*/
> > +   if (ACPI_SUCCESS(status)) {
> > +   union acpi_object *obj = vgbs_output.pointer;
> > +
> > +   if (obj && obj->type == ACPI_TYPE_INTEGER) {
> > +   input_set_capability(priv->input_dev, EV_SW, 
> > SW_TABLET_MODE);
> > +   input_report_switch(priv->input_dev,
> > +   SW_TABLET_MODE,
> > +   
> > !(obj->integer.value & TABLET_MODE_FLAG));
> > +   }
> > +   }
> > +
> > status = acpi_install_notify_handler(handle,
> >  ACPI_DEVICE_NOTIFY,
> >  notify_handler,
> 
> Is there still something to change in this version of the patch?

Yes, I already pointed it out in the thread for the older patch version.
Calling input_set_capability() is not needed at all because all
capabilities are already set by the sparse_keymap_setup() function.
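
As a side note on the initial switch state in the quoted patch: according to its comment, VGBS has flag 0x40 set when the machine is NOT in tablet mode, so the reported value is the negation of that bit. A minimal sketch of just that mapping, where vgbs_to_tablet_mode() is a hypothetical helper name:

```c
/* Per the quoted patch: when NOT in tablet mode, VGBS has flag 0x40 set. */
#define TABLET_MODE_FLAG 0x40

/* Returns 1 when the VGBS value indicates tablet mode, 0 otherwise. */
static int vgbs_to_tablet_mode(unsigned long long vgbs)
{
	return !(vgbs & TABLET_MODE_FLAG);
}
```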

-- 
Pali Rohár
pali.ro...@gmail.com

Re: [PATCH v2] of: use hash based search in of_find_node_by_phandle

2018-01-26 Thread Rasmus Villemoes

On 2018-01-26 09:31, Chintan Pandya wrote:
> Implement, device-phandle relation in hash-table so
> that look up can be faster, irrespective of where my
> device is defined in the DT.
> 
> There are ~6.7k calls to of_find_node_by_phandle() and
> total improvement observed during boot is 400ms.

I'm probably missing something obvious, but: Aren't phandles in practice
small consecutive integers assigned by dtc? If so, why not just have a
smallish static array mapping the small phandle values directly to
device node, instead of adding a pointer to every struct device_node? Or
one could determine the size of the array dynamically (largest seen
phandle value, capping at something sensible, e.g. 1024).

In either case, one would still need to keep the code doing the
whole-tree traversal for handling large phandle values, but I think the
above should make lookup O(1) in most cases.

Alternatively, one could just count the number of nodes with a phandle,
allocate an array of that many pointers (so the memory use is certainly
no more than if adding a pointer to each device_node), and sort it by
phandle, so one can do lookup using a binary search.
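
A userspace sketch of the two alternatives above, for illustration only. struct node is a hypothetical, stripped-down stand-in for struct device_node, the flat nodes[] array stands in for the device tree, and the cap of 1024 is the example value mentioned above; none of this is the actual OF code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for struct device_node. */
struct node {
	uint32_t phandle;	/* 0 means the node has no phandle */
	const char *name;
};

#define PHANDLE_CACHE_SIZE 1024	/* assumed "sensible" cap */

static struct node *phandle_cache[PHANDLE_CACHE_SIZE];

/* Variant 1: direct-mapped array for small phandle values. */
static void phandle_cache_fill(struct node *nodes, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		uint32_t ph = nodes[i].phandle;

		if (ph != 0 && ph < PHANDLE_CACHE_SIZE)
			phandle_cache[ph] = &nodes[i];
	}
}

static struct node *find_by_phandle(struct node *nodes, size_t count,
				    uint32_t ph)
{
	size_t i;

	if (ph == 0)
		return NULL;
	if (ph < PHANDLE_CACHE_SIZE && phandle_cache[ph])
		return phandle_cache[ph];	/* O(1) fast path */

	/* Slow path: full scan, standing in for the whole-tree traversal. */
	for (i = 0; i < count; i++)
		if (nodes[i].phandle == ph)
			return &nodes[i];
	return NULL;
}

/* Variant 2: array of pointers sorted by phandle, searched with bsearch(). */
static int cmp_by_phandle(const void *a, const void *b)
{
	const struct node *na = *(struct node *const *)a;
	const struct node *nb = *(struct node *const *)b;

	return (na->phandle > nb->phandle) - (na->phandle < nb->phandle);
}

/* Collect nodes that have a phandle into out[], sort, return the count. */
static size_t build_sorted_index(struct node *nodes, size_t count,
				 struct node **out)
{
	size_t i, n = 0;

	for (i = 0; i < count; i++)
		if (nodes[i].phandle != 0)
			out[n++] = &nodes[i];
	qsort(out, n, sizeof(*out), cmp_by_phandle);
	return n;
}

static struct node *find_sorted(struct node **index, size_t n, uint32_t ph)
{
	struct node key = { ph, NULL };
	struct node *keyp = &key;
	struct node **hit;

	hit = bsearch(&keyp, index, n, sizeof(*index), cmp_by_phandle);
	return hit ? *hit : NULL;
}
```

The first variant costs a fixed array of pointers and keeps the slow path for large values; the second costs one pointer per node that actually has a phandle and gives O(log n) lookups regardless of how the values are distributed.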

Rasmus
