Re: [PATCH v2 04/22] mtd: spi-nor: Rename nor->params to nor->flash
On Tue, 24 Sep 2019 07:46:03 + wrote: > From: Tudor Ambarus > > Rename nor->params to nor->flash for a clearer separation > between the controller and flash operations. Hm, I'm not sure 'flash' is clearer than 'params', and the spi_nor object is supposed to represent the NOR chip anyway, so it was pretty clear to me that nor->params were the NOR flash parameters not the NOR controller ones. If I had anything to change it would be s/params/properties/ (and s/spi_nor_flash_parameter/spi_nor_properties/) since those parameters look like immutable information discovered during the NOR detection, but I'm nitpicking here.
RE: [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems
On 2019.10.09 06:37 Rafael J. Wysocki wrote:
> On Wednesday, October 9, 2019 1:19:51 AM CEST Rafael J. Wysocki wrote:
>> On Tuesday, October 8, 2019 12:49:01 PM CEST Rafael J. Wysocki wrote:
>>> On Tue, Oct 8, 2019 at 11:51 AM Rafael J. Wysocki wrote:
 On Tue, Oct 8, 2019 at 8:20 AM Doug Smythies wrote:
> O.K. Thanks for your quick reply, and insight.
>
> I think long durations always need to be counted, but currently if
> the deepest idle state is disabled, they are not.
...
AFAICS, adding early_hits to count is not a mistake if there are still enabled states deeper than the current one.
>>>
>>> And the mistake appears to be that the "hits" and "misses" metrics
>>> aren't handled in analogy with the "early_hits" one when the current
>>> state is disabled.

I only know how to exploit and test the "hits" and "misses" path that should use the deepest available idle state upon transition to an idle system. Even so, the test has a low probability of failing, and so needs to be run many times. I do not know how to demonstrate and/or test any "early_hits" path to confirm that an issue exists or that it is fixed.

>>>
>>> Let me try to cut a patch to address that.
>>
>> Appended below, not tested. Reference as: rjw1
>>
>> It is meant to address two problems, one of which is that the "hits" and
>> "misses" metrics of disabled states need to be taken into account too in
>> some cases, and the other is an issue with the handling of "early hits"
>> which may lead to suboptimal state selection if some states are disabled.
>
> Well, it still misses a couple of points.
>
> First, disabled states that are too deep should not be taken into consideration
> at all.
>
> Second, the "hits" and "misses" metrics of disabled states need to be used for
> idle duration ranges corresponding to them regardless of whether or not the
> "hits" value is greater than the "misses" one.
>
> Updated patch is below (still not tested), but it tries to do too much in one
> go, so I need to split it into a series of smaller changes.

Thanks for your continued look at this. Reference as: rjw2

Test 1, hack job statistical test (old tests re-stated):

Kernel          tests   fail rate
5.4-rc1          6616   13.45%
5.3              2376    4.50%
5.3-teov7       12136    0.00%  <<< teo.c reverted and teov7 put in its place.
5.4-rc1-ds      11168    0.00%  <<< [old] ds proposed patch (> 7 hours test time)
5.4-rc1-ds12     4224    0.00%  <<< [old] new ds proposed patch
5.4-rc2-rjw1    11280    0.00%
5.4-rc2-rjw2      640    0.00%  <<< Will be run again, for longer.

Test 2: I also looked at every possible enable/disable idle combination, and they all seemed O.K.

No other tests have been run yet.

System:
Processor: i7-2600K
Deepest idle state: 4 (C6)

... Doug
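For readers following the metrics discussion, here is a rough, untested sketch of the idea being debated (it is not the rjw1/rjw2 patch): the "hits"/"misses" statistics of disabled or too-deep states are still accumulated and credited to the closest shallower enabled candidate instead of being lost. The teo_cpu/teo_idle_state field names are assumed from the teo governor as posted; treat everything below as illustrative only.

static int teo_candidate_sketch(struct cpuidle_driver *drv,
                                struct cpuidle_device *dev,
                                struct teo_cpu *cpu_data)
{
        unsigned int hits = 0, misses = 0;
        int i, idx = 0;

        for (i = drv->state_count - 1; i >= 0; i--) {
                /*
                 * Accumulate metrics even for states that cannot be used,
                 * so the information is not dropped when a state is disabled.
                 */
                hits += cpu_data->states[i].hits;
                misses += cpu_data->states[i].misses;

                if (dev->states_usage[i].disable)
                        continue;

                /*
                 * Credit everything gathered so far (this state plus any
                 * deeper disabled ones) to this enabled candidate.
                 */
                if (hits > misses)
                        return i;

                idx = i;
                hits = 0;
                misses = 0;
        }

        return idx;
}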
Re: [PATCH] pwm: stm32: add comment to better describe breakinput feature
On Wed, Oct 09, 2019 at 11:51:05AM +0200, Fabrice Gasnier wrote: > On 10/8/19 4:45 PM, Uwe Kleine-König wrote: > > On Tue, Oct 08, 2019 at 01:41:27PM +0200, Fabrice Gasnier wrote: > >> Add a comment to better describe the purpose of breakinput feature that > >> can be found on some STM32 timer instances. Briefly comment on the > >> characteristics of this input for PWM, and pinmuxing as suggested in [1]. > >> > >> [1] https://lkml.org/lkml/2019/10/1/207 > >> > >> Signed-off-by: Fabrice Gasnier > >> --- > >> drivers/pwm/pwm-stm32.c | 8 +++- > >> 1 file changed, 7 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c > >> index 359b085..6406ebb 100644 > >> --- a/drivers/pwm/pwm-stm32.c > >> +++ b/drivers/pwm/pwm-stm32.c > >> @@ -522,8 +522,14 @@ static int stm32_pwm_apply_breakinputs(struct > >> stm32_pwm *priv, > >> sizeof(struct stm32_breakinput)); > >> > >>/* > >> + * Some timer instances can have BRK input pins (e.g. basically a fault > >> + * pin from the output power stage). The break feature allows a safe > >> + * shut-down of the PWM outputs to a predefined state. Further details > >> + * are available in application note AN4277, "Using STM32 device PWM > >> + * shut-down features..." > > > > Without having read the application note I don't understand the purpose. > > Not sure if this should be a show stopper though. > > Hi Uwe, > > I can rephrase this. Do you think the bellow comment would be more > relevant and easy to understand ? > > /* > * The break feature allows a safe shut-down of the PWM outputs. > * It's based on the BRK event signal defined in the dt-bindings > * by values. > * Because "st,breakinput" parameter is optional do not make probe > * failed if it doesn't exist. > */ I still fail to understand. This is an input that determines the actual value of the output pin? What makes a shutdown of outputs safe? What is a shutdown anyhow? Apart from that: s/do not make probe failed/don't fail to probe/. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
Re: [PATCH] clk: bcm2835: Fix memory leak in bcm2835_register_pll
Hello, On Thursday, October 10, 2019, 3:30:58 AM CEST Navid Emamdoost wrote: > In the implementation of bcm2835_register_pll(), the allocated memory > for pll should be released if devm_clk_hw_register() fails. > > Fixes: b19f009d4510 ("clk: bcm2835: Migrate to clk_hw based registration and > OF APIs") > Signed-off-by: Navid Emamdoost > --- > drivers/clk/bcm/clk-bcm2835.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/clk/bcm/clk-bcm2835.c b/drivers/clk/bcm/clk-bcm2835.c > index 802e488fd3c3..99549642110a 100644 > --- a/drivers/clk/bcm/clk-bcm2835.c > +++ b/drivers/clk/bcm/clk-bcm2835.c > @@ -1320,8 +1320,10 @@ static struct clk_hw *bcm2835_register_pll(struct > bcm2835_cprman *cprman, > pll->hw.init = &init; > > ret = devm_clk_hw_register(cprman->dev, &pll->hw); > - if (ret) > + if (ret) { > + kfree(pll); > return NULL; > + } > return &pll->hw; > } Eh, is pll freed at all, even in successful case? I failed to find a corresponding kfree(). The pointer itself is lost once the function returns. The solution would rather be to use devm_kzalloc instead of kzalloc, like the other clocks in e.g. bcm2835_register_pll() Best regards, Alexander
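For illustration, a minimal, untested sketch of the devm_kzalloc() alternative suggested above; the init setup and the second parameter are abbreviated from the driver, and only the allocation/registration lines matter here:

static struct clk_hw *bcm2835_register_pll(struct bcm2835_cprman *cprman,
                                           const struct bcm2835_pll_data *data)
{
        struct bcm2835_pll *pll;
        struct clk_init_data init;
        int ret;

        memset(&init, 0, sizeof(init));
        /* ... init.name/ops/parents setup elided, as in the driver ... */

        /* Tie the allocation to the device lifetime instead of kfree()ing. */
        pll = devm_kzalloc(cprman->dev, sizeof(*pll), GFP_KERNEL);
        if (!pll)
                return NULL;

        pll->cprman = cprman;
        pll->data = data;
        pll->hw.init = &init;

        ret = devm_clk_hw_register(cprman->dev, &pll->hw);
        if (ret)
                return NULL;    /* pll is released automatically on detach */

        return &pll->hw;
}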
Re: [PATCH v2 05/22] mtd: spi-nor: Rework read_sr()
On Tue, 24 Sep 2019 07:46:08 + wrote: > From: Tudor Ambarus > > static int read_sr(struct spi_nor *nor) > becomes > static int spi_nor_read_sr(struct spi_nor *nor, u8 *sr) > > The new function returns 0 on success and -errno otherwise. > We let the callers pass the pointer to the buffer where the > value of the Status Register will be written. This way we avoid > the casts between int and u8, which can be confusing. > > Prepend spi_nor_ to the function name, all functions should begin > with that. > > S/pr_err/dev_err and drop duplicated dev_err in callers, in case the > function returns error. Too many things done in a single patch, can you split that please? > > Signed-off-by: Tudor Ambarus > --- > drivers/mtd/spi-nor/spi-nor.c | 131 > +- > 1 file changed, 65 insertions(+), 66 deletions(-) > > diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c > index 7d0c1b598250..a23783641146 100644 > --- a/drivers/mtd/spi-nor/spi-nor.c > +++ b/drivers/mtd/spi-nor/spi-nor.c > @@ -388,12 +388,14 @@ static ssize_t spi_nor_write_data(struct spi_nor *nor, > loff_t to, size_t len, > return nor->controller_ops->write(nor, to, len, buf); > } > > -/* > - * Read the status register, returning its value in the location > - * Return the status register value. > - * Returns negative if error occurred. > +/** > + * spi_nor_read_sr() - Read the Status Register. > + * @nor:pointer to 'struct spi_nor' > + * @sr: buffer where the value of the Status Register will be > written. You should definitely mention that this sr pointer has to be DMA-safe. > + * > + * Return: 0 on success, -errno otherwise. > */ > -static int read_sr(struct spi_nor *nor) > +static int spi_nor_read_sr(struct spi_nor *nor, u8 *sr) > { > int ret; > > @@ -402,20 +404,17 @@ static int read_sr(struct spi_nor *nor) > SPI_MEM_OP(SPI_MEM_OP_CMD(SPINOR_OP_RDSR, 1), > SPI_MEM_OP_NO_ADDR, > SPI_MEM_OP_NO_DUMMY, > -SPI_MEM_OP_DATA_IN(1, nor->bouncebuf, 1)); > +SPI_MEM_OP_DATA_IN(1, sr, 1)); > > ret = spi_mem_exec_op(nor->spimem, &op); > } else { > - ret = nor->controller_ops->read_reg(nor, SPINOR_OP_RDSR, > - nor->bouncebuf, 1); > + ret = nor->controller_ops->read_reg(nor, SPINOR_OP_RDSR, sr, 1); > } > > - if (ret < 0) { > - pr_err("error %d reading SR\n", (int) ret); > - return ret; > - } > + if (ret) > + dev_err(nor->dev, "error %d reading SR\n", ret); > > - return nor->bouncebuf[0]; > + return ret; > }
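To make the DMA-safety point concrete, a caller after this rework would keep passing the bounce buffer rather than a stack variable. A simplified, assumed rendering of one such caller (the real helper also checks vendor-specific error bits):

static int spi_nor_sr_ready(struct spi_nor *nor)
{
        int ret;

        /* The buffer handed to spi_nor_read_sr() must be DMA-safe. */
        ret = spi_nor_read_sr(nor, nor->bouncebuf);
        if (ret)
                return ret;

        return !(nor->bouncebuf[0] & SR_WIP);
}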
Re: [PATCH v2 06/22] mtd: spi-nor: Rework read_fsr()
On Tue, 24 Sep 2019 07:46:12 + wrote: > From: Tudor Ambarus > > static int read_fsr(struct spi_nor *nor) > becomes > static int spi_nor_read_fsr(struct spi_nor *nor, u8 *fsr) > > The new function returns 0 on success and -errno otherwise. > We let the callers pass the pointer to the buffer where the > value of the Flag Status Register will be written. This way > we avoid the casts between int and u8, which can be confusing. > > Prepend spi_nor_ to the function name, all functions should begin > with that. > > S/pr_err/dev_err and drop duplicated dev_err in callers, in case the > function returns error. Same comments as for patch 5.
Re: [PATCH v2 07/22] mtd: spi-nor: Rework read_cr()
On Tue, 24 Sep 2019 07:46:15 + wrote: > From: Tudor Ambarus > > static int read_cr(struct spi_nor *nor) > becomes > static int spi_nor_read_cr(struct spi_nor *nor, u8 *cr) > > The new function returns 0 on success and -errno otherwise. > We let the callers pass the pointer to the buffer where the > value of the Configuration Register will be written. This way > we avoid the casts between int and u8, which can be confusing. > > Prepend spi_nor_ to the function name, all functions should begin > with that. > Same as for patch 5, this should be split in several patches. > Vendors are using both the "Configuration Register" and the > "Status Register 2" terminology when referring to the second byte > of the Status Register. Indicate in the description of the function > that we use the SPINOR_OP_RDCR (35h) command to interrogate the ^query > Configuration Register. >
Re: [PATCH] string.h: Mark 34 functions with __must_check
On 09/10/2019 18.31, Nick Desaulniers wrote: > On Wed, Oct 9, 2019 at 7:30 AM Dan Carpenter wrote: >> >> On Wed, Oct 09, 2019 at 04:21:20PM +0200, Rasmus Villemoes wrote: >>> On 09/10/2019 15.56, Dan Carpenter wrote: That's because glibc strlen is annotated with __attribute_pure__ which means it has no side effects. >>> >>> I know, except it has nothing to do with glibc headers. Just try the >>> same thing in the kernel. gcc itself knows this about __builtin_strlen() >>> etc. If anything, we could annotate some of our non-standard functions >>> (say, memchr_inv) with __pure - then we'd both get the Wunused-value in >>> the nonsense cases, and allow gcc to optimize or reorder the calls. >> >> Huh. You're right. GCC already knows. So this patch is pointless like >> you say. > > Is it? None of the functions in include/linux/string.h are currently > marked __pure today. As I said, gcc knows this about the standard C functions, the ones it has a __builtin_* version of. So it doesn't matter if the user adds a __pure annotation or not (be it in libc or kernel headers). For example, here's a line from gcc's builtins.def DEF_LIB_BUILTIN(BUILT_IN_STRCMP, "strcmp", BT_FN_INT_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF) so gcc knows that __builtin_strcmp is not just pure, but its argument can also be assumed to be non-NULL, etc. (OK, the kernel disables the use of that knowledge, but this is irrelevant to this discussion). In constrast, e.g. abs and most libm functions is DEF_LIB_BUILTIN(BUILT_IN_ABS, "abs", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST) i.e. "attribute(__const__)" (se below). (Side note, I'm surprised that any function that > accepts a pointer could be considered pure. I could reassign pointed > to value without changing the pointers value. Probably depends on your definition of pure. What matters is gcc's, which is 'const' Many functions do not examine any values except their arguments, and have no effects except the return value. Basically this is just slightly more strict class than the 'pure' attribute below, since function is not allowed to read global memory. Note that a function that has pointer arguments and examines the data pointed to must _not_ be declared 'const'. Likewise, a function that calls a non-'const' function usually must not be 'const'. It does not make sense for a 'const' function to return 'void'. 'pure' Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute 'pure'. For example, int square (int) __attribute__ ((pure)); says that the hypothetical function 'square' is safe to call fewer times than the program says. Some common examples of pure functions are 'strlen' or 'memcmp'. Interesting non-pure functions are functions with infinite loops or those depending on volatile memory or other system resource, that may change between two consecutive calls (such as 'feof' in a multithreading environment). And yes, gcc knows perfectly well that more or less any write to memory (except in the very few cases where it can prove no aliasing) "invalidates" the result of a pure function. In some very simple cases, that means it can hoist a strlen() call out of a badly written loop like for (i = 0; i < strlen(s); ++i) but the loop body can't be very complicated before that fails. 
Example: https://godbolt.org/z/f0PPRt . I gave up trying to decipher what clang produced. (BTW, the pure example in the docs is bad, because that function is likely to be const and not just pure.) I can see strlen being > "pure" for string literals, but not for char[]. This is something > I'll play with more, I've already spotted one missed optimization in > LLVM: https://bugs.llvm.org/show_bug.cgi?id=43624). I see that compiler_attributes unconditionally #defines __pure, but there's no reference to clang docs. Do they exist? > I think it would be an interesting study to see how often functions > that have return codes are ok to not check vs aren't ok (in a large > production codebase like the Linux kernel), similar to how 97% of > cases fallthrough is unintentional (which to me sounds like maybe the > default behavior of the language is incorrect). Again, _please_ do not confuse the name __must_check with what it actually does and means. It only means you get a warning if you throw away the result completely, it does not mean the return value must soonish be subject to some if-condition (because how'd you define or implement that...). To reiterate: (1) for __pure functions, it makes no sense to add __must_check, because gcc already warns (2) all the standard-C strin
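A toy userspace example of the points above (not a proposed kernel patch): gcc already knows __builtin_strlen() is pure, so the hoisting happens with no annotation at all, and marking one's own helper pure only buys the same treatment plus the -Wunused-value warning on nonsense calls.

#include <stddef.h>
#include <string.h>

/* Annotating a helper as pure lets the compiler CSE/hoist its calls. */
__attribute__((pure)) size_t count_x(const char *s);

size_t count_x(const char *s)
{
        size_t n = 0;

        /*
         * gcc can hoist the strlen() call out of this badly written loop
         * because the builtin is already known to be pure -- no
         * __must_check or manual annotation is needed for that.
         */
        for (size_t i = 0; i < strlen(s); ++i)
                n += (s[i] == 'x');

        return n;
}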
Re: [PATCH v2 08/22] mtd: spi-nor: Rework write_enable/disable()
On Tue, 24 Sep 2019 07:46:18 + wrote: > From: Tudor Ambarus > > static int write_enable(struct spi_nor *nor) > static int write_disable(struct spi_nor *nor) > become > static int spi_nor_write_enable(struct spi_nor *nor) > static int spi_nor_write_disable(struct spi_nor *nor) > > Check for errors after each call to them. Move them up in the > file as the first SPI NOR Register Operations, to avoid further > forward declarations. Same here, split that in 3 patches please. > > Signed-off-by: Tudor Ambarus > --- > drivers/mtd/spi-nor/spi-nor.c | 175 > +- > 1 file changed, 120 insertions(+), 55 deletions(-) > > diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c > index 0fb124bd2e77..0aee068a5835 100644 > --- a/drivers/mtd/spi-nor/spi-nor.c > +++ b/drivers/mtd/spi-nor/spi-nor.c > @@ -389,6 +389,64 @@ static ssize_t spi_nor_write_data(struct spi_nor *nor, > loff_t to, size_t len, > } > > /** > + * spi_nor_write_enable() - Set write enable latch with Write Enable command. > + * @nor:pointer to 'struct spi_nor' > + * > + * Return: 0 on success, -errno otherwise. > + */ > +static int spi_nor_write_enable(struct spi_nor *nor) > +{ > + int ret; > + > + if (nor->spimem) { > + struct spi_mem_op op = > + SPI_MEM_OP(SPI_MEM_OP_CMD(SPINOR_OP_WREN, 1), > +SPI_MEM_OP_NO_ADDR, > +SPI_MEM_OP_NO_DUMMY, > +SPI_MEM_OP_NO_DATA); > + > + ret = spi_mem_exec_op(nor->spimem, &op); > + } else { > + ret = nor->controller_ops->write_reg(nor, SPINOR_OP_WREN, > + NULL, 0); > + } > + > + if (ret) > + dev_err(nor->dev, "error %d on Write Enable\n", ret); Do we really need these error messages? I mean, if there's an error it should be propagated to the upper layer, so maybe we should use dev_dbg() here. > + > + return ret; > +} > +
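For reference, the hunk above with the dev_dbg() suggestion applied would look roughly like this (illustrative only, not the respun patch):

static int spi_nor_write_enable(struct spi_nor *nor)
{
        int ret;

        if (nor->spimem) {
                struct spi_mem_op op =
                        SPI_MEM_OP(SPI_MEM_OP_CMD(SPINOR_OP_WREN, 1),
                                   SPI_MEM_OP_NO_ADDR,
                                   SPI_MEM_OP_NO_DUMMY,
                                   SPI_MEM_OP_NO_DATA);

                ret = spi_mem_exec_op(nor->spimem, &op);
        } else {
                ret = nor->controller_ops->write_reg(nor, SPINOR_OP_WREN,
                                                     NULL, 0);
        }

        /* Propagate the error; keep the message at debug level. */
        if (ret)
                dev_dbg(nor->dev, "error %d on Write Enable\n", ret);

        return ret;
}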
Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
On 10.10.19 02:26, Naoya Horiguchi wrote: > On Wed, Oct 09, 2019 at 04:24:35PM +0200, David Hildenbrand wrote: >> We should check for pfn_to_online_page() to not access uninitialized >> memmaps. Reshuffle the code so we don't have to duplicate the error >> message. >> >> Cc: Naoya Horiguchi >> Cc: Andrew Morton >> Cc: Michal Hocko >> Signed-off-by: David Hildenbrand >> --- >> mm/memory-failure.c | 14 -- >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 7ef849da8278..e866e6e5660b 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) >> if (!sysctl_memory_failure_recovery) >> panic("Memory failure on page %lx", pfn); >> >> -if (!pfn_valid(pfn)) { >> +p = pfn_to_online_page(pfn); >> +if (!p) { >> +if (pfn_valid(pfn)) { >> +pgmap = get_dev_pagemap(pfn, NULL); >> +if (pgmap) >> +return memory_failure_dev_pagemap(pfn, flags, >> + pgmap); >> +} >> pr_err("Memory failure: %#lx: memory outside kernel control\n", >> pfn); >> return -ENXIO; >> } >> >> -pgmap = get_dev_pagemap(pfn, NULL); >> -if (pgmap) >> -return memory_failure_dev_pagemap(pfn, flags, pgmap); >> - >> -p = pfn_to_page(pfn); > > This change seems to assume that memory_failure_dev_pagemap() is never > called for online pages. Is it an intended behavior? > Or the concept "online pages" is not applicable to zone device pages? Yes, that's the real culprit. ZONE_DEVICE/devmem pages are never online (SECTION_IS_ONLINE). The terminology "online" only applies to pages that were given to the buddy. And as we support sup-section hotadd for devmem, we cannot easily make use of the section flag it. I already proposed somewhere to convert SECTION_IS_ONLINE to a subsection bitmap and call it something like pfn_active(). pfn_online() would then be "pfn_active() && zone != ZONE_DEVICE". And we could use pfn_active() everywhere to test for initialized memmaps (well, besides some special cases like device reserved memory that does not span full sub-sections). Until now, nobody volunteered and I have other things to do. -- Thanks, David / dhildenb
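To make the proposal above concrete, a heavily hedged sketch: pfn_active() and the sub-section bitmap it would consult do not exist, and every name below is hypothetical; the code only illustrates the split being described.

static inline bool pfn_active(unsigned long pfn)
{
        /* hypothetical: test a per-subsection "memmap initialized" bitmap */
        return subsection_memmap_initialized(pfn);
}

static inline bool pfn_online(unsigned long pfn)
{
        /* "online" == initialized memmap and not ZONE_DEVICE */
        return pfn_active(pfn) &&
               !is_zone_device_page(pfn_to_page(pfn));
}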
[RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest.
This is used to synchronize time between host and guest. The guest can request the (guest) physical address it wants the data in through the MSR_KVM_TIMEKEEPER_EN MSR. It currently assumes the host timekeeper is "tsc". Signed-off-by: Suleiman Souhlal --- arch/x86/include/asm/kvm_host.h | 3 + arch/x86/include/asm/pvclock-abi.h | 27 ++ arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kvm/x86.c | 121 +++ 4 files changed, 152 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 50eb430b0ad8..4d622450cb4a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -659,7 +659,10 @@ struct kvm_vcpu_arch { struct pvclock_vcpu_time_info hv_clock; unsigned int hw_tsc_khz; struct gfn_to_hva_cache pv_time; + struct gfn_to_hva_cache pv_timekeeper_g2h; + struct pvclock_timekeeper pv_timekeeper; bool pv_time_enabled; + bool pv_timekeeper_enabled; /* set guest stopped flag in pvclock flags field */ bool pvclock_set_guest_stopped_request; diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index 1436226efe3e..2809008b9b26 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -40,6 +40,33 @@ struct pvclock_wall_clock { u32 nsec; } __attribute__((__packed__)); +struct pvclock_read_base { + u64 mask; + u64 cycle_last; + u32 mult; + u32 shift; + u64 xtime_nsec; + u64 base; +} __attribute__((__packed__)); + +struct pvclock_timekeeper { + u64 gen; + u64 flags; + struct pvclock_read_base tkr_mono; + struct pvclock_read_base tkr_raw; + u64 xtime_sec; + u64 ktime_sec; + u64 wall_to_monotonic_sec; + u64 wall_to_monotonic_nsec; + u64 offs_real; + u64 offs_boot; + u64 offs_tai; + u64 raw_sec; + u64 tsc_offset; +} __attribute__((__packed__)); + +#definePVCLOCK_TIMEKEEPER_ENABLED (1 << 0) + #define PVCLOCK_TSC_STABLE_BIT (1 << 0) #define PVCLOCK_GUEST_STOPPED (1 << 1) /* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. 
*/ diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 2a8e0b6b9805..3ebb1d87db3a 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -50,6 +50,7 @@ #define MSR_KVM_STEAL_TIME 0x4b564d03 #define MSR_KVM_PV_EOI_EN 0x4b564d04 #define MSR_KVM_POLL_CONTROL 0x4b564d05 +#define MSR_KVM_TIMEKEEPER_EN 0x4b564d06 struct kvm_steal_time { __u64 steal; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 661e2bf38526..937f83cdda4b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -157,6 +157,8 @@ module_param(force_emulation_prefix, bool, S_IRUGO); int __read_mostly pi_inject_timer = -1; module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR); +static atomic_t pv_timekeepers_nr; + #define KVM_NR_SHARED_MSRS 16 struct kvm_shared_msrs_global { @@ -2729,6 +2731,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) break; } + case MSR_KVM_TIMEKEEPER_EN: + if (kvm_gfn_to_hva_cache_init(vcpu->kvm, + &vcpu->arch.pv_timekeeper_g2h, data, + sizeof(struct pvclock_timekeeper))) + vcpu->arch.pv_timekeeper_enabled = false; + else { + vcpu->arch.pv_timekeeper_enabled = true; + atomic_inc(&pv_timekeepers_nr); + } + break; case MSR_KVM_ASYNC_PF_EN: if (kvm_pv_enable_async_pf(vcpu, data)) return 1; @@ -7097,6 +7109,109 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = { }; #ifdef CONFIG_X86_64 +static DEFINE_SPINLOCK(shadow_pvtk_lock); +static struct pvclock_timekeeper shadow_pvtk; + +static void +pvclock_copy_read_base(struct pvclock_read_base *pvtkr, +struct tk_read_base *tkr) +{ + pvtkr->cycle_last = tkr->cycle_last; + pvtkr->mult = tkr->mult; + pvtkr->shift = tkr->shift; + pvtkr->mask = tkr->mask; + pvtkr->xtime_nsec = tkr->xtime_nsec; + pvtkr->base = tkr->base; +} + +static void +kvm_copy_into_pvtk(struct kvm_vcpu *vcpu) +{ + struct pvclock_timekeeper *pvtk; + unsigned long flags; + + if (!vcpu->arch.pv_timekeeper_enabled) + return; + + pvtk = &vcpu->arch.pv_timekeeper; + if (pvclock_gtod_data.clock.vclock_mode == VCLOCK_TSC) { + pvtk->flags |= PVCLOCK_TIMEKEEPER_ENABLED; + spin_lock_irqsave(&shadow_pvtk_lock, flags); + pvtk->tkr_mono = shadow_pvtk.tkr_mono; + pvtk->tkr_raw = shadow_pvtk.tkr_raw; + + pvtk->xtime_sec = shadow_pvtk.xtime_sec; + pvtk->k
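For context, the guest side (implemented in patch 2/2) would presumably enable this by handing the host the guest-physical address of a pvclock_timekeeper area through the new MSR, along these lines (illustrative sketch, not the actual patch):

#include <asm/kvm_para.h>
#include <asm/msr.h>
#include <asm/pgtable_types.h>
#include <asm/pvclock-abi.h>

static struct pvclock_timekeeper host_tk;

static void enable_host_timekeeper(void)
{
        /* Tell the host where to mirror its timekeeping parameters. */
        wrmsrl(MSR_KVM_TIMEKEEPER_EN, slow_virt_to_phys(&host_tk));
}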
[RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource.
When kvm-hostclock is selected, and the host supports it, update our timekeeping parameters to be the same as the host. This lets us have our time synchronized with the host's, even in the presence of host NTP or suspend. Signed-off-by: Suleiman Souhlal --- arch/x86/Kconfig| 9 ++ arch/x86/include/asm/kvmclock.h | 12 +++ arch/x86/kernel/Makefile| 2 + arch/x86/kernel/kvmclock.c | 5 +- arch/x86/kernel/kvmhostclock.c | 130 include/linux/timekeeper_internal.h | 8 ++ kernel/time/timekeeping.c | 2 + 7 files changed, 167 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kernel/kvmhostclock.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d6e1faa28c58..c5b1257ea969 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -839,6 +839,15 @@ config PARAVIRT_TIME_ACCOUNTING config PARAVIRT_CLOCK bool +config KVM_HOSTCLOCK + bool "kvmclock uses host timekeeping" + depends on KVM_GUEST + help + Select this option to make the guest use the same timekeeping + parameters as the host. This means that time will be almost + exactly the same between the two. Only works if the host uses "tsc" + clocksource. + config JAILHOUSE_GUEST bool "Jailhouse non-root cell support" depends on X86_64 && PCI diff --git a/arch/x86/include/asm/kvmclock.h b/arch/x86/include/asm/kvmclock.h index eceea9299097..de1a590ff97e 100644 --- a/arch/x86/include/asm/kvmclock.h +++ b/arch/x86/include/asm/kvmclock.h @@ -2,6 +2,18 @@ #ifndef _ASM_X86_KVM_CLOCK_H #define _ASM_X86_KVM_CLOCK_H +#include + extern struct clocksource kvm_clock; +unsigned long kvm_get_tsc_khz(void); + +#ifdef CONFIG_KVM_HOSTCLOCK +void kvm_hostclock_init(void); +#else +static inline void kvm_hostclock_init(void) +{ +} +#endif /* KVM_HOSTCLOCK */ + #endif /* _ASM_X86_KVM_CLOCK_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 3578ad248bc9..bc7be935fc5e 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -17,6 +17,7 @@ CFLAGS_REMOVE_tsc.o = -pg CFLAGS_REMOVE_paravirt-spinlocks.o = -pg CFLAGS_REMOVE_pvclock.o = -pg CFLAGS_REMOVE_kvmclock.o = -pg +CFLAGS_REMOVE_kvmhostclock.o = -pg CFLAGS_REMOVE_ftrace.o = -pg CFLAGS_REMOVE_early_printk.o = -pg CFLAGS_REMOVE_head64.o = -pg @@ -112,6 +113,7 @@ obj-$(CONFIG_AMD_NB)+= amd_nb.o obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o obj-$(CONFIG_KVM_GUEST)+= kvm.o kvmclock.o +obj-$(CONFIG_KVM_HOSTCLOCK)+= kvmhostclock.o obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 904494b924c1..4ab862de9777 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -125,7 +125,7 @@ static inline void kvm_sched_clock_init(bool stable) * poll of guests can be running and trouble each other. So we preset * lpj here */ -static unsigned long kvm_get_tsc_khz(void) +unsigned long kvm_get_tsc_khz(void) { setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); return pvclock_tsc_khz(this_cpu_pvti()); @@ -366,5 +366,8 @@ void __init kvmclock_init(void) kvm_clock.rating = 299; clocksource_register_hz(&kvm_clock, NSEC_PER_SEC); + + kvm_hostclock_init(); + pv_info.name = "KVM"; } diff --git a/arch/x86/kernel/kvmhostclock.c b/arch/x86/kernel/kvmhostclock.c new file mode 100644 index ..9971343c2bed --- /dev/null +++ b/arch/x86/kernel/kvmhostclock.c @@ -0,0 +1,130 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * KVM clocksource that uses host timekeeping. 
+ * Copyright (c) 2019 Suleiman Souhlal, Google LLC + */ + +#include +#include +#include +#include +#include +#include + +struct pvclock_timekeeper pv_timekeeper; + +static bool pv_timekeeper_enabled; +static bool pv_timekeeper_present; +static int old_vclock_mode; + +static u64 +kvm_hostclock_get_cycles(struct clocksource *cs) +{ + return rdtsc_ordered(); +} + +static int +kvm_hostclock_enable(struct clocksource *cs) +{ + pv_timekeeper_enabled = 1; + + old_vclock_mode = kvm_clock.archdata.vclock_mode; + kvm_clock.archdata.vclock_mode = VCLOCK_TSC; + return 0; +} + +static void +kvm_hostclock_disable(struct clocksource *cs) +{ + pv_timekeeper_enabled = 0; + kvm_clock.archdata.vclock_mode = old_vclock_mode; +} + +struct clocksource kvm_hostclock = { + .name = "kvm-hostclock", + .read = kvm_hostclock_get_cycles, + .enable = kvm_hostclock_enable, + .disable = kvm_hostclock_disable, + .rating = 401, /* Higher than kvm-clock */ + .mask = CLOCKSOURCE_MASK(64), + .flags = CLOCK_SOURCE_IS_CONTINUOUS, +}; + +static void +pvclock_copy_into_read_base(struct pvclock_timeke
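Once registered, the new clocksource should be selectable through the usual sysfs interface; for example (the available_clocksource listing below is assumed, not taken from the patch):

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
kvm-hostclock kvm-clock tsc acpi_pm
# echo kvm-hostclock > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-hostclock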
Re: [PATCH 0/3] elide generated code for folded p4d/pud
On Wed, Oct 09, 2019 at 03:26:55PM -0700, Vineet Gupta wrote:
> Hi,
>
> This series elides extraneous generated code for folded p4d/pud.
> This came up when trying to remove __ARCH_USE_5LEVEL_HACK from ARC port.
> The code savings are not a whole lot, but still worthwhile IMHO.
>
> bloat-o-meter2 vmlinux-A-baseline vmlinux-E-elide-p?d_clear_bad
> add/remove: 0/2 grow/shrink: 0/1 up/down: 0/-146 (-146)
> function             old     new   delta
> p4d_clear_bad          2       -      -2
> pud_clear_bad         20       -     -20
> free_pgd_range       546     422    -124
> Total: Before=4137148, After=4137002, chg -1.00%
>

Works for me, thanks!
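For readers who have not seen the series, the eliding boils down to stubbing out the *_clear_bad() helpers when the corresponding level is folded, so the calls and out-of-line symbols disappear. A simplified sketch of the shape of the change (the real patches touch the asm-generic page-table headers):

#ifdef __PAGETABLE_P4D_FOLDED
static inline void p4d_clear_bad(p4d_t *p4d)
{
        /* folded level: nothing can be "bad" here */
}
#else
void p4d_clear_bad(p4d_t *p4d);
#endif

#ifdef __PAGETABLE_PUD_FOLDED
static inline void pud_clear_bad(pud_t *pud)
{
}
#else
void pud_clear_bad(pud_t *pud);
#endif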
Re: [PATCH v2 09/22] mtd: spi-nor: Fix retlen handling in sst_write()
On Tue, 24 Sep 2019 07:46:21 + wrote: > From: Tudor Ambarus > > In case the write of the first byte failed, retlen was incorrectly > incremented to *retlen += actual; on the exit path. retlen should be > incremented when actual data was written to the flash. > > Rename 'sst_write_err' label to 'out' as it is no longer generic for > all the write errors in the sst_write() method, and may introduce > confusion. Renaming the label is indeed a good thing, but should be done in a separate patch. > > Signed-off-by: Tudor Ambarus > --- > drivers/mtd/spi-nor/spi-nor.c | 22 +++--- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c > index 0aee068a5835..be5dee622d51 100644 > --- a/drivers/mtd/spi-nor/spi-nor.c > +++ b/drivers/mtd/spi-nor/spi-nor.c > @@ -2665,12 +2665,12 @@ static int sst_write(struct mtd_info *mtd, loff_t to, > size_t len, > /* write one byte. */ > ret = spi_nor_write_data(nor, to, 1, buf); > if (ret < 0) > - goto sst_write_err; > + goto unlock_and_unprep; > WARN(ret != 1, "While writing 1 byte written %i bytes\n", >(int)ret); > ret = spi_nor_wait_till_ready(nor); > if (ret) > - goto sst_write_err; > + goto unlock_and_unprep; > } > to += actual; Not sure we need this new label, we can just have: actual = 0; /* Start write from odd address. */ if (to % 2) { nor->program_opcode = SPINOR_OP_BP; /* write one byte. */ ret = spi_nor_write_data(nor, to, 1, buf); if (ret < 0) goto out; WARN(ret != 1, "While writing 1 byte written %i bytes\n", (int)ret); ret = spi_nor_wait_till_ready(nor); if (ret) goto out; to++; actual++; } > > @@ -2681,12 +2681,12 @@ static int sst_write(struct mtd_info *mtd, loff_t to, > size_t len, > /* write two bytes. */ > ret = spi_nor_write_data(nor, to, 2, buf + actual); > if (ret < 0) > - goto sst_write_err; > + goto out; > WARN(ret != 2, "While writing 2 bytes written %i bytes\n", >(int)ret); > ret = spi_nor_wait_till_ready(nor); > if (ret) > - goto sst_write_err; > + goto out; > to += 2; > nor->sst_write_second = true; > } > @@ -2694,35 +2694,35 @@ static int sst_write(struct mtd_info *mtd, loff_t to, > size_t len, > > ret = spi_nor_write_disable(nor); > if (ret) > - goto sst_write_err; > + goto out; > > ret = spi_nor_wait_till_ready(nor); > if (ret) > - goto sst_write_err; > + goto out; > > /* Write out trailing byte if it exists. */ > if (actual != len) { > ret = spi_nor_write_enable(nor); > if (ret) > - goto sst_write_err; > + goto out; > > nor->program_opcode = SPINOR_OP_BP; > ret = spi_nor_write_data(nor, to, 1, buf + actual); > if (ret < 0) > - goto sst_write_err; > + goto out; > WARN(ret != 1, "While writing 1 byte written %i bytes\n", >(int)ret); > ret = spi_nor_wait_till_ready(nor); > if (ret) > - goto sst_write_err; > + goto out; > > ret = spi_nor_write_disable(nor); > if (ret) > - goto sst_write_err; > + goto out; > > actual += 1; > } > -sst_write_err: > +out: > *retlen += actual; > unlock_and_unprep: > spi_nor_unlock_and_unprep(nor, SPI_NOR_OPS_WRITE);
Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
On Thu 10-10-19 09:27:32, David Hildenbrand wrote: > On 09.10.19 16:43, Michal Hocko wrote: > > On Wed 09-10-19 16:24:35, David Hildenbrand wrote: > >> We should check for pfn_to_online_page() to not access uninitialized > >> memmaps. Reshuffle the code so we don't have to duplicate the error > >> message. > >> > >> Cc: Naoya Horiguchi > >> Cc: Andrew Morton > >> Cc: Michal Hocko > >> Signed-off-by: David Hildenbrand > >> --- > >> mm/memory-failure.c | 14 -- > >> 1 file changed, 8 insertions(+), 6 deletions(-) > >> > >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >> index 7ef849da8278..e866e6e5660b 100644 > >> --- a/mm/memory-failure.c > >> +++ b/mm/memory-failure.c > >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > >>if (!sysctl_memory_failure_recovery) > >>panic("Memory failure on page %lx", pfn); > >> > >> - if (!pfn_valid(pfn)) { > >> + p = pfn_to_online_page(pfn); > >> + if (!p) { > >> + if (pfn_valid(pfn)) { > >> + pgmap = get_dev_pagemap(pfn, NULL); > >> + if (pgmap) > >> + return memory_failure_dev_pagemap(pfn, flags, > >> +pgmap); > >> + } > >>pr_err("Memory failure: %#lx: memory outside kernel control\n", > >>pfn); > >>return -ENXIO; > > > > Don't we need that earlier at hwpoison_inject level? > > > > Theoretically yes, this is another instance. But pfn_to_online_page(pfn) > alone would not be sufficient as discussed. We would, again, have to > special-case ZONE_DEVICE via things like get_dev_pagemap() ... > > But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: > > /* >* Note that the below poison/unpoison interfaces do not involve >* hardware status change, hence do not require hardware support. >* They are mainly for testing hwpoison in software level. >*/ > > So it's not that bad compared to memory_failure() called from real HW or > drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() Yes, this is just a toy. And yes we need to handle zone device pages here because a) people likely want to test MCE behavior even on these pages and b) HW can really trigger MCEs there as well. I was just pointing that the patch is likely incomplete. -- Michal Hocko SUSE Labs
Re: [PATCH AUTOSEL 5.3 28/68] KVM: x86: Expose XSAVEERPTR to the guest
On 2019-10-10 00:50:09 [+0200], Paolo Bonzini wrote: > > Also, taking advantage of this feature requires changes which just > > landed in qemu's master branch. > > That's not a big deal, every QEMU supports every kernel and vice versa. That is correct. My point was that the visibility of this change on user's side is very limited. > Paolo Sebastian
Re: Potential NULL pointer dereference in mm/memcontrol.c
On Wed 09-10-19 21:56:04, Yizhuo Zhai wrote:
> Hi All:
> mm/memcontrol.c:
> The function mem_cgroup_from_css() could return NULL, but some callers

This is the case only when the memory cgroup controller is disabled, which is a boot time option.

> in this file
> do not check the return value and directly dereference it, which seems
> potentially unsafe.
> Such callers include mem_cgroup_hierarchy_read(),
> mem_cgroup_hierarchy_write(), mem_cgroup_read_u64(),
> mem_cgroup_reset(), etc.

And none of those should ever be called under that condition AFAICS.

Thanks!
--
Michal Hocko
SUSE Labs
Re: [PATCH] ftrace/module: Allow ftrace to make only loaded module text read-write
On Wed, Oct 09, 2019 at 10:36:38PM -0400, Steven Rostedt wrote: > From: Steven Rostedt (VMware) > > In the process of using text_poke_bp() for ftrace on x86, when > performing the following action: > > # rmmod snd_hda_codec_hdmi > # echo function > /sys/kernel/tracing/current_tracer > # modprobe snd_hda_codec_hdmi > > It triggered this: > > BUG: unable to handle page fault for address: a03d > #PF: supervisor write access in kernel mode > #PF: error_code(0x0003) - permissions violation > PGD 2a12067 P4D 2a12067 PUD 2a13063 PMD c42bc067 PTE c58a0061 > Oops: 0003 [#1] PREEMPT SMP KASAN PTI > CPU: 1 PID: 1182 Comm: modprobe Not tainted 5.4.0-rc2-test+ #50 > Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 > 07/14/2016 > RIP: 0010:memcpy_erms+0x6/0x10 > Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 > 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 a4 c3 0f 1f 80 > 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe > RSP: 0018:8880a10479e0 EFLAGS: 00010246 > RAX: a03d RBX: a03d RCX: 0005 > RDX: 0005 RSI: 8363e160 RDI: a03d > RBP: 88807e9ec000 R08: fbfff407a001 R09: fbfff407a001 > R10: fbfff407a000 R11: a03d0004 R12: 8221f160 > R13: a03d R14: 88807e9ec000 R15: a0481640 > FS: 7eff92e28280() GS:8880d484() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: a03d CR3: a1048001 CR4: 001606e0 > Call Trace: > ftrace_make_call+0x76/0x90 > ftrace_module_enable+0x493/0x4f0 > load_module+0x3a31/0x3e10 > ? ring_buffer_read+0x70/0x70 > ? module_frob_arch_sections+0x20/0x20 > ? rb_commit+0xee/0x600 > ? tracing_generic_entry_update+0xe1/0xf0 > ? ring_buffer_unlock_commit+0xfb/0x220 > ? 0xa061 > ? __do_sys_finit_module+0x11a/0x1b0 > __do_sys_finit_module+0x11a/0x1b0 > ? __ia32_sys_init_module+0x40/0x40 > ? ring_buffer_unlock_commit+0xfb/0x220 > ? function_trace_call+0x179/0x260 > ? __do_sys_finit_module+0x1b0/0x1b0 > ? __do_sys_finit_module+0x1b0/0x1b0 > ? do_syscall_64+0x58/0x1a0 > do_syscall_64+0x68/0x1a0 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x7eff92f42efd > > The reason is that ftrace_module_enable() is called after the module > has set its text to read-only. There's subtle reasons that this needs > to be called afterward, and we need to continue to do so. Please explain.
Re: [PATCH 10/16] x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
On Sat, Oct 5, 2019, Sean Christopherson wrote: >Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill >the capabilities during IA32_FEATURE_CONTROL MSR initialization. > >Make the VMX capabilities dependent on X86_INTEL_FEATURE_CONTROL and >X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that >can't >possibly support VMX, or when /proc/cpuinfo is not available. > >Signed-off-by: Sean Christopherson >--- > arch/x86/Kconfig.cpu | 4 ++ > arch/x86/include/asm/processor.h | 3 ++ > arch/x86/include/asm/vmxfeatures.h| 5 +++ > arch/x86/kernel/cpu/common.c | 3 ++ > arch/x86/kernel/cpu/feature_control.c | 59 >+++ > 5 files changed, 74 insertions(+) > >diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu >index e78f39adae7b..e7571bd0f515 100644 >--- a/arch/x86/Kconfig.cpu >+++ b/arch/x86/Kconfig.cpu >@@ -391,6 +391,10 @@ config X86_FEATURE_CONTROL_MSR > def_bool y > depends on CPU_SUP_INTEL || CPU_SUP_CENTAUR || >CPU_SUP_ZHAOXIN > >+config X86_VMX_FEATURE_NAMES >+ def_bool y >+ depends on X86_FEATURE_CONTROL_MSR && >X86_FEATURE_NAMES >+ > menuconfig PROCESSOR_SELECT > bool "Supported processor vendors" if EXPERT > ---help--- >diff --git a/arch/x86/include/asm/processor.h >b/arch/x86/include/asm/processor.h >index 4c3f41d7be5f..3b5dc9b1e7c4 100644 >--- a/arch/x86/include/asm/processor.h >+++ b/arch/x86/include/asm/processor.h >@@ -84,6 +84,9 @@ struct cpuinfo_x86 { > #ifdef CONFIG_X86_64 > /* Number of 4K pages in DTLB/ITLB combined(in pages): */ > int x86_tlbsize; >+#endif >+#ifdef CONFIG_X86_VMX_FEATURE_NAMES >+ __u32 vmx_capability[NVMXINTS]; > #endif > __u8x86_virt_bits; > __u8x86_phys_bits; >diff --git a/arch/x86/include/asm/vmxfeatures.h >b/arch/x86/include/asm/vmxfeatures.h >index ab82e3643d0c..d33ea1c165fd 100644 >--- a/arch/x86/include/asm/vmxfeatures.h >+++ b/arch/x86/include/asm/vmxfeatures.h >@@ -2,6 +2,11 @@ > #ifndef _ASM_X86_VMXFEATURES_H > #define _ASM_X86_VMXFEATURES_H > >+/* >+ * Defines VMX CPU feature bits >+ */ >+#define NVMXINTS 3 /* N 32-bit words worth of info */ >+ > /* > * Note: If the comment begins with a quoted string, that string is used > * in /proc/cpuinfo instead of the macro name. If the string is "", >diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c >index 9ae7d1bcd4f4..33537556dac6 100644 >--- a/arch/x86/kernel/cpu/common.c >+++ b/arch/x86/kernel/cpu/common.c >@@ -1421,6 +1421,9 @@ static void identify_cpu(struct cpuinfo_x86 *c) > #endif > c->x86_cache_alignment = c->x86_clflush_size; > memset(&c->x86_capability, 0, sizeof(c->x86_capability)); >+#ifdef CONFIG_X86_VMX_FEATURE_NAMES >+ memset(&c->vmx_capability, 0, sizeof(c->vmx_capability)); >+#endif > > generic_identify(c); > >diff --git a/arch/x86/kernel/cpu/feature_control.c >b/arch/x86/kernel/cpu/feature_control.c >index 74c76159a046..43eb65e8cd18 100644 >--- a/arch/x86/kernel/cpu/feature_control.c >+++ b/arch/x86/kernel/cpu/feature_control.c >@@ -4,6 +4,61 @@ > #include > #include > #include >+#include >+ >+#ifdef CONFIG_X86_VMX_FEATURE_NAMES >+enum vmx_feature_leafs { >+ MISC_FEATURES = 0, >+ PRIMARY_PROC_CONTROLS, >+ SECONDARY_PROC_CONTROLS, >+ NR_VMX_FEATURE_WORDS, >+}; >+ >+#define EPT_BIT(x) BIT(VMX_FEATURE_##x & 0x1f) >+ >+static void init_vmx_capabilities(struct cpuinfo_x86 *c) >+{ >+ u32 supported, funcs, ept, vpid, ign; >+ >+ BUILD_BUG_ON(NVMXINTS != NR_VMX_FEATURE_WORDS); >+ >+ /* >+ * The high bits contain the allowed-1 settings, i.e. features that can >+ * be turned on. The low bits contain the allowed-0 settings, i.e. 
>+ * features that can be turned off. Ignore the allowed-0 settings, >+ * if a feature can be turned on then it's supported. >+ */ >+ rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported); >+ rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs); >+ >+ /* >+ * Except for EPT+VPID, which enumerates support for both in a single >+ * MSR, low for EPT, high for VPID. >+ */ >+ rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP, &ept, &vpid); >+ >+ /* Pin, EPT, VPID and VM-Func are merged into a single word. */ >+ WARN_ON_ONCE(supported >> 16); >+ WARN_ON_ONCE(funcs >> 4); >+ c->vmx_capability[MISC_FEATURES] = (supported & 0x) | >+ ((vpid & 0x1) << 24) | >+ ((funcs & 0xf) << 28); >+ >+ /* EPT bits are scattered and must be manually handled. */ >+ if (ept & VMX_EPT_EXECUTE_ONLY_BIT) >+ c->vmx_capability[MISC_FEATURES] |= >EPT_BIT(EPT_EXECUTE_ONLY); >+ if (ept & VMX_EPT_1GB_PAGE_BIT) Typo? Should be: if (ept & VMX_EPT_AD_BIT) TonyWWang-oc >+ c->vmx_capability[MISC_FEATURES] |= E
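For clarity, the correction Tony points at would presumably read as follows; the EPT_BIT()/VMX_FEATURE_ spellings are assumed from the patch context, so treat this as a sketch of the intended hunk rather than the respun code:

        /* Test the A/D-bits capability with its own flag, not the 1GB bit. */
        if (ept & VMX_EPT_AD_BIT)
                c->vmx_capability[MISC_FEATURES] |= EPT_BIT(EPT_AD);
        if (ept & VMX_EPT_1GB_PAGE_BIT)
                c->vmx_capability[MISC_FEATURES] |= EPT_BIT(EPT_1GB);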
Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
On 09.10.19 16:43, Michal Hocko wrote: > On Wed 09-10-19 16:24:35, David Hildenbrand wrote: >> We should check for pfn_to_online_page() to not access uninitialized >> memmaps. Reshuffle the code so we don't have to duplicate the error >> message. >> >> Cc: Naoya Horiguchi >> Cc: Andrew Morton >> Cc: Michal Hocko >> Signed-off-by: David Hildenbrand >> --- >> mm/memory-failure.c | 14 -- >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 7ef849da8278..e866e6e5660b 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) >> if (!sysctl_memory_failure_recovery) >> panic("Memory failure on page %lx", pfn); >> >> -if (!pfn_valid(pfn)) { >> +p = pfn_to_online_page(pfn); >> +if (!p) { >> +if (pfn_valid(pfn)) { >> +pgmap = get_dev_pagemap(pfn, NULL); >> +if (pgmap) >> +return memory_failure_dev_pagemap(pfn, flags, >> + pgmap); >> +} >> pr_err("Memory failure: %#lx: memory outside kernel control\n", >> pfn); >> return -ENXIO; > > Don't we need that earlier at hwpoison_inject level? > Theoretically yes, this is another instance. But pfn_to_online_page(pfn) alone would not be sufficient as discussed. We would, again, have to special-case ZONE_DEVICE via things like get_dev_pagemap() ... But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way: /* * Note that the below poison/unpoison interfaces do not involve * hardware status change, hence do not require hardware support. * They are mainly for testing hwpoison in software level. */ So it's not that bad compared to memory_failure() called from real HW or drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() -- Thanks, David / dhildenb
Re: [PATCH] regulator: core: Skip balancing of the enabled regulators in regulator_enable()
On 09-10-19, 15:13, Mark Brown wrote: > On Wed, Oct 09, 2019 at 12:29:00PM +0200, Marek Szyprowski wrote: > > > Okay, then what is the conclusion, as I got lost a bit? How do you want > > this issue to be fixed? > > We should revert the enable call, it shouldn't be required, and ideally > the default balancer could be updated to only make configuration changes > if they're actually required which would help avoid triggering any such > things in future if we don't absolutely have to. Sorry for the delay in responding, just came back after vacations. Should the OPP change be reverted ? Someone going to send that revert to me with the required explanation ? -- viresh
Re: [PATCH v2 for 5.4 2/4] media: hantro: Fix H264 max frmsize supported on RK3288
On Tue, Oct 8, 2019 at 11:12 PM Jonas Karlman wrote: > > On 2019-10-08 15:53, Tomasz Figa wrote: > > On Tue, Oct 8, 2019 at 10:35 PM Tomasz Figa wrote: > >> On Tue, Oct 8, 2019 at 7:42 PM Tomasz Figa wrote: > >>> On Tue, Oct 8, 2019 at 3:31 PM Jonas Karlman wrote: > On 2019-10-08 07:27, Tomasz Figa wrote: > > Hi Ezequiel, Jonas, > > > > On Tue, Oct 8, 2019 at 2:46 AM Ezequiel Garcia > > wrote: > >> From: Jonas Karlman > >> > >> TRM specify supported image size 48x48 to 4096x2304 at step size 16 > >> pixels, > >> change frmsize max_width/max_height to match TRM. > >> > >> Fixes: 760327930e10 ("media: hantro: Enable H264 decoding on rk3288") > >> Signed-off-by: Jonas Karlman > >> --- > >> v2: > >> * No changes. > >> > >> drivers/staging/media/hantro/rk3288_vpu_hw.c | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/staging/media/hantro/rk3288_vpu_hw.c > >> b/drivers/staging/media/hantro/rk3288_vpu_hw.c > >> index 6bfcc47d1e58..ebb017b8a334 100644 > >> --- a/drivers/staging/media/hantro/rk3288_vpu_hw.c > >> +++ b/drivers/staging/media/hantro/rk3288_vpu_hw.c > >> @@ -67,10 +67,10 @@ static const struct hantro_fmt > >> rk3288_vpu_dec_fmts[] = { > >> .max_depth = 2, > >> .frmsize = { > >> .min_width = 48, > >> - .max_width = 3840, > >> + .max_width = 4096, > >> .step_width = H264_MB_DIM, > >> .min_height = 48, > >> - .max_height = 2160, > >> + .max_height = 2304, > > This doesn't match the datasheet I have, which is RK3288 Datasheet Rev > > 1.4 and which has the values as in current code. What's the one you > > got the values from? > The RK3288 TRM vcodec chapter from [1], unknown revision and date, lists > 48x48 to 4096x2304 step size 16 pixels under 25.5.1 H.264 decoder. > > I can also confirm that one of my test samples (PUPPIES BATH IN 4K) is > 4096x2304 and can be decoded after this patch. > However the decoding speed is not optimal at 400Mhz, if I recall > correctly you need to set the VPU1 clock to 600Mhz for 4K decoding on > RK3288. > > I am not sure if I should include a v2 of this patch in my v2 series, > as-is this patch do not apply on master (H264_MB_DIM has changed to > MB_DIM in master). > > [1] > http://www.t-firefly.com/download/firefly-rk3288/docs/TRM/rk3288-chapter-25-video-encoder-decoder-unit-(vcodec).pdf > >>> I checked the RK3288 TRM V1.1 too and it refers to 3840x2160@24fps as > >>> the maximum. > >>> > >>> As for performance, we've actually been getting around 33 fps at 400 > >>> MHz with 3840x2160 on our devices (the old RK3288 Asus Chromebook > >>> Flip). > >>> > >>> I guess we might want to check that with Hantro. > >> Could you check the value of bits 10:0 in register at 0x0c8? That > >> should be the maximum supported stream width in the units of 16 > >> pixels. > > Correction: The unit is 1 pixel and there are additional 2 most > > significant bits at 0x0d8, 15:14. > > I will check this later tonight when I have access to my devices. > The PUPPIES BATH IN 4K (4096x2304) sample decoded without issue using > rockchip 4.4 BSP kernel and mpp last time I tested. > > The vcodec driver in 4.4 BSP kernel use 300/400 Mhz as default clock rate and > will change to 600 Mhz when width is over 2560, see [1]: > raise frequency for resolution larger than 1440p avc > > [1] > https://github.com/rockchip-linux/kernel/blob/develop-4.4/drivers/video/rockchip/vcodec/vcodec_service.c#L2551-L2570 How comes it works for us well at 400 MHz? Better DRAM? Differences in how Vcodec BSP handles the hardware that somehow make the decoding slower? 
Best regards, Tomasz
Re: [PATCH] extcon: sm5502: Clear pending interrupts during initialization
On 19. 10. 8. 오후 7:54, Stephan Gerhold wrote: > On some devices (e.g. Samsung Galaxy A5 (2015)), the bootloader > seems to keep interrupts enabled for SM5502 when booting Linux. > Changing the cable state (i.e. plugging in a cable) - until the driver > is loaded - will therefore produce an interrupt that is never read. > > In this situation, the cable state will be stuck forever on the > initial state because SM5502 stops sending interrupts. > This can be avoided by clearing those pending interrupts after > the driver has been loaded. > > Reading the interrupt status registers twice seems to be sufficient > to make interrupts work in this situation. > > Signed-off-by: Stephan Gerhold > --- > This makes interrupts work on the Samsung Galaxy A5 (2015), which > has recently gained mainline support [1]. > > I was not able to find a datasheet for SM5502, so this patch is > merely based on testing and comparison with the downstream driver [2]. > > [1]: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1329c1ab0730b521e6cd3051c56a2ff3d55f21e6 > [2]: > https://protect2.fireeye.com/url?k=029d0042-5ffa4464-029c8b0d-0cc47a31384a-14ac0bce09798d1f&u=https://github.com/msm8916-mainline/android_kernel_qcom_msm8916/blob/SM-A500FU/drivers/misc/sm5502.c#L1566-L1578 > --- > drivers/extcon/extcon-sm5502.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/extcon/extcon-sm5502.c b/drivers/extcon/extcon-sm5502.c > index dc43847ad2b0..c897f1aa4bf5 100644 > --- a/drivers/extcon/extcon-sm5502.c > +++ b/drivers/extcon/extcon-sm5502.c > @@ -539,6 +539,12 @@ static void sm5502_init_dev_type(struct sm5502_muic_info > *info) > val = info->reg_data[i].val; > regmap_write(info->regmap, info->reg_data[i].reg, val); > } > + > + /* Clear pending interrupts */ > + regmap_read(info->regmap, SM5502_REG_INT1, ®_data); > + regmap_read(info->regmap, SM5502_REG_INT2, ®_data); > + regmap_read(info->regmap, SM5502_REG_INT1, ®_data); > + regmap_read(info->regmap, SM5502_REG_INT2, ®_data); It is not proper. Instead, initialize the SM5502_RET_INT1/2 with zero. The reset value of SM5502_RET_INT1/2 are zero (0x00) as following: If you can test it with h/w, please try to test it and then send the modified patch. diff --git a/drivers/extcon/extcon-sm5502.c b/drivers/extcon/extcon-sm5502.c index c897f1aa4bf5..e168f77a18ba 100644 --- a/drivers/extcon/extcon-sm5502.c +++ b/drivers/extcon/extcon-sm5502.c @@ -68,6 +68,14 @@ static struct reg_data sm5502_reg_data[] = { .reg = SM5502_REG_CONTROL, .val = SM5502_REG_CONTROL_MASK_INT_MASK, .invert = false, + }, { + .reg = SM5502_REG_INT1, + .val = SM5502_RET_INT1_MASK, + .invert = true, + }, { + .reg = SM5502_REG_INT2, + .val = SM5502_RET_INT1_MASK, + .invert = true, }, { .reg = SM5502_REG_INTMASK1, .val = SM5502_REG_INTM1_KP_MASK diff --git a/drivers/extcon/extcon-sm5502.h b/drivers/extcon/extcon-sm5502.h index 9dbb634d213b..5c4edb3e7fce 100644 --- a/drivers/extcon/extcon-sm5502.h +++ b/drivers/extcon/extcon-sm5502.h @@ -93,6 +93,8 @@ enum sm5502_reg { #define SM5502_REG_CONTROL_RAW_DATA_MASK (0x1 << SM5502_REG_CONTROL_RAW_DATA_SHIFT) #define SM5502_REG_CONTROL_SW_OPEN_MASK(0x1 << SM5502_REG_CONTROL_SW_OPEN_SHIFT) +#define SM5502_RET_INT1_MASK (0xff) + #define SM5502_REG_INTM1_ATTACH_SHIFT 0 #define SM5502_REG_INTM1_DETACH_SHIFT 1 #define SM5502_REG_INTM1_KP_SHIFT 2 > } > > static int sm5022_muic_i2c_probe(struct i2c_client *i2c, > -- Best Regards, Chanwoo Choi Samsung Electronics
Re: string.h: Mark 34 functions with __must_check
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/test_kasan.c?id=b92a953cb7f727c42a15ac2ea59bf3cf9c39370d#n595
>
> The *test* word must have given you a clue that the code you are looking
> at is not an ordinary one.

The proposed extension of function annotations can also be tested together with this source file example, can't it?

Other system configuration variations might become more interesting for further software components.

Regards,
Markus
[PATCH] mm/resource: Move child to new resource when release mem region.
From: Tianyu Lan When release mem region, old mem region may be splited to two regions. Current allocate new struct resource for high end mem region but not move child resources whose ranges are in the high end range to new resource. When adjust old mem region's range, adjust_resource() detects child region's range is out of new range and return error. Move child resources to high end resource before adjusting old mem range. Signed-off-by: Tianyu Lan --- This patch is to prepare for memory hot-remove function in Hyper-V balloon driver. --- kernel/resource.c | 38 ++ 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 158f04ec1d4f..7856347adfd2 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -181,6 +181,38 @@ static struct resource *alloc_resource(gfp_t flags) return res; } +static void move_child_to_newresource(struct resource *old, + struct resource *new) +{ + struct resource *tmp, **p, **np; + + if (!old->child) + return; + + p = &old->child; + np = &new->child; + + for (;;) { + tmp = *p; + if (!tmp) + break; + + if (tmp->start >= new->start && tmp->end <= new->end) { + tmp->parent = new; + *np = tmp; + np = &tmp->sibling; + *p = tmp->sibling; + + if (!tmp->sibling) + *np = NULL; + continue; + } + + p = &tmp->sibling; + } +} + /* Return the conflict entry if you can't request it */ static struct resource * __request_resource(struct resource *root, struct resource *new) { @@ -1231,9 +1263,6 @@ EXPORT_SYMBOL(__release_region); * Note: * - Additional release conditions, such as overlapping region, can be * supported after they are confirmed as valid cases. - * - When a busy memory resource gets split into two entries, the code - * assumes that all children remain in the lower address entry for - * simplicity. Enhance this logic when necessary. */ int release_mem_region_adjustable(struct resource *parent, resource_size_t start, resource_size_t size) @@ -1316,11 +1345,12 @@ int release_mem_region_adjustable(struct resource *parent, new_res->sibling = res->sibling; new_res->child = NULL; + move_child_to_newresource(res, new_res); + res->sibling = new_res; ret = __adjust_resource(res, res->start, start - res->start); if (ret) break; - res->sibling = new_res; new_res = NULL; } -- 2.14.5
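A worked example of the case the patch handles may help; the addresses below are made up purely for illustration:

    parent "System RAM":        [mem 0x100000000-0x17fffffff]
        child (driver region):  [mem 0x160000000-0x16fffffff]

    release_mem_region_adjustable(parent, 0x120000000, 0x10000000)
    splits the parent into:

        old resource (adjusted in place): [mem 0x100000000-0x11fffffff]
        new resource (allocated):         [mem 0x130000000-0x17fffffff]

    The child now falls inside the new high-end resource, so it must be
    re-parented there (move_child_to_newresource()) before
    __adjust_resource() shrinks the old entry; otherwise the adjustment
    sees a child outside the new range and returns an error.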
Re: [RFC PATCH 0/7] xfs: add reflink & dedupe support for fsdax.
On Wed, Oct 09, 2019 at 10:11:52AM -0700, Darrick J. Wong wrote: > On Tue, Oct 08, 2019 at 11:31:44PM -0700, Christoph Hellwig wrote: > > Btw, I just had a chat with Dan last week on this. And he pointed out > > that while this series deals with the read/write path issues of > > reflink on DAX it doesn't deal with the mmap side issue that > > page->mapping and page->index can point back to exactly one file. > > > > I think we want a few xfstests that reflink a file and then use the > > different links using mmap, as that should blow up pretty reliably. > > Hmm, you're right, we don't actually have a test that checks the > behavior of mwriting all copies of a shared block. Ok, I'll go write > one. I've pointed this problem out to everyone who has asked me "what do we need to do to support reflink on DAX". I've even walked a couple of people right through the problem that needs to be solved and discussed the potential solutions to it. Problems that I think need addressing: - device dax and filesystem dax have fundamentally different needs in this space, so they need to be separated and not try to use the same solution. - dax_lock_entry() being used as a substitute for page_lock() but it not being held on the page itself means it can't be extended to serialise access to the page across multiple mappings that are unaware of each other - dax_lock_page/dax_unlock_page interface for hardware memory errors needs to report to the filesystem for processing and repair, not assume the page is user data and killing processes is the only possible recovery mechanism. - dax_associate_entry/dax_disassociate_entry can only work for a 1:1 page:mapping,index relationship. It needs to go away and be replaced by a mechanism that allows tracking multiple page mapping/index/state tuples. This has much wider use than DAX (e.g. sharing page cache pages between reflinked files) I've proposed shadow pages (based on a concept from Matethw Wilcox) for each read-only reflink mapping with the real physical page being owned by the filesystem and indexed by LBA in the filesystem buffer cache. This would be based on whether the extent in the file the page is mapped from has multiple references to it. i.e. When a new page mapping occurs in a shared extent, we add the page to the buffer cache (i.e. point a struct xfs_buf at it)i if it isn't already present, then allocate a shadow page, point it at the master, set it up with the new mapping,index tuple and add it to the mapping tree. Then we can treat it as a unique page even though it points to the read-only master page. When the page get's COWed, we toss away the shadow page and the master can be reclaimed with the reference count goes to zero or the extent is no longer shared. Think of it kind of like the way we multiply reference the zero page for holes in mmap()d dax regions, except we can have millions of them and they are found by physical buffer cache index lookups. This works for both DAX and non-DAX sharing of read-only shared filesytsem pages. i.e. it would form the basis of single-copy read-only page cache pages for reflinked files. There was quite a bit of talk at LSFMM 2018 about having a linked list of mapping structures hanging off a struct page, one for each mapping that references the page. Operations would then have to walk all mappings that reference the page. This was useful to other subsystems (HMM?) 
for some purpose I forget, but I'm not sure it's particularly useful by itself for non-dax reflink purposes - I suspect the filesystem would still need to track such pages itself in its buffer cache so it can find the cached page to link new reflink copies to the same page... ISTR a couple of other solutions were thrown around, but I don't think anyone came up with a simple solution... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
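To make the shadow-page proposal a bit more concrete, here is a purely illustrative sketch. None of these structure or field names exist in the kernel; they only restate the idea that each read-only reflink mapping gets its own (mapping, index) identity while the data lives in a single master page owned by the filesystem:

struct dax_shadow_page {		/* hypothetical, one per read-only reflink mapping */
	struct page *master;		/* the real pfn, owned by the fs and indexed
					 * by LBA in its buffer cache (struct xfs_buf) */
	struct address_space *mapping;	/* the file this shadow belongs to */
	pgoff_t index;			/* page offset within that file */
	unsigned long state;		/* per-mapping state: locked, dirty, ... */
};

On COW the shadow is simply discarded, and the master page is reclaimed once its reference count drops to zero or the extent stops being shared, as described above.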
[RFC v2 0/2] kvm: Use host timekeeping in guest.
This RFC is to try to solve the following problem: We have some applications that are currently running in their own namespace, that still talk to other processes on the machine, using IPC, and expect to run on the same machine. We want to move them into a virtual machine, for the usual benefits of virtualization. However, some of these programs use CLOCK_MONOTONIC and CLOCK_BOOTTIME timestamps, as part of their protocol, when talking to the host. Generally speaking, we have multiple event sources, for example sensors, input devices, display controller vsync, etc and we would like to rely on them in the guest for various scenarios. As a specific example, we are trying to run some wayland clients (in the guest) who talk to the server (in the host), and the server gives input events based on host time. Additionally, there are also vsync events that the clients use for timing their rendering. Another use case we have are timestamps from IIO sensors and cameras. There are applications that need to determine how the timestamps relate to the current time and the only way to get current time is clock_gettime(), which would return a value from a different time domain than the timestamps. In this case, it is not feasible to change these programs, due to the number of the places we would have to change. We spent some time thinking about this, and the best solution we could come up with was the following: Make the guest kernel return the same CLOCK_MONOTONIC and CLOCK_GETTIME timestamps as the host. To do that, I am changing kvmclock to request to the host to copy its timekeeping parameters (mult, base, cycle_last, etc), so that the guest timekeeper can use the same values, so that time can be synchronized between the guest and the host. Any suggestions or feedback would be highly appreciated. Changes in v2: - Move out of kvmclock and into its own clocksource and file. - Remove timekeeping.c #ifdefs. - Fix i386 build. Suleiman Souhlal (2): kvm: Mechanism to copy host timekeeping parameters into guest. x86/kvmclock: Introduce kvm-hostclock clocksource. arch/x86/Kconfig | 9 ++ arch/x86/include/asm/kvm_host.h | 3 + arch/x86/include/asm/kvmclock.h | 12 +++ arch/x86/include/asm/pvclock-abi.h | 27 ++ arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kernel/Makefile | 2 + arch/x86/kernel/kvmclock.c | 5 +- arch/x86/kernel/kvmhostclock.c | 130 +++ arch/x86/kvm/x86.c | 121 + include/linux/timekeeper_internal.h | 8 ++ kernel/time/timekeeping.c| 2 + 11 files changed, 319 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kernel/kvmhostclock.c -- 2.23.0.581.g78d2f28ef7-goog
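For readers unfamiliar with how the copied parameters would be consumed, here is a rough sketch of the conversion a guest clocksource could do to reproduce host CLOCK_MONOTONIC. It is not taken from the series; the structure layout and field names are made up for illustration, and only the mult/shift arithmetic is the standard timekeeper formula:

#include <linux/math64.h>
#include <asm/msr.h>

struct host_tk_params {
	u64 cycle_last;	/* host TSC value at the last timekeeping update */
	u64 base_ns;	/* host monotonic time (ns) at that update */
	u32 mult;	/* TSC -> ns multiplier */
	u32 shift;	/* TSC -> ns shift */
};

static u64 host_monotonic_ns(const struct host_tk_params *p)
{
	u64 delta = rdtsc_ordered() - p->cycle_last;

	return p->base_ns + mul_u64_u32_shr(delta, p->mult, p->shift);
}

The hard part of the series is keeping such a snapshot coherent while the host timekeeper keeps updating it; the sketch only shows the arithmetic.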
Re: [PATCH v2 for 5.4 2/4] media: hantro: Fix H264 max frmsize supported on RK3288
On Wed, Oct 9, 2019 at 5:39 AM Jonas Karlman wrote: > > On 2019-10-08 16:12, Jonas Karlman wrote: > > On 2019-10-08 15:53, Tomasz Figa wrote: > >> On Tue, Oct 8, 2019 at 10:35 PM Tomasz Figa wrote: > >>> On Tue, Oct 8, 2019 at 7:42 PM Tomasz Figa wrote: > On Tue, Oct 8, 2019 at 3:31 PM Jonas Karlman wrote: > > On 2019-10-08 07:27, Tomasz Figa wrote: > >> Hi Ezequiel, Jonas, > >> > >> On Tue, Oct 8, 2019 at 2:46 AM Ezequiel Garcia > >> wrote: > >>> From: Jonas Karlman > >>> > >>> TRM specify supported image size 48x48 to 4096x2304 at step size 16 > >>> pixels, > >>> change frmsize max_width/max_height to match TRM. > >>> > >>> Fixes: 760327930e10 ("media: hantro: Enable H264 decoding on rk3288") > >>> Signed-off-by: Jonas Karlman > >>> --- > >>> v2: > >>> * No changes. > >>> > >>> drivers/staging/media/hantro/rk3288_vpu_hw.c | 4 ++-- > >>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/staging/media/hantro/rk3288_vpu_hw.c > >>> b/drivers/staging/media/hantro/rk3288_vpu_hw.c > >>> index 6bfcc47d1e58..ebb017b8a334 100644 > >>> --- a/drivers/staging/media/hantro/rk3288_vpu_hw.c > >>> +++ b/drivers/staging/media/hantro/rk3288_vpu_hw.c > >>> @@ -67,10 +67,10 @@ static const struct hantro_fmt > >>> rk3288_vpu_dec_fmts[] = { > >>> .max_depth = 2, > >>> .frmsize = { > >>> .min_width = 48, > >>> - .max_width = 3840, > >>> + .max_width = 4096, > >>> .step_width = H264_MB_DIM, > >>> .min_height = 48, > >>> - .max_height = 2160, > >>> + .max_height = 2304, > >> This doesn't match the datasheet I have, which is RK3288 Datasheet Rev > >> 1.4 and which has the values as in current code. What's the one you > >> got the values from? > > The RK3288 TRM vcodec chapter from [1], unknown revision and date, > > lists 48x48 to 4096x2304 step size 16 pixels under 25.5.1 H.264 decoder. > > > > I can also confirm that one of my test samples (PUPPIES BATH IN 4K) is > > 4096x2304 and can be decoded after this patch. > > However the decoding speed is not optimal at 400Mhz, if I recall > > correctly you need to set the VPU1 clock to 600Mhz for 4K decoding on > > RK3288. > > > > I am not sure if I should include a v2 of this patch in my v2 series, > > as-is this patch do not apply on master (H264_MB_DIM has changed to > > MB_DIM in master). > > > > [1] > > http://www.t-firefly.com/download/firefly-rk3288/docs/TRM/rk3288-chapter-25-video-encoder-decoder-unit-(vcodec).pdf > I checked the RK3288 TRM V1.1 too and it refers to 3840x2160@24fps as > the maximum. > > As for performance, we've actually been getting around 33 fps at 400 > MHz with 3840x2160 on our devices (the old RK3288 Asus Chromebook > Flip). > > I guess we might want to check that with Hantro. > >>> Could you check the value of bits 10:0 in register at 0x0c8? That > >>> should be the maximum supported stream width in the units of 16 > >>> pixels. > >> Correction: The unit is 1 pixel and there are additional 2 most > >> significant bits at 0x0d8, 15:14. > > I will check this later tonight when I have access to my devices. > > My Asus Tinker Board S (RK3288-C) is reporting support for 0x780 / 1920 > pixels: > > 0x000 (0) = 0x67313688 > 0x0c8 (50) = 0xfbb56f80 > 0x0d8 (54) = 0xe5da > Looks like that register doesn't work very well in Rockchip's implementation... Thanks for checking anyway. I guess we can allow 4096x2304 for the time being and see what happens. Reviewed-by: Tomasz Figa Best regards, Tomasz
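For reference, the register probing discussed above decodes roughly like this; the sketch is based only on the bit positions mentioned in the thread, not on vendor headers:

static unsigned int g1_max_stream_width(void __iomem *base)
{
	u32 lo = readl(base + 0x0c8);	/* bits 10:0 - max stream width, in pixels */
	u32 hi = readl(base + 0x0d8);	/* bits 15:14 - two more significant bits */

	return (((hi >> 14) & 0x3) << 11) | (lo & 0x7ff);
}

With the dump above (0x0c8 = 0xfbb56f80), the low field alone gives 0x780 = 1920 pixels, which is why the conclusion is that these synthesis registers cannot be relied upon on the Rockchip implementation.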
Re: [PATCH v1] mm: Fix access of uninitialized memmaps in fs/proc/page.c
On 09.10.19 11:57, Naoya Horiguchi wrote: > Hi David, > > On Wed, Oct 09, 2019 at 11:12:04AM +0200, David Hildenbrand wrote: >> There are various places where we access uninitialized memmaps, namely: >> - /proc/kpagecount >> - /proc/kpageflags >> - /proc/kpagecgroup >> - memory_failure() - which reuses stable_page_flags() from fs/proc/page.c > > Ah right, memory_failure is another victim of this bug. > >> >> We have initialized memmaps either when the section is online or when >> the page was initialized to the ZONE_DEVICE. Uninitialized memmaps contain >> garbage and in the worst case trigger kernel BUGs, especially with >> CONFIG_PAGE_POISONING. >> >> For example, not onlining a DIMM during boot and calling /proc/kpagecount >> with CONFIG_PAGE_POISONING: >> :/# cat /proc/kpagecount > tmp.test >> [ 95.600592] BUG: unable to handle page fault for address: fffe >> [ 95.601238] #PF: supervisor read access in kernel mode >> [ 95.601675] #PF: error_code(0x) - not-present page >> [ 95.602116] PGD 114616067 P4D 114616067 PUD 114618067 PMD 0 >> [ 95.602596] Oops: [#1] SMP NOPTI >> [ 95.602920] CPU: 0 PID: 469 Comm: cat Not tainted >> 5.4.0-rc1-next-20191004+ #11 >> [ 95.603547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 >> [ 95.604521] RIP: 0010:kpagecount_read+0xce/0x1e0 >> [ 95.604917] Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 >> 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480 >> [ 95.606450] RSP: 0018:a14e409b7e78 EFLAGS: 00010202 >> [ 95.606904] RAX: fffe RBX: 0002 RCX: >> >> [ 95.607519] RDX: 0001 RSI: 7f76b5595000 RDI: >> f3564500 >> [ 95.608128] RBP: 7f76b5595000 R08: 0001 R09: >> >> [ 95.608731] R10: R11: R12: >> 0014 >> [ 95.609327] R13: 0002 R14: 7f76b5595000 R15: >> a14e409b7f08 >> [ 95.609924] FS: 7f76b577d580() GS:8f41bd40() >> knlGS: >> [ 95.610599] CS: 0010 DS: ES: CR0: 80050033 >> [ 95.611083] CR2: fffe CR3: 7896 CR4: >> 06f0 >> [ 95.611686] Call Trace: >> [ 95.611906] proc_reg_read+0x3c/0x60 >> [ 95.612228] vfs_read+0xc5/0x180 >> [ 95.612505] ksys_read+0x68/0xe0 >> [ 95.612785] do_syscall_64+0x5c/0xa0 >> [ 95.613092] entry_SYSCALL_64_after_hwframe+0x49/0xbe >> >> Note that there are still two possible races as far as I can see: >> - pfn_to_online_page() succeeding but the memory getting offlined and >> removed. get_online_mems() could help once we run into this. >> - pfn_zone_device() succeeding but the memmap not being fully >> initialized yet. As the memmap is initialized outside of the memory >> hoptlug lock, get_online_mems() can't help. >> >> Let's keep the existing interfaces working with ZONE_DEVICE memory. We >> can later come back and fix these rare races and eventually speed-up the >> ZONE_DEVICE detection. > > Actually, Toshiki is writing code to refactor and optimize the pfn walking > part, where we find the pfn ranges covered by zone devices by running over > xarray pgmap_array and use the range info to reduce pointer dereferences > to speed up pfn walk. I hope he will share it soon. AFAIKT, Michal is not a friend of special-casing PFN walkers in that way. We should have a mechanism to detect if a memmap was initialized without having to go via pgmap, special-casing. See my other mail where I draft one basic approach. -- Thanks, David / dhildenb
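The guard pattern being converged on in this thread looks roughly like the following simplified sketch (not the actual fs/proc/page.c patch):

static struct page *lookup_initialized_page(unsigned long pfn)
{
	struct page *page = pfn_to_online_page(pfn);
	struct dev_pagemap *pgmap;

	if (page)
		return page;	/* memmap was initialized when the section went online */

	/*
	 * pfn_valid() alone is not enough: the memmap may exist but still
	 * contain garbage. ZONE_DEVICE memmaps are initialized when the
	 * pagemap is registered, so check for that explicitly.
	 */
	if (pfn_valid(pfn)) {
		pgmap = get_dev_pagemap(pfn, NULL);
		if (pgmap) {
			put_dev_pagemap(pgmap);	/* only needed it for the check */
			return pfn_to_page(pfn);
		}
	}

	return NULL;
}

Both races mentioned above (memory being offlined concurrently, and a ZONE_DEVICE memmap that is still being initialized) are ignored in this sketch.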
Re: [PATCH v2 for 5.4 3/4] media: hantro: Fix motion vectors usage condition
On Tue, Oct 8, 2019 at 10:57 PM Jonas Karlman wrote: > > On 2019-10-08 12:26, Tomasz Figa wrote: > > On Tue, Oct 8, 2019 at 3:23 PM Jonas Karlman wrote: > >> On 2019-10-08 05:29, Tomasz Figa wrote: > >>> Hi Jonas, > >>> > >>> On Tue, Oct 8, 2019 at 3:33 AM Jonas Karlman wrote: > On 2019-10-07 19:45, Ezequiel Garcia wrote: > > From: Francois Buergisser > > > > The setting of the motion vectors usage and the setting of motion > > vectors address are currently done under different conditions. > > > > When decoding pre-recorded videos, this results of leaving the motion > > vectors address unset, resulting in faulty memory accesses. Fix it > > by using the same condition everywhere, which matches the profiles > > that support motion vectors. > This does not fully match hantro sdk: > > enable direct MV writing and POC tables for high/main streams. > enable it also for any "baseline" stream which have main/high tools > enabled. > > (sps->profile_idc > 66 && sps->constrained_set0_flag == 0) || > sps->frame_mbs_only_flag != 1 || > sps->chroma_format_idc != 1 || > sps->scaling_matrix_present_flag != 0 || > pps->entropy_coding_mode_flag != 0 || > pps->weighted_pred_flag != 0 || > pps->weighted_bi_pred_idc != 0 || > pps->transform8x8_flag != 0 || > pps->scaling_matrix_present_flag != 0 > >>> Thanks for double checking this. I can confirm that it's what Hantro > >>> SDK does indeed. > >>> > >>> However, would a stream with sps->profile_idc <= 66 and those other > >>> conditions met be still a compliant stream? > >> You are correct, if a non-compliant video is having decoding problems it > >> should probably be handled > >> on userspace side (by not reporting baseline profile) and not in kernel. > >> All my video samples that was having the issue fixed in this patch are now > >> decoded correctly. > >> > Above check is used when DIR_MV_BASE should be written. > And WRITE_MVS_E is set to nal_ref_idc != 0 when above is true. > > I think it may be safer to always set DIR_MV_BASE and keep the existing > nal_ref_idc check for WRITE_MVS_E. > >>> That might have a performance penalty or some other side effects, > >>> though. Otherwise Hantro SDK wouldn't have enable it conditionally. > >>> > (That is what I did in my "media: hantro: H264 fixes and improvements" > series, v2 is incoming) > Or have you found any video that is having issues in such case? > >>> We've been running the code with sps->profile_idc > 66 in production > >>> for 4 years and haven't seen any reports of a stream that wasn't > >>> decoded correctly. > >>> > >>> If we decide to go with a different behavior, I'd suggest thoroughly > >>> verifying the behavior on a big number of streams, including some > >>> performance measurements. > >> I agree, I would however suggest to change the if statement to the > >> following (or similar) > >> as that should be the optimal for performance reasons and match the hantro > >> sdk. > >> > >> if (sps->profile_idc > 66 && dec_param->nal_ref_idc) > > Sorry for my ignorance, but could you elaborate on this? What's the > > meaning of nal_ref_idc? I don't see it being checked in the Hantro SDK > > condition you mentioned earlier. > > My somewhat limited understanding of h264 spec is that nal_ref_idc should be > 0 for non-reference field/frame/pictures > and because of this there should not be any need to write motion vector data > as the field/frame should not be referenced. 
> > My base for the hantro sdk code is the imx8 imx-vpu-hantro package and it > uses (simplified): > SetDecRegister(h264_regs, HWIF_WRITE_MVS_E, nal_ref_idc != 0) > for MVC there is an additional condition. Aha, completely makes sense. Thanks for clarifying. Best regards, Tomasz > > Regards, > Jonas > > > > >> Regards, > >> Jonas > >> > >>> Best regards, > >>> Tomasz > >>> > Regards, > Jonas > > > Fixes: dea0a82f3d22 ("media: hantro: Add support for H264 decoding on > > G1") > > Signed-off-by: Francois Buergisser > > Signed-off-by: Ezequiel Garcia > > --- > > v2: > > * New patch. > > > > drivers/staging/media/hantro/hantro_g1_h264_dec.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/staging/media/hantro/hantro_g1_h264_dec.c > > b/drivers/staging/media/hantro/hantro_g1_h264_dec.c > > index 7ab534936843..c92460407613 100644 > > --- a/drivers/staging/media/hantro/hantro_g1_h264_dec.c > > +++ b/drivers/staging/media/hantro/hantro_g1_h264_dec.c > > @@ -35,7 +35,7 @@ static void set_params(struct hantro_ctx *ctx) > > if (sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD) > > reg |= G1_REG_DEC_CTRL0_SEQ_MBAFF_E; > > reg |= G1_REG_DEC_CTRL0_PICORD_COUNT_E; > > - if (dec_param->nal_ref_idc) > >>
Re: [PATCH RFC] perf_event: Add support for LSM and SELinux checks
Hi Joel, Thanks for the patch. On 09.10.2019 23:36, Joel Fernandes (Google) wrote: > In currentl mainline, the degree of access to perf_event_open(2) system > call depends on the perf_event_paranoid sysctl. This has a number of > limitations: > > 1. The sysctl is only a single value. Many types of accesses are controlled >based on the single value thus making the control very limited and >coarse grained. > 2. The sysctl is global, so if the sysctl is changed, then that means >all processes get access to perf_event_open(2) opening the door to >security issues. Please also see considerations here: Documentation/admin-guide/perf-security.rst According to the document Perf based monitoring can be configured for usage by Perf user groups. This option extends perf_event access control beyond perf_event_paranoid kernel setting. One of the possible future steps from that could be to move perf_events monitoring from under overloaded CAP_SYS_ADMIN to a dedicated CAP_SYS_PERFMON. It would be appreciated if Perf user groups approach continue be applicable as currently documented. Existing Perf tool implementation already relies on it. It would be also beneficial to augment the document with related information regarding Perf extensions related to kernel security. Thanks, Alexey > > This patch adds LSM and SELinux access checking which will be used in > Android to access perf_event_open(2) for the purposes of attaching BPF > programs to tracepoints, perf profiling and other operations from > userspace. These operations are intended for production systems. > > 5 new LSM hooks are added: > 1. perf_event_open: This controls access during the perf_event_open(2) >syscall itself. The hook is called from all the places that the >perf_event_paranoid sysctl is checked to keep it consistent with the >systctl. The hook gets passed a 'type' argument which controls CPU, >kernel and tracepoint accesses (in this context, CPU, kernel and >tracepoint have the same semantics as the perf_event_paranoid sysctl). >Additionally, I added an 'open' type which is similar to >perf_event_paranoid sysctl == 3 patch carried in Android and several other >distros but was rejected in mainline [1] in 2016. > > 2. perf_event_alloc: This allocates a new security object for the event >which stores the current SID within the event. It will be useful when >the perf event's FD is passed through IPC to another process which may >try to read the FD. Appropriate security checks will limit access. > > 3. perf_event_free: Called when the event is closed. > > 4. perf_event_read: Called from the read(2) system call path for the event. > > 5. perf_event_write: Called from the read(2) system call path for the event. > > [1] https://lwn.net/Articles/696240/ > > Since Peter had suggest LSM hooks in 2016 [1], I am adding his > Suggested-by tag below. > > To use this patch, we set the perf_event_paranoid sysctl to -1 and then > apply selinux checking as appropriate (default deny everything, and then > add policy rules to give access to domains that need it). In the future > we can remove the perf_event_paranoid sysctl altogether. 
> > Suggested-by: Peter Zijlstra > Cc: Peter Zijlstra > Cc: rost...@goodmis.org > Cc: primi...@google.com > Cc: rsavit...@google.com > Cc: je...@google.com > Cc: kernel-t...@android.com > Signed-off-by: Joel Fernandes (Google) > > --- > arch/x86/events/intel/bts.c | 5 +++ > arch/x86/events/intel/core.c| 5 +++ > arch/x86/events/intel/p4.c | 5 +++ > include/linux/lsm_hooks.h | 15 +++ > include/linux/perf_event.h | 3 ++ > include/linux/security.h| 39 +++- > include/uapi/linux/perf_event.h | 13 ++ > kernel/events/core.c| 59 +--- > kernel/trace/trace_event_perf.c | 15 ++- > security/security.c | 33 ++ > security/selinux/hooks.c| 69 + > security/selinux/include/classmap.h | 2 + > security/selinux/include/objsec.h | 6 ++- > 13 files changed, 259 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c > index 5ee3fed881d3..9796fc094dad 100644 > --- a/arch/x86/events/intel/bts.c > +++ b/arch/x86/events/intel/bts.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -553,6 +554,10 @@ static int bts_event_init(struct perf_event *event) > !capable(CAP_SYS_ADMIN)) > return -EACCES; > > + ret = security_perf_event_open(&event->attr, PERF_SECURITY_KERNEL); > + if (ret) > + return ret; > + > if (x86_add_exclusive(x86_lbr_exclusive_bts)) > return -EBUSY; > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > index 27ee47a7be66..75b6b9b239ae 100644 > --- a/arch/x86/events
Re: [PATCH] ARM: dts: zynq: enablement of coresight topology
On 10/9/19 10:07 PM, Michal Simek wrote: From: Zumeng Chen This patch is to build the coresight topology structure of zynq-7000 series according to the docs of coresight and userguide of zynq-7000. Signed-off-by: Zumeng Chen Signed-off-by: Quanyang Wang Signed-off-by: Michal Simek --- arch/arm/boot/dts/zynq-7000.dtsi | 158 +++ 1 file changed, 158 insertions(+) diff --git a/arch/arm/boot/dts/zynq-7000.dtsi b/arch/arm/boot/dts/zynq-7000.dtsi index ca6425ad794c..86430ad76fee 100644 --- a/arch/arm/boot/dts/zynq-7000.dtsi +++ b/arch/arm/boot/dts/zynq-7000.dtsi @@ -59,6 +59,40 @@ regulator-always-on; }; + replicator { + compatible = "arm,coresight-static-replicator"; + clocks = <&clkc 27>, <&clkc 46>, <&clkc 47>; + clock-names = "apb_pclk", "dbg_trc", "dbg_apb"; + + out-ports { + #address-cells = <1>; + #size-cells = <0>; + + /* replicator output ports */ + port@0 { + reg = <0>; + replicator_out_port0: endpoint { + remote-endpoint = <&tpiu_in_port>; + }; + }; + port@1 { + reg = <1>; + replicator_out_port1: endpoint { + remote-endpoint = <&etb_in_port>; + }; + }; + }; + in-ports { + /* replicator input port */ + port { + replicator_in_port0: endpoint { + slave-mode; + remote-endpoint = <&funnel_out_port>; + }; + }; + }; + }; + amba: amba { compatible = "simple-bus"; #address-cells = <1>; @@ -365,5 +399,129 @@ reg = <0xf8005000 0x1000>; timeout-sec = <10>; }; + + etb@f8801000 { + compatible = "arm,coresight-etb10", "arm,primecell"; + reg = <0xf8801000 0x1000>; + clocks = <&clkc 27>, <&clkc 46>, <&clkc 47>; + clock-names = "apb_pclk", "dbg_trc", "dbg_apb"; + in-ports { + port { + etb_in_port: endpoint { + remote-endpoint = <&replicator_out_port1>; + }; + }; + }; + }; + + tpiu@f8803000 { + compatible = "arm,coresight-tpiu", "arm,primecell"; + reg = <0xf8803000 0x1000>; + clocks = <&clkc 27>, <&clkc 46>, <&clkc 47>; + clock-names = "apb_pclk", "dbg_trc", "dbg_apb"; + in-ports { + port { + tpiu_in_port: endpoint { + remote-endpoint = <&replicator_out_port0>; + }; + }; + }; + }; + + funnel@f8804000 { + compatible = "arm,coresight-static-funnel", "arm,primecell"; + reg = <0xf8804000 0x1000>; + clocks = <&clkc 27>, <&clkc 46>, <&clkc 47>; + clock-names = "apb_pclk", "dbg_trc", "dbg_apb"; + + /* funnel output ports */ + out-ports { + port { + funnel_out_port: endpoint { + remote-endpoint = + <&replicator_in_port0>; + }; + }; + }; + + in-ports { + #address-cells = <1>; + #size-cells = <0>; + + /* funnel input ports */ + port@0 { + reg = <0>; + funnel0_in_port0: endpoint { + remote-endpoint = <&ptm0_out_port>; + }; + }; + + port@1 { + reg = <1>; + f
Re: [PATCH bpf-next 2/2] bpf/stackmap: fix A-A deadlock in bpf_get_stack()
On Wed, Oct 09, 2019 at 11:19:16PM -0700, Song Liu wrote: > bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A > deadlock on rq_lock(): > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > [...] > Call Trace: > try_to_wake_up+0x1ad/0x590 > wake_up_q+0x54/0x80 > rwsem_wake+0x8a/0xb0 > bpf_get_stack+0x13c/0x150 > bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000 > bpf_overflow_handler+0x60/0x100 > __perf_event_overflow+0x4f/0xf0 > perf_swevent_overflow+0x99/0xc0 > ___perf_sw_event+0xe7/0x120 > __schedule+0x47d/0x620 > schedule+0x29/0x90 > futex_wait_queue_me+0xb9/0x110 > futex_wait+0x139/0x230 > do_futex+0x2ac/0xa50 > __x64_sys_futex+0x13c/0x180 > do_syscall_64+0x42/0x100 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c > index 052580c33d26..3b278f6b0c3e 100644 > --- a/kernel/bpf/stackmap.c > +++ b/kernel/bpf/stackmap.c > @@ -287,7 +287,7 @@ static void stack_map_get_build_id_offset(struct > bpf_stack_build_id *id_offs, > bool irq_work_busy = false; > struct stack_map_irq_work *work = NULL; > > - if (in_nmi()) { > + if (in_nmi() || this_rq_is_locked()) { > work = this_cpu_ptr(&up_read_work); > if (work->irq_work.flags & IRQ_WORK_BUSY) > /* cannot queue more up_read, fallback */ This is horrific crap. Just say no to that get_build_id_offset() trainwreck.
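For readers who have not looked at kernel/bpf/stackmap.c, the deferral machinery this patch extends works roughly as below (a simplified sketch of the existing pattern, not the exact code):

struct up_read_work {
	struct irq_work irq_work;	/* set up with init_irq_work(&w->irq_work, do_up_read) */
	struct rw_semaphore *sem;
};

static void do_up_read(struct irq_work *entry)
{
	struct up_read_work *w = container_of(entry, struct up_read_work, irq_work);

	/* runs in hard-irq context, releasing on behalf of the original acquirer */
	up_read_non_owner(w->sem);
}

static void defer_up_read(struct up_read_work *w, struct rw_semaphore *sem)
{
	w->sem = sem;
	irq_work_queue(&w->irq_work);
}

The question the patch raises is when this deferred path must be taken; the hunk above adds "the rq lock is already held" to the existing "we are in NMI context" condition.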
Re: [PATCH v11 0/6] mm / virtio: Provide support for unused page reporting
On 09.10.19 21:46, Nitesh Narayan Lal wrote: > > On 10/9/19 12:35 PM, Alexander Duyck wrote: >> On Wed, 2019-10-09 at 11:21 -0400, Nitesh Narayan Lal wrote: >>> On 10/7/19 1:06 PM, Nitesh Narayan Lal wrote: >>> [...] > So what was the size of your guest? One thing that just occurred to me is > that you might be running a much smaller guest than I was. I am running a 30 GB guest. >>> If so I would have expected a much higher difference versus >>> baseline as zeroing/faulting the pages in the host gets expensive fairly >>> quick. What is the host kernel you are running your test on? I'm just >>> wondering if there is some additional overhead currently limiting your >>> setup. My host kernel was just the same kernel I was running in the >>> guest, >>> just built without the patches applied. >> Right now I have a different host-kernel. I can install the same kernel >> to the >> host as well and see if that changes anything. > The host kernel will have a fairly significant impact as I recall. For > example running a stock CentOS kernel lowered the performance compared to > running a linux-next kernel. As a result the numbers looked better since > the overall baseline was lower to begin with as the host OS was > introducing additional overhead. I see in that case I will try by installing the same guest kernel to the host as well. >>> As per your suggestion, I tried replacing the host kernel with an >>> upstream kernel without my patches i.e., my host has a kernel built on top >>> of the upstream kernel's master branch which has Sept 23rd commit and the >>> guest >>> has the same kernel for the no-hinting case and same kernel + my patches >>> for the page reporting case. >>> >>> With the changes reported earlier on top of v12, I am not seeing any further >>> degradation (other than what I have previously reported). >>> >>> To be sure that THP is actively used, I did an experiment where I changed >>> the >>> MEMSIZE in the page_fault. On doing so THP usage checked via /proc/meminfo >>> also >>> increased as I expected. >>> >>> In any case, if you find something else please let me know and I will look >>> into it >>> again. >>> >>> >>> I am still looking into your suggestion about cache line bouncing and will >>> reply >>> to it, if I have more questions. >>> >>> >>> [...] >> I really feel like this discussion has gone off course. The idea here is >> to review this patch set[1] and provide working alternatives if there are >> issues with the current approach. > > > Agreed. > >> >> The bitmap based approach still has a number of outstanding issues >> including sparse memory and hotplug which have yet to be addressed. > > True, but I don't think those two are a blocker. > > For sparse zone as we are maintaining the bitmap on a granularity of > (MAX_ORDER - 2) / (MAX_ORDER - 1) etc. the memory wastage should be > negligible in most of the cases. > > For memory hotplug/hotremove, I did make sure that I don't break anything. > Even if a user starts using this feature with page-reporting enabled. > However, it is true that I don't report or capture any memory added/removed > thought it. > > Fixing these issues will be an optimization which I will do as I get my basic > framework ready and in shape. > >> We can >> gloss over that, but there is a good chance that resolving those would >> have potential performance implications. 
With this most recent change >> there is now also the fact that it can only really support reporting at >> one page order so the solution is now much more prone to issues with >> memory fragmentation than it was before. I would consider the fact that my >> solution works with multiple page orders while the bitmap approach >> requires MAX_ORDER - 1 seems like another obvious win for my solution. > > This is just a configuration change and only requires to update > the macro 'PAGE_REPORTING_MIN_ORDER' to what you are using. > > What order do we want to report could vary based on the > use case where we are deploying the solution. > > Ideally, this should be configurable maybe at the compile time > or we can stick with pageblock_order which is originally suggested > and used by you. > >> Until we can get back to the point where we are comparing apples to apples >> I would prefer not to benchmark the bitmap solution as without the extra >> order limitation it was over 20% worse then my solution performance wise.. > > Understood. > However, as I reported previously after making the configuration changes > on top of v12 posting, I don't see the degradation. > > I will be happy to try out more suggestions to see if the issue is really > fixed. > > I have started looking into your concern of cacheline bouncing after > which I will look into Michal's suggestion of using page-isolation APIs to > isolate and release pages back. After that, I can decide on > posting my next series (if it is required
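As a rough worked example of the granularity argument above (assuming 4 KiB base pages and x86's MAX_ORDER of 11, so MAX_ORDER - 1 corresponds to 4 MiB chunks): a 30 GiB guest covers 30 GiB / 4 MiB = 7680 chunks, so the tracking bitmap needs 7680 bits, i.e. under 1 KiB, which is why the wastage is described as negligible; tracking at MAX_ORDER - 2 merely doubles that.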
Re: [PATCH for-next 2/5] erofs: remove dead code since managed cache is now built-in
On 2019/10/8 20:56, Gao Xiang wrote: > After commit 4279f3f9889f ("staging: erofs: turn cache > strategies into mount options"), cache strategies are > changed into mount options rather than old build configs. > > Let's kill useless code for obsoleted build options. > > Signed-off-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()
On Thu 10-10-19 14:12:01, Sergey Senozhatsky wrote: > On (10/09/19 16:26), Michal Hocko wrote: > > On Wed 09-10-19 15:56:32, Peter Oberparleiter wrote: > > [...] > > > A generic solution would be preferable from my point of view though, > > > because otherwise each console driver owner would need to ensure that any > > > lock taken in their console.write implementation is never held while > > > memory is allocated/released. > > > > Considering that console.write is called from essentially arbitrary code > > path IIUC then all the locks used in this path should be pretty much > > tail locks or console internal ones without external dependencies. > > That's a good expectation, but I guess it's not always the case. > > One example might be NET console - net subsystem locks, net device > drivers locks, maybe even some MM locks (skb allocations?). I am not familiar with the netconsole code TBH. If there is absolutely no way around that then we might have to bite a bullet and consider some of MM locks a land of no printk. I have already said that in this thread. I am mostly pushing back on "let's just go the simplest way" approach. > But even more "commonly used" consoles sometimes break that > expectation. E.g. 8250 > > serial8250_console_write() > serial8250_modem_status() > wake_up_interruptible() By that expectation you mean they are using external locks or that they really _need_ to allocate. Because if you are pointing to wake_up_interruptible and therefore the rq then this is a well known thing and I was under impression even documented but I can only see LOGLEVEL_SCHED that is arguably a very obscure way to document the fact. -- Michal Hocko SUSE Labs
Re: [PATCH for-next 4/5] erofs: clean up decompress queue stuffs
On 2019/10/8 20:56, Gao Xiang wrote: > Previously, both z_erofs_unzip_io and z_erofs_unzip_io_sb > record decompress queues for backend to use. > > The only difference is that z_erofs_unzip_io is used for > on-stack sync decompression so that it doesn't have a super > block field (since the caller can pass it in its context), > but it increases complexity with only a pointer saving. > > Rename z_erofs_unzip_io to z_erofs_decompressqueue with > a fixed super_block member and kill the other entirely, > and it can fallback to sync decompression if memory > allocation failure. > > Signed-off-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [PATCH v17 01/14] bitops: Introduce the for_each_set_clump8 macro
On Thu, Oct 10, 2019 at 9:29 AM Geert Uytterhoeven wrote: > On Thu, Oct 10, 2019 at 7:49 AM Andy Shevchenko > wrote: > > On Thu, Oct 10, 2019 at 5:31 AM Masahiro Yamada > > wrote: > > > On Thu, Oct 10, 2019 at 3:54 AM Geert Uytterhoeven > > > wrote: > > > > On Wed, Oct 9, 2019 at 7:09 PM Andy Shevchenko > > > > wrote: > > > > > On Thu, Oct 10, 2019 at 01:28:08AM +0900, Masahiro Yamada wrote: > > > > > > On Thu, Oct 10, 2019 at 12:27 AM William Breathitt Gray > > > > > > wrote: > > > > > > > > > > > > > > This macro iterates for each 8-bit group of bits (clump) with set > > > > > > > bits, > > > > > > > within a bitmap memory region. For each iteration, "start" is set > > > > > > > to the > > > > > > > bit offset of the found clump, while the respective clump value is > > > > > > > stored to the location pointed by "clump". Additionally, the > > > > > > > bitmap_get_value8 and bitmap_set_value8 functions are introduced > > > > > > > to > > > > > > > respectively get and set an 8-bit value in a bitmap memory region. > > > > > > > > > > > Why is the return type "unsigned long" where you know > > > > > > it return the 8-bit value ? > > > > > > > > > > Because bitmap API operates on unsigned long type. This is not only > > > > > consistency, but for sake of flexibility in case we would like to > > > > > introduce > > > > > more calls like clump16 or so. > > > > > > > > TBH, that doesn't convince me: those functions explicitly take/return an > > > > 8-bit value, and have "8" in their name. The 8-bit value is never > > > > really related to, retrieved from, or stored in a full "unsigned long" > > > > element of a bitmap, only to/from/in a part (byte) of it. > > > > > > > > Following your rationale, all of iowrite{8,16,32,64}*() should take an > > > > "unsigned long" value, too. > > > > > > > > > > +1 > > > > > > Using u8/u16/u32/u64 looks more consistent with other bitmap helpers. > > > > > > void bitmap_from_arr32(unsigned long *bitmap, const u32 *buf, unsigned > > > int nbits); > > > void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, unsigned int > > > nbits); > > > static inline void bitmap_from_u64(unsigned long *dst, u64 mask); > > > > > > > > > > > > If you want to see more examples from other parts, > > > > Geert's and yours examples both are not related. They are about > > fixed-width properies when we know that is the part of protocol. > > Here we have no protocol which stricts us to the mentioned fixed-width > > types. > > Yes you have: they are functions to store/retrieve an 8-bit value from > the middle of the bitmap, which is reflected in their names ("clump8", > "value8"). > The input/output value is clearly separated from the actual bitmap, > which is referenced by the "unsigned long *". > > If you add new "value16" functions, they will be intended to store/retrieve > 16-bit values. And if I add 4-bit, 12-bit or 24-bit values, what should I use? > Besides, if retrieving an 8-bit value requires passing an > "unsigned long *", the caller needs two variables: one unsigned long to > pass the address of, and one u8 to copy the returned value into. Why do you need a temporary variable? In some cases it might make sense, but in general simple cases I don't see what you may achieve with it. I looked at bitmap.h and see few functions may have benefited of actually eliminating a use of long -> u8 -> long conversion. Here is the question what we are mostly doing after we got a clump out of bitmap. > > So, I can tell an opposite, your arguments didn't convince me. 
> > > > Imagine the function which does an or / and / xor operation on bitmap. > > Now, when I supply unsigned long, I will see > > operations on one type in _one_ function independently of the size. > > Your proposal will make an unneded churn. > > Depends on what kind of value you will use to do the logical operation > with the bitmap: > - Full bitmap => unsigned long * + size, > - Single bitmap "word" => unsigned long, > - 8-bit value => u8, > - 16-bit value => u16 -- With Best Regards, Andy Shevchenko
Re: [PATCH] extcon: sm5502: Clear pending interrupts during initialization
On 19. 10. 10. 오후 4:33, Chanwoo Choi wrote: > On 19. 10. 8. 오후 7:54, Stephan Gerhold wrote: >> On some devices (e.g. Samsung Galaxy A5 (2015)), the bootloader >> seems to keep interrupts enabled for SM5502 when booting Linux. >> Changing the cable state (i.e. plugging in a cable) - until the driver >> is loaded - will therefore produce an interrupt that is never read. >> >> In this situation, the cable state will be stuck forever on the >> initial state because SM5502 stops sending interrupts. >> This can be avoided by clearing those pending interrupts after >> the driver has been loaded. >> >> Reading the interrupt status registers twice seems to be sufficient >> to make interrupts work in this situation. >> >> Signed-off-by: Stephan Gerhold >> --- >> This makes interrupts work on the Samsung Galaxy A5 (2015), which >> has recently gained mainline support [1]. >> >> I was not able to find a datasheet for SM5502, so this patch is >> merely based on testing and comparison with the downstream driver [2]. >> >> [1]: >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1329c1ab0730b521e6cd3051c56a2ff3d55f21e6 >> [2]: >> https://protect2.fireeye.com/url?k=029d0042-5ffa4464-029c8b0d-0cc47a31384a-14ac0bce09798d1f&u=https://github.com/msm8916-mainline/android_kernel_qcom_msm8916/blob/SM-A500FU/drivers/misc/sm5502.c#L1566-L1578 >> --- >> drivers/extcon/extcon-sm5502.c | 6 ++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/extcon/extcon-sm5502.c b/drivers/extcon/extcon-sm5502.c >> index dc43847ad2b0..c897f1aa4bf5 100644 >> --- a/drivers/extcon/extcon-sm5502.c >> +++ b/drivers/extcon/extcon-sm5502.c >> @@ -539,6 +539,12 @@ static void sm5502_init_dev_type(struct >> sm5502_muic_info *info) >> val = info->reg_data[i].val; >> regmap_write(info->regmap, info->reg_data[i].reg, val); >> } >> + >> +/* Clear pending interrupts */ >> +regmap_read(info->regmap, SM5502_REG_INT1, ®_data); >> +regmap_read(info->regmap, SM5502_REG_INT2, ®_data); >> +regmap_read(info->regmap, SM5502_REG_INT1, ®_data); >> +regmap_read(info->regmap, SM5502_REG_INT2, ®_data); > > It is not proper. Instead, initialize the SM5502_RET_INT1/2 with zero. > > The reset value of SM5502_RET_INT1/2 are zero (0x00) as following: > If you can test it with h/w, please try to test it and then > send the modified patch. 
> > diff --git a/drivers/extcon/extcon-sm5502.c b/drivers/extcon/extcon-sm5502.c > index c897f1aa4bf5..e168f77a18ba 100644 > --- a/drivers/extcon/extcon-sm5502.c > +++ b/drivers/extcon/extcon-sm5502.c > @@ -68,6 +68,14 @@ static struct reg_data sm5502_reg_data[] = { > .reg = SM5502_REG_CONTROL, > .val = SM5502_REG_CONTROL_MASK_INT_MASK, > .invert = false, > + }, { > + .reg = SM5502_REG_INT1, > + .val = SM5502_RET_INT1_MASK, > + .invert = true, > + }, { > + .reg = SM5502_REG_INT2, > + .val = SM5502_RET_INT1_MASK, > + .invert = true, > }, { > .reg = SM5502_REG_INTMASK1, > .val = SM5502_REG_INTM1_KP_MASK > diff --git a/drivers/extcon/extcon-sm5502.h b/drivers/extcon/extcon-sm5502.h > index 9dbb634d213b..5c4edb3e7fce 100644 > --- a/drivers/extcon/extcon-sm5502.h > +++ b/drivers/extcon/extcon-sm5502.h > @@ -93,6 +93,8 @@ enum sm5502_reg { > #define SM5502_REG_CONTROL_RAW_DATA_MASK (0x1 << > SM5502_REG_CONTROL_RAW_DATA_SHIFT) > #define SM5502_REG_CONTROL_SW_OPEN_MASK(0x1 << > SM5502_REG_CONTROL_SW_OPEN_SHIFT) > > +#define SM5502_RET_INT1_MASK (0xff) > + > #define SM5502_REG_INTM1_ATTACH_SHIFT 0 > #define SM5502_REG_INTM1_DETACH_SHIFT 1 > #define SM5502_REG_INTM1_KP_SHIFT 2 > >> } >> >> static int sm5022_muic_i2c_probe(struct i2c_client *i2c, >> > > When write 0x1 to SM5502_REG_RESET, reset the device. So, you can reset the all registers by writing 1 to SM5502_REG_RESET as following: diff --git a/drivers/extcon/extcon-sm5502.c b/drivers/extcon/extcon-sm5502.c index c897f1aa4bf5..96a578d2533a 100644 --- a/drivers/extcon/extcon-sm5502.c +++ b/drivers/extcon/extcon-sm5502.c @@ -65,9 +65,22 @@ struct sm5502_muic_info { /* Default value of SM5502 register to bring up MUIC device. */ static struct reg_data sm5502_reg_data[] = { { + { + .reg = SM5502_REG_RESET, + .val = SM5502_REG_RESET_MASK, + .invert = true, + }, { .reg = SM5502_REG_CONTROL, .val = SM5502_REG_CONTROL_MASK_INT_MASK, .invert = false, + }, { + .reg = SM5502_REG_INT1, + .val = SM5502_REG_INT1_MASK, + .invert = true, + }, { + .reg = SM5502_REG_INT2, + .val = SM5502_REG_INT1_MASK, + .invert = true, }, {
Re: For review: rewritten pivot_root(2) manual page
Hello Eric, I think I just understood something. See below. On 10/9/19 11:01 PM, Michael Kerrisk (man-pages) wrote: > Hello Eric, > > On 10/9/19 6:00 PM, Eric W. Biederman wrote: >> "Michael Kerrisk (man-pages)" writes: >> >>> Hello Eric, >>> >>> Thank you. I was hoping you might jump in on this thread. >>> >>> Please see below. >>> >>> On 10/9/19 10:46 AM, Eric W. Biederman wrote: "Michael Kerrisk (man-pages)" writes: > Hello Philipp, > > My apologies that it has taken a while to reply. (I had been hoping > and waiting that a few more people might weigh in on this thread.) > > On 9/23/19 3:42 PM, Philipp Wendler wrote: >> Hello Michael, >> >> Am 23.09.19 um 14:04 schrieb Michael Kerrisk (man-pages): >> >>> I'm considering to rewrite these pieces to exactly >>> describe what the system call does (which I already >>> do in the third paragraph) and remove the "may or may not" >>> pieces in the second paragraph. I'd welcome comments >>> on making that change. >>> >>> What did you think about my proposal above? To put it in context, >>> this was my initial comment in the mail: >>> >>> [[ >>> One area of the page that I'm still not really happy with >>> is the "vague" wording in the second paragraph and the note >>> in the third paragraph about the system call possibly >>> changing. These pieces survive (in somewhat modified form) >>> from the original page, which was written before the >>> system call was released, and it seems there was some >>> question about whether the system call might still change >>> its behavior with respect to the root directory and current >>> working directory of other processes. However, after 19 >>> years, nothing has changed, and surely it will not in the >>> future, since that would constitute an ABI breakage. >>> I'm considering to rewrite these pieces to exactly >>> describe what the system call does (which I already >>> do in the third paragraph) and remove the "may or may not" >>> pieces in the second paragraph. I'd welcome comments >>> on making that change. >>> ]] >>> >>> And the second and third paragraphs of the manual page currently >>> read: >>> >>> [[ >>>pivot_root() may or may not change the current root and the cur‐ >>>rent working directory of any processes or threads that use the >>>old root directory and which are in the same mount namespace as >>>the caller of pivot_root(). The caller of pivot_root() should >>>ensure that processes with root or current working directory at >>>the old root operate correctly in either case. An easy way to >>>ensure this is to change their root and current working directory >>>to new_root before invoking pivot_root(). Note also that >>>pivot_root() may or may not affect the calling process's current >>>working directory. It is therefore recommended to call chdir("/") >>>immediately after pivot_root(). >>> >>>The paragraph above is intentionally vague because at the time >>>when pivot_root() was first implemented, it was unclear whether >>>its affect on other process's root and current working directo‐ >>>ries—and the caller's current working directory—might change in >>>the future. However, the behavior has remained consistent since >>>this system call was first implemented: pivot_root() changes the >>>root directory and the current working directory of each process >>>or thread in the same mount namespace to new_root if they point to >>>the old root directory. (See also NOTES.) 
On the other hand, >>>pivot_root() does not change the caller's current working direc‐ >>>tory (unless it is on the old root directory), and thus it should >>>be followed by a chdir("/") call. >>> ]] >> >> Apologies I saw that concern I didn't realize it was a questio >> >> I think it is very reasonable to remove warning the behavior might >> change. We have pivot_root(8) in common use that to use it requires >> the semantic of changing processes other than the current process. >> Which means any attempt to noticably change the behavior of >> pivot_root(2) will break userspace. > > Thanks for the confirmation that this change would be okay. > I will make this change soon, unless I hear a counterargument. > >> Now the documented semantics in behavior above are not quite what >> pivot_root(2) does. It walks all processes on the system and if the >> working directory or the root directory refer to the root mount that is >> being replaced, then pivot_root(2) will update them. >> >> In practice the above is limited to a mount namespace. But something as >> simple as "cd /proc//root" can allow a process to have a >> working directory in a different mount namespace. > > So, I'm not quite clear. Do you mean that something in the existing > ma
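As a concrete companion to the behaviour being documented, the conventional calling sequence looks like this. It is a minimal sketch with error handling omitted; there is no glibc wrapper for pivot_root(), hence the raw syscall, and put_old is assumed to be a directory underneath new_root:

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

static void switch_root(const char *new_root, const char *put_old)
{
	chdir(new_root);			/* work relative to the new root */
	syscall(SYS_pivot_root, ".", put_old);	/* put_old given relative to new_root */
	chdir("/");				/* the chdir("/") recommended above */
	/* later, the old root is visible under put_old (now relative to the
	 * new root) and can be detached with umount2(..., MNT_DETACH) */
}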
Re: [PATCH v2] HID: core: check whether usage page item is after usage id item
On Thu, 2019-10-10 at 11:05 +0800, Candle Sun wrote: > > > -static void hid_concatenate_usage_page(struct hid_parser *parser) > > > +static void hid_concatenate_last_usage_page(struct hid_parser *parser) > > > { > > > int i; > > > + unsigned int usage; > > > + unsigned int usage_page = parser->global.usage_page; > > > + > > > + if (!parser->local.usage_page_last) > > > + return; > > > > > > for (i = 0; i < parser->local.usage_index; i++) > > > > Technically correct but it's preferred if you use braces here. > > > > Nicolas, do you mean this: > > @@ -563,12 +563,13 @@ static void > hid_concatenate_last_usage_page(struct hid_parser *parser) > if (!parser->local.usage_page_last) > return; > > - for (i = 0; i < parser->local.usage_index; i++) > + for (i = 0; i < parser->local.usage_index; i++) { > if (parser->local.usage_size[i] <= 2) { > usage = parser->local.usage[i]; > parser->local.usage[i] = > GET_COMPLETE_USAGE(usage_page, usage); > } > + } > } Yes, thanks! Regards, Nicolas
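For context, the helper used in the hunk above presumably combines the usage page and usage id along these lines; this is an assumption for illustration, the actual definition belongs to the patch under review and is not shown here:

#define GET_COMPLETE_USAGE(page, id)	(((page) << 16) | ((id) & 0xffff))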
[PATCH] usb: mtk-xhci: Set the XHCI_NO_64BIT_SUPPORT quirk
MediaTek XHCI host controller does not support 64-bit addressing despite the AC64 bit of HCCPARAMS1 register being set. The platform-specific glue sets the DMA mask to 32 bits on its own, but it has no effect, because xhci_gen_setup() overrides it according to hardware capabilities. Use the XHCI_NO_64BIT_SUPPORT quirk to tell the XHCI core to force 32-bit DMA mask instead. Signed-off-by: Tomasz Figa --- drivers/usb/host/xhci-mtk.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c index b18a6baef204a..4d101d52cc11b 100644 --- a/drivers/usb/host/xhci-mtk.c +++ b/drivers/usb/host/xhci-mtk.c @@ -395,6 +395,11 @@ static void xhci_mtk_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_SPURIOUS_SUCCESS; if (mtk->lpm_support) xhci->quirks |= XHCI_LPM_SUPPORT; + /* +* MTK host controller does not support 64-bit addressing, despite +* having the AC64 bit of the HCCPARAMS1 register set. +*/ + xhci->quirks |= XHCI_NO_64BIT_SUPPORT; } /* called during probe() after chip reset completes */ @@ -488,11 +493,6 @@ static int xhci_mtk_probe(struct platform_device *pdev) goto disable_clk; } - /* Initialize dma_mask and coherent_dma_mask to 32-bits */ - ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)); - if (ret) - goto disable_clk; - hcd = usb_create_hcd(driver, dev, dev_name(dev)); if (!hcd) { ret = -ENOMEM; -- 2.23.0.581.g78d2f28ef7-goog
Re: [PATCH v17 01/14] bitops: Introduce the for_each_set_clump8 macro
Hi Andy, On Thu, Oct 10, 2019 at 9:42 AM Andy Shevchenko wrote: > On Thu, Oct 10, 2019 at 9:29 AM Geert Uytterhoeven > wrote: > > On Thu, Oct 10, 2019 at 7:49 AM Andy Shevchenko > > wrote: > > > On Thu, Oct 10, 2019 at 5:31 AM Masahiro Yamada > > > wrote: > > > > On Thu, Oct 10, 2019 at 3:54 AM Geert Uytterhoeven > > > > wrote: > > > > > On Wed, Oct 9, 2019 at 7:09 PM Andy Shevchenko > > > > > wrote: > > > > > > On Thu, Oct 10, 2019 at 01:28:08AM +0900, Masahiro Yamada wrote: > > > > > > > On Thu, Oct 10, 2019 at 12:27 AM William Breathitt Gray > > > > > > > wrote: > > > > > > > > > > > > > > > > This macro iterates for each 8-bit group of bits (clump) with > > > > > > > > set bits, > > > > > > > > within a bitmap memory region. For each iteration, "start" is > > > > > > > > set to the > > > > > > > > bit offset of the found clump, while the respective clump value > > > > > > > > is > > > > > > > > stored to the location pointed by "clump". Additionally, the > > > > > > > > bitmap_get_value8 and bitmap_set_value8 functions are > > > > > > > > introduced to > > > > > > > > respectively get and set an 8-bit value in a bitmap memory > > > > > > > > region. > > > > > > > > > > > > > Why is the return type "unsigned long" where you know > > > > > > > it return the 8-bit value ? > > > > > > > > > > > > Because bitmap API operates on unsigned long type. This is not only > > > > > > consistency, but for sake of flexibility in case we would like to > > > > > > introduce > > > > > > more calls like clump16 or so. > > > > > > > > > > TBH, that doesn't convince me: those functions explicitly take/return > > > > > an > > > > > 8-bit value, and have "8" in their name. The 8-bit value is never > > > > > really related to, retrieved from, or stored in a full "unsigned long" > > > > > element of a bitmap, only to/from/in a part (byte) of it. > > > > > > > > > > Following your rationale, all of iowrite{8,16,32,64}*() should take an > > > > > "unsigned long" value, too. > > > > > > > > > > > > > +1 > > > > > > > > Using u8/u16/u32/u64 looks more consistent with other bitmap helpers. > > > > > > > > void bitmap_from_arr32(unsigned long *bitmap, const u32 *buf, unsigned > > > > int nbits); > > > > void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, unsigned > > > > int nbits); > > > > static inline void bitmap_from_u64(unsigned long *dst, u64 mask); > > > > > > > > > > > > > > > > If you want to see more examples from other parts, > > > > > > Geert's and yours examples both are not related. They are about > > > fixed-width properies when we know that is the part of protocol. > > > Here we have no protocol which stricts us to the mentioned fixed-width > > > types. > > > > Yes you have: they are functions to store/retrieve an 8-bit value from > > the middle of the bitmap, which is reflected in their names ("clump8", > > "value8"). > > The input/output value is clearly separated from the actual bitmap, > > which is referenced by the "unsigned long *". > > > > If you add new "value16" functions, they will be intended to store/retrieve > > 16-bit values. > > And if I add 4-bit, 12-bit or 24-bit values, what should I use? Whatever is needed to store that? I agree "unsigned long" is appropriate for a generic function to extract a bit field of 1 to BITS_PER_LONG bits. > > Besides, if retrieving an 8-bit value requires passing an > > "unsigned long *", the caller needs two variables: one unsigned long to > > pass the address of, and one u8 to copy the returned value into. 
> > Why do you need a temporary variable? In some cases it might make > sense, but in general simple cases I don't see what you may achieve > with it. Because find_next_clump8() takes a pointer to store the output value. > I looked at bitmap.h and see few functions may have benefited of > actually eliminating a use of long -> u8 -> long conversion. > > Here is the question what we are mostly doing after we got a clump out > of bitmap. If I call find_next_clump8() to extract a byte, I guess I want to process a u8 afterwards? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
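To ground the discussion, this is roughly how the new helpers end up being used; the signatures follow the series under review and may differ in the final version:

	DECLARE_BITMAP(bits, 64);
	unsigned long clump;	/* the "unsigned long vs u8" value under debate */
	unsigned int start;

	bitmap_zero(bits, 64);
	bitmap_set_value8(bits, 0xaa, 8);	/* write one byte at bit offset 8 */

	for_each_set_clump8(start, clump, bits, 64)
		pr_info("clump of set bits at %u: 0x%02lx\n", start, clump);

Whether "clump" is better typed as u8 or unsigned long is exactly the question being argued here; the sketch simply follows the posted series.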
Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
On 10.10.19 09:35, Michal Hocko wrote: > On Thu 10-10-19 09:27:32, David Hildenbrand wrote: >> On 09.10.19 16:43, Michal Hocko wrote: >>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote: We should check for pfn_to_online_page() to not access uninitialized memmaps. Reshuffle the code so we don't have to duplicate the error message. Cc: Naoya Horiguchi Cc: Andrew Morton Cc: Michal Hocko Signed-off-by: David Hildenbrand --- mm/memory-failure.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 7ef849da8278..e866e6e5660b 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) if (!sysctl_memory_failure_recovery) panic("Memory failure on page %lx", pfn); - if (!pfn_valid(pfn)) { + p = pfn_to_online_page(pfn); + if (!p) { + if (pfn_valid(pfn)) { + pgmap = get_dev_pagemap(pfn, NULL); + if (pgmap) + return memory_failure_dev_pagemap(pfn, flags, +pgmap); + } pr_err("Memory failure: %#lx: memory outside kernel control\n", pfn); return -ENXIO; >>> >>> Don't we need that earlier at hwpoison_inject level? >>> >> >> Theoretically yes, this is another instance. But pfn_to_online_page(pfn) >> alone would not be sufficient as discussed. We would, again, have to >> special-case ZONE_DEVICE via things like get_dev_pagemap() ... >> >> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either >> way: >> >> /* >> * Note that the below poison/unpoison interfaces do not involve >> * hardware status change, hence do not require hardware support. >> * They are mainly for testing hwpoison in software level. >> */ >> >> So it's not that bad compared to memory_failure() called from real HW or >> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() > > Yes, this is just a toy. And yes we need to handle zone device pages > here because a) people likely want to test MCE behavior even on these > pages and b) HW can really trigger MCEs there as well. I was just > pointing that the patch is likely incomplete. > I rather think this deserves a separate patch as it is a separate interface :) I do wonder why hwpoison_inject() has to perform so much extra work compared to other memory_failure() users. This smells like legacy leftovers to me, but I might be wrong. The interface is fairly old, though. Does anybody know why we need this magic? I can spot quite some duplicate checks/things getting performed. Naiive me would just make the interface perform the same as hard_offline_page_store(). But most probably I am not getting the real purpose of both different interfaces. HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have guessed that fixing this is not urgent. BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs fixing to make sure we access initialized memmaps. -- Thanks, David / dhildenb
Re: [PATCH v8 4/4] ftrace: Add an option for tracing console latencies
Den ons 9 okt. 2019 kl 20:08 skrev Steven Rostedt : > > On Wed, 9 Oct 2019 16:51:07 +0200 > Viktor Rosendahl wrote: > > > Apologies, I should have replied there but I have been a bit busy the > > last few days with other topics, and I felt a bit indecisive about > > that point, so I just sloppily addressed the issue in the cover letter > > of this new series: > > > > "I have retained the fourth patch, although it was suggested that is > > becoming > > obsolete soon. I have retained it only because I do not know the status of > > the code that will make it obsolete. It's the last patch of the series and > > if there indeed is some code that will remove the latency issues from the > > printk code, then of course it makes sense to drop it. The first three > > patches > > will work without it." > > > > I thought that, since it's the last in the series, it would be > > possible for maintainers to just take the first three if the last one > > is not wanted. > > > > For me it solves a rather important problem though, so if the code > > that will make it obsolete isn't available for some time, then perhaps > > it should be considered as a temporary solution. > > > > Of course, if there is a commitment to never remove any knobs from > > /sys/kernel/debug/tracing/trace_options, then I can easily understand > > that it's not wanted as a temporary fix. > > There isn't quite a commitment to not remove knobs, but if a tool > starts relying on a knob, then, the answer is yes there is :-p > > As this will hopefully become something we don't worry about in the > future, I would rather have this be a kernel command line option. This > way, it wont be something that tools can really check for. > > If you add a trace_console_latency option to the kernel command line, > we can enable it that way. And then when it becomes obsolete, we can > simply remove it, without breaking any tools. > > Would that work for you? > Sounds good to me. I will try to adjust the patch accordingly within a few days. best regards, Viktor > -- Steve
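A command-line switch of the kind suggested above could look roughly like this; the name and the exact placement are assumptions, nothing of the sort exists yet:

static bool trace_console_latency __read_mostly;

static int __init setup_trace_console_latency(char *str)
{
	trace_console_latency = true;
	return 1;
}
__setup("trace_console_latency", setup_trace_console_latency);

Booting with trace_console_latency on the kernel command line would then enable the console timing without exposing a trace_options knob that tools could start depending on.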
Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()
On Wed 2019-10-09 10:46:14, Qian Cai wrote: > On Wed, 2019-10-09 at 16:24 +0200, Petr Mladek wrote: > > On Wed 2019-10-09 09:43:13, Qian Cai wrote: > > > On Wed, 2019-10-09 at 15:27 +0200, Michal Hocko wrote: > > > > On Wed 09-10-19 09:06:42, Qian Cai wrote: > > > > [...] > > > > > https://lore.kernel.org/linux-mm/1570460350.5576.290.ca...@lca.pw/ > > > > > > > > > > [ 297.425964] -> #1 (&port_lock_key){-.-.}: > > > > > [ 297.425967]__lock_acquire+0x5b3/0xb40 > > > > > [ 297.425967]lock_acquire+0x126/0x280 > > > > > [ 297.425968]_raw_spin_lock_irqsave+0x3a/0x50 > > > > > [ 297.425969]serial8250_console_write+0x3e4/0x450 > > > > > [ 297.425970]univ8250_console_write+0x4b/0x60 > > > > > [ 297.425970]console_unlock+0x501/0x750 > > > > > [ 297.425971]vprintk_emit+0x10d/0x340 > > > > > [ 297.425972]vprintk_default+0x1f/0x30 > > > > > [ 297.425972]vprintk_func+0x44/0xd4 > > > > > [ 297.425973]printk+0x9f/0xc5 > > > > > [ 297.425974]register_console+0x39c/0x520 > > > > > [ 297.425975]univ8250_console_init+0x23/0x2d > > > > > [ 297.425975]console_init+0x338/0x4cd > > > > > [ 297.425976]start_kernel+0x534/0x724 > > > > > [ 297.425977]x86_64_start_reservations+0x24/0x26 > > > > > [ 297.425977]x86_64_start_kernel+0xf4/0xfb > > > > > [ 297.425978]secondary_startup_64+0xb6/0xc0 > > > > > > > > > > where the report again show the early boot call trace for the locking > > > > > dependency, > > > > > > > > > > console_owner --> port_lock_key > > > > > > > > > > but that dependency clearly not only happen in the early boot. > > > > > > > > Can you provide an example of the runtime dependency without any early > > > > boot artifacts? Because this discussion really doens't make much sense > > > > without a clear example of a _real_ lockdep report that is not a false > > > > possitive. All of them so far have been concluded to be false possitive > > > > AFAIU. > > > > > > An obvious one is in the above link. Just replace the trace in #1 above > > > with > > > printk() from anywhere, i.e., just ignore the early boot calls there as > > > they are > > > not important. > > > > > > printk() > > > console_unlock() > > > console_lock_spinning_enable() --> console_owner_lock > > > call_console_drivers() > > > serial8250_console_write() --> port->lock > > > > Please, find the location where this really happens and then suggests > > how the real deadlock could get fixed. So far, we have seen only > > false positives and theoretical scenarios. > > Now the bar is higher again. You are now asking me to actually trigger this > potential deadlock live. I am probably better off buying some lottery tickets > then if I could be that lucky. No, we just do not want to comlicate the code too much just to hide false positives from lockdep. I do not ask you to reproduce the deadlock. I ask you to find a code path where the deadlock might really happen. It seems that you actually found one in the tty code in the other mail. Best Regards, Petr
Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()
On 10.10.19 09:52, David Hildenbrand wrote: > On 10.10.19 09:35, Michal Hocko wrote: >> On Thu 10-10-19 09:27:32, David Hildenbrand wrote: >>> On 09.10.19 16:43, Michal Hocko wrote: On Wed 09-10-19 16:24:35, David Hildenbrand wrote: > We should check for pfn_to_online_page() to not access uninitialized > memmaps. Reshuffle the code so we don't have to duplicate the error > message. > > Cc: Naoya Horiguchi > Cc: Andrew Morton > Cc: Michal Hocko > Signed-off-by: David Hildenbrand > --- > mm/memory-failure.c | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 7ef849da8278..e866e6e5660b 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags) > if (!sysctl_memory_failure_recovery) > panic("Memory failure on page %lx", pfn); > > - if (!pfn_valid(pfn)) { > + p = pfn_to_online_page(pfn); > + if (!p) { > + if (pfn_valid(pfn)) { > + pgmap = get_dev_pagemap(pfn, NULL); > + if (pgmap) > + return memory_failure_dev_pagemap(pfn, flags, > + pgmap); > + } > pr_err("Memory failure: %#lx: memory outside kernel control\n", > pfn); > return -ENXIO; Don't we need that earlier at hwpoison_inject level? >>> >>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn) >>> alone would not be sufficient as discussed. We would, again, have to >>> special-case ZONE_DEVICE via things like get_dev_pagemap() ... >>> >>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either >>> way: >>> >>> /* >>> * Note that the below poison/unpoison interfaces do not involve >>> * hardware status change, hence do not require hardware support. >>> * They are mainly for testing hwpoison in software level. >>> */ >>> >>> So it's not that bad compared to memory_failure() called from real HW or >>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store() >> >> Yes, this is just a toy. And yes we need to handle zone device pages >> here because a) people likely want to test MCE behavior even on these >> pages and b) HW can really trigger MCEs there as well. I was just >> pointing that the patch is likely incomplete. >> > > I rather think this deserves a separate patch as it is a separate > interface :) > > I do wonder why hwpoison_inject() has to perform so much extra work > compared to other memory_failure() users. This smells like legacy > leftovers to me, but I might be wrong. The interface is fairly old, > though. Does anybody know why we need this magic? I can spot quite some > duplicate checks/things getting performed. > > Naiive me would just make the interface perform the same as > hard_offline_page_store(). But most probably I am not getting the real > purpose of both different interfaces. > > HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have > guessed that fixing this is not urgent. > > BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs > fixing to make sure we access initialized memmaps. > To be more precise, soft_offline_page_store() needs a pfn_to_online_page() check. Will send a patch. -- Thanks, David / dhildenb
Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user
On Thu, Oct 10, 2019 at 02:46:36PM +0800, Jin, Yao wrote: > > > On 10/1/2019 10:17 AM, Andi Kleen wrote: > > > > I think it's useful. Makes it easy to do kernel/user break downs. > > > > perf record should support the same. > > > > > > Don't we have this already with: > > > > > > [root@quaco ~]# perf stat -e > > > cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 > > > > This only works for simple cases. Try it for --topdown or multiple -M > > metrics. > > > > -Andi > > > > Hi Arnaldo, Jiri, > > We think it should be very useful if --all-user / --all-kernel can be > specified together, so that we can get a break down between user and kernel > easily. > > But yes, the patches for supporting this new semantics is much complicated > than the patch which just follows original perf-record behavior. I fully > understand this concern. > > So if this new semantics can be accepted, that would be very good. But if > you think the new semantics is too complicated, I'm also fine for posting a > new patch which just follows the perf-record behavior. I still need to think a bit more about this.. did you consider other options like cloning of the perf_evlist/perf_evsel and changing just the exclude* bits? might be event worse actualy ;-) or maybe if we add modifier we could add extra events/groups within the parser.. like: "{cycles,instructions}:A,{cache-misses,cache-references}:A,cycles:A" but that might be still more complicated then what you did also please add the perf record changes so we have same code and logic for both if we are going to change it jirka
[PATCH] mtd: maps: l440gx: Avoid print address to dmesg
Avoid print the address of l440gx_map.virt every time l440gx init. Signed-off-by: Fuqian Huang --- drivers/mtd/maps/l440gx.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/mtd/maps/l440gx.c b/drivers/mtd/maps/l440gx.c index 876f12f40018..e7e40bca82d1 100644 --- a/drivers/mtd/maps/l440gx.c +++ b/drivers/mtd/maps/l440gx.c @@ -86,7 +86,6 @@ static int __init init_l440gx(void) return -ENOMEM; } simple_map_init(&l440gx_map); - printk(KERN_NOTICE "window_addr = 0x%08lx\n", (unsigned long)l440gx_map.virt); /* Setup the pm iobase resource * This code should move into some kind of generic bridge -- 2.11.0
Re: [PATCH v17 01/14] bitops: Introduce the for_each_set_clump8 macro
On Thu, Oct 10, 2019 at 09:49:51AM +0200, Geert Uytterhoeven wrote: > On Thu, Oct 10, 2019 at 9:42 AM Andy Shevchenko > wrote: > > On Thu, Oct 10, 2019 at 9:29 AM Geert Uytterhoeven > > wrote: > > > On Thu, Oct 10, 2019 at 7:49 AM Andy Shevchenko > > > wrote: > > > > On Thu, Oct 10, 2019 at 5:31 AM Masahiro Yamada > > > > wrote: > > > > > On Thu, Oct 10, 2019 at 3:54 AM Geert Uytterhoeven > > > > > wrote: > > > > > > On Wed, Oct 9, 2019 at 7:09 PM Andy Shevchenko > > > > > > wrote: > > > > > > > On Thu, Oct 10, 2019 at 01:28:08AM +0900, Masahiro Yamada wrote: > > > > > > > > On Thu, Oct 10, 2019 at 12:27 AM William Breathitt Gray > > > > > > > > wrote: > > > > > > > > Why is the return type "unsigned long" where you know > > > > > > > > it return the 8-bit value ? > > > > > > > > > > > > > > Because bitmap API operates on unsigned long type. This is not > > > > > > > only > > > > > > > consistency, but for sake of flexibility in case we would like to > > > > > > > introduce > > > > > > > more calls like clump16 or so. > > > > > > > > > > > > TBH, that doesn't convince me: those functions explicitly > > > > > > take/return an > > > > > > 8-bit value, and have "8" in their name. The 8-bit value is never > > > > > > really related to, retrieved from, or stored in a full "unsigned > > > > > > long" > > > > > > element of a bitmap, only to/from/in a part (byte) of it. > > > > > > > > > > > > Following your rationale, all of iowrite{8,16,32,64}*() should take > > > > > > an > > > > > > "unsigned long" value, too. > > > > > > > > > > Using u8/u16/u32/u64 looks more consistent with other bitmap helpers. > > > > > > > > > > void bitmap_from_arr32(unsigned long *bitmap, const u32 *buf, unsigned > > > > > int nbits); > > > > > void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, unsigned > > > > > int nbits); > > > > > static inline void bitmap_from_u64(unsigned long *dst, u64 mask); > > > > > > > > > > If you want to see more examples from other parts, > > > > > > > > Geert's and yours examples both are not related. They are about > > > > fixed-width properies when we know that is the part of protocol. > > > > Here we have no protocol which stricts us to the mentioned fixed-width > > > > types. > > > > > > Yes you have: they are functions to store/retrieve an 8-bit value from > > > the middle of the bitmap, which is reflected in their names ("clump8", > > > "value8"). > > > The input/output value is clearly separated from the actual bitmap, > > > which is referenced by the "unsigned long *". > > > > > > If you add new "value16" functions, they will be intended to > > > store/retrieve > > > 16-bit values. > > > > And if I add 4-bit, 12-bit or 24-bit values, what should I use? > > Whatever is needed to store that? > I agree "unsigned long" is appropriate for a generic function to extract a > bit field of 1 to BITS_PER_LONG bits. > > > > Besides, if retrieving an 8-bit value requires passing an > > > "unsigned long *", the caller needs two variables: one unsigned long to > > > pass the address of, and one u8 to copy the returned value into. > > > > Why do you need a temporary variable? In some cases it might make > > sense, but in general simple cases I don't see what you may achieve > > with it. > > Because find_next_clump8() takes a pointer to store the output value. So does regmap_read(). 8 appeared there during review when it has been proposed to optimize to 8-bit clumps as most of the current users utilize it. The initial idea was to be bit-width agnostic. 
And with current API it's possible to easily convert to other formats later if we need. > > I looked at bitmap.h and see a few functions that may have benefited from > > actually eliminating a use of long -> u8 -> long conversion. > > > > Here is the question what we are mostly doing after we got a clump out > > of bitmap. > > If I call find_next_clump8() to extract a byte, I guess I want to process an > u8 > afterwards? Some functions may expect width-(semi-)dependent types, like regmap_write(). Yes, it's possible to supply u8 there and have an implicit type cast. -- With Best Regards, Andy Shevchenko
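For readers trying to follow the type debate, a small usage sketch against the accessors as posted in v17 (for_each_set_clump8() yielding the clump into an unsigned long, bitmap_set_value8() consuming one); the exact signatures are taken from this series and may change before merge:

#include <linux/bitmap.h>
#include <linux/bitops.h>

/*
 * Sketch against the v17 interface: walk the set 8-bit clumps of src and
 * mirror each one into dst. "clump" is an unsigned long even though only
 * its low 8 bits are ever populated, which is exactly the point being
 * argued in this thread.
 */
static void mirror_set_clumps8(unsigned long *dst, const unsigned long *src,
			       unsigned long nbits)
{
	unsigned long clump;
	unsigned long start;

	for_each_set_clump8(start, clump, src, nbits)
		bitmap_set_value8(dst, clump, start);
}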
Re: [PATCH RFC] perf_event: Add support for LSM and SELinux checks
On Wed, Oct 09, 2019 at 04:36:57PM -0400, Joel Fernandes (Google) wrote: > In currentl mainline, the degree of access to perf_event_open(2) system > call depends on the perf_event_paranoid sysctl. This has a number of > limitations: > > 1. The sysctl is only a single value. Many types of accesses are controlled >based on the single value thus making the control very limited and >coarse grained. > 2. The sysctl is global, so if the sysctl is changed, then that means >all processes get access to perf_event_open(2) opening the door to >security issues. > > This patch adds LSM and SELinux access checking which will be used in > Android to access perf_event_open(2) for the purposes of attaching BPF > programs to tracepoints, perf profiling and other operations from > userspace. These operations are intended for production systems. > > 5 new LSM hooks are added: > 1. perf_event_open: This controls access during the perf_event_open(2) >syscall itself. The hook is called from all the places that the >perf_event_paranoid sysctl is checked to keep it consistent with the >systctl. The hook gets passed a 'type' argument which controls CPU, >kernel and tracepoint accesses (in this context, CPU, kernel and >tracepoint have the same semantics as the perf_event_paranoid sysctl). >Additionally, I added an 'open' type which is similar to >perf_event_paranoid sysctl == 3 patch carried in Android and several other >distros but was rejected in mainline [1] in 2016. > > 2. perf_event_alloc: This allocates a new security object for the event >which stores the current SID within the event. It will be useful when >the perf event's FD is passed through IPC to another process which may >try to read the FD. Appropriate security checks will limit access. > > 3. perf_event_free: Called when the event is closed. > > 4. perf_event_read: Called from the read(2) system call path for the event. + mmap() > > 5. perf_event_write: Called from the read(2) system call path for the event. - read() + ioctl() fresh from the keyboard.. but maybe consoldate things a little. --- --- a/arch/x86/events/intel/bts.c +++ b/arch/x86/events/intel/bts.c @@ -14,7 +14,6 @@ #include #include #include -#include #include #include @@ -550,13 +549,11 @@ static int bts_event_init(struct perf_ev * Note that the default paranoia setting permits unprivileged * users to profile the kernel. 
*/ - if (event->attr.exclude_kernel && perf_paranoid_kernel() && - !capable(CAP_SYS_ADMIN)) - return -EACCES; - - ret = security_perf_event_open(&event->attr, PERF_SECURITY_KERNEL); - if (ret) - return ret; + if (event->attr.exclude_kernel) { + ret = perf_allow_kernel(&event->attr); + if (ret) + return ret; + } if (x86_add_exclusive(x86_lbr_exclusive_bts)) return -EBUSY; --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -11,7 +11,6 @@ #include #include #include -#include #include #include #include @@ -3316,10 +3315,7 @@ static int intel_pmu_hw_config(struct pe if (x86_pmu.version < 3) return -EINVAL; - if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN)) - return -EACCES; - - ret = security_perf_event_open(&event->attr, PERF_SECURITY_CPU); + ret = perf_allow_cpu(&event->attr); if (ret) return ret; --- a/arch/x86/events/intel/p4.c +++ b/arch/x86/events/intel/p4.c @@ -8,7 +8,6 @@ */ #include -#include #include #include @@ -777,10 +776,7 @@ static int p4_validate_raw_event(struct * the user needs special permissions to be able to use it */ if (p4_ht_active() && p4_event_bind_map[v].shared) { - if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN)) - return -EACCES; - - v = security_perf_event_open(&event->attr, PERF_SECURITY_CPU); + v = perf_allow_cpu(&event->attr); if (v) return v; } --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -56,6 +56,7 @@ struct perf_guest_info_callbacks { #include #include #include +#include #include struct perf_callchain_entry { @@ -1244,19 +1245,28 @@ extern int perf_cpu_time_max_percent_han int perf_event_max_stack_handler(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); -static inline bool perf_paranoid_tracepoint_raw(void) +static inline int perf_allow_kernel(struct perf_event_attr *attr) { - return sysctl_perf_event_paranoid > -1; + if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN)) + return -EACCES; + + return security_perf_event_open(attr, PERF_SECURITY_KERNEL); } -static in
Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()
On (10/10/19 09:40), Michal Hocko wrote: [..] > > > Considering that console.write is called from essentially arbitrary code > > > path IIUC then all the locks used in this path should be pretty much > > > tail locks or console internal ones without external dependencies. > > > > That's a good expectation, but I guess it's not always the case. > > > > One example might be NET console - net subsystem locks, net device > > drivers locks, maybe even some MM locks (skb allocations?). > > I am not familiar with the netconsole code TBH. If there is absolutely > no way around that then we might have to bite a bullet and consider some > of MM locks a land of no printk. So this is what netconsole does (before we pass on udp to net device driver code, which *may be* can do more allocations, I don't know): write_msg() netpoll_send_udp() find_skb() alloc_skb(len, GFP_ATOMIC) kmem_cache_alloc_node() You are the right person to ask this question to - how many MM locks are involved when we do GFP_ATOMIC kmem_cache allocation? Is there anything to be concerned about? > I have already said that in this thread. I am mostly pushing back > on "let's just go the simplest way" approach. I understand this. And don't have objections. Merely trying to get better understanding. > > But even more "commonly used" consoles sometimes break that > > expectation. E.g. 8250 > > > > serial8250_console_write() > > serial8250_modem_status() > > wake_up_interruptible() > > By that expectation you mean they are using external locks or that they > really _need_ to allocate. Sort of both. netconsole does allocations and has external lock dependencies. Some of serial console drivers have external lock dependencies (but don't do allocations, as far as I'm concerned). > Because if you are pointing to wake_up_interruptible and therefore the rq > then this is a well known thing and I was under impression even documented > but I can only see LOGLEVEL_SCHED that is arguably a very obscure way to > document the fact. That's right. All I was commenting on is that console.write() callbacks have external lock dependencies. -ss
[PATCH v2 3/4] soc: amlogic: Add support for Secure power domains controller
Add support for the Amlogic Secure Power controller. In A1/C1 series, power control registers are in secure domain, and should be accessed by smc. Signed-off-by: Jianxin Pan --- drivers/soc/amlogic/Kconfig | 13 ++ drivers/soc/amlogic/Makefile| 1 + drivers/soc/amlogic/meson-secure-pwrc.c | 203 3 files changed, 217 insertions(+) create mode 100644 drivers/soc/amlogic/meson-secure-pwrc.c diff --git a/drivers/soc/amlogic/Kconfig b/drivers/soc/amlogic/Kconfig index bc2c912..6cb06e7 100644 --- a/drivers/soc/amlogic/Kconfig +++ b/drivers/soc/amlogic/Kconfig @@ -48,6 +48,19 @@ config MESON_EE_PM_DOMAINS Say yes to expose Amlogic Meson Everything-Else Power Domains as Generic Power Domains. +config MESON_SECURE_PM_DOMAINS + bool "Amlogic Meson Secure Power Domains driver" + depends on ARCH_MESON || COMPILE_TEST + depends on PM && OF + depends on HAVE_ARM_SMCCC + default ARCH_MESON + select PM_GENERIC_DOMAINS + select PM_GENERIC_DOMAINS_OF + help + Support for the power controller on Amlogic A1/C1 series. + Say yes to expose Amlogic Meson Secure Power Domains as Generic + Power Domains. + config MESON_MX_SOCINFO bool "Amlogic Meson MX SoC Information driver" depends on ARCH_MESON || COMPILE_TEST diff --git a/drivers/soc/amlogic/Makefile b/drivers/soc/amlogic/Makefile index de79d044..7b8c5d3 100644 --- a/drivers/soc/amlogic/Makefile +++ b/drivers/soc/amlogic/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_MESON_GX_SOCINFO) += meson-gx-socinfo.o obj-$(CONFIG_MESON_GX_PM_DOMAINS) += meson-gx-pwrc-vpu.o obj-$(CONFIG_MESON_MX_SOCINFO) += meson-mx-socinfo.o obj-$(CONFIG_MESON_EE_PM_DOMAINS) += meson-ee-pwrc.o +obj-$(CONFIG_MESON_SECURE_PM_DOMAINS) += meson-secure-pwrc.o diff --git a/drivers/soc/amlogic/meson-secure-pwrc.c b/drivers/soc/amlogic/meson-secure-pwrc.c new file mode 100644 index ..25951cb --- /dev/null +++ b/drivers/soc/amlogic/meson-secure-pwrc.c @@ -0,0 +1,203 @@ +// SPDX-License-Identifier: (GPL-2.0+ OR MIT) +/* + * Copyright (c) 2019 Amlogic, Inc. 
+ * Author: Jianxin Pan + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include + +#define PWRC_ON1 +#define PWRC_OFF 0 + +struct meson_secure_pwrc_domain { + struct generic_pm_domain base; + unsigned int index; + struct meson_secure_pwrc *pwrc; +}; + +struct meson_secure_pwrc { + struct meson_secure_pwrc_domain *domains; + struct genpd_onecell_data xlate; + struct meson_sm_firmware *fw; +}; + +struct meson_secure_pwrc_domain_desc { + unsigned int index; + unsigned int flags; + char *name; + bool (*is_off)(struct meson_secure_pwrc_domain *pwrc_domain); +}; + +struct meson_secure_pwrc_domain_data { + unsigned int count; + struct meson_secure_pwrc_domain_desc *domains; +}; + +static bool pwrc_secure_is_off(struct meson_secure_pwrc_domain *pwrc_domain) +{ + int sts = 1; + + if (meson_sm_call(pwrc_domain->pwrc->fw, SM_PWRC_GET, &sts, + pwrc_domain->index, 0, 0, 0, 0) < 0) + pr_err("failed to get power domain status\n"); + + return !!sts; +} + +static int meson_secure_pwrc_off(struct generic_pm_domain *domain) +{ + int sts = 0; + struct meson_secure_pwrc_domain *pwrc_domain = + container_of(domain, struct meson_secure_pwrc_domain, base); + + if (meson_sm_call(pwrc_domain->pwrc->fw, SM_PWRC_SET, NULL, + pwrc_domain->index, PWRC_OFF, 0, 0, 0) < 0) { + pr_err("failed to set power domain off\n"); + sts = -EINVAL; + } + + return sts; +} + +static int meson_secure_pwrc_on(struct generic_pm_domain *domain) +{ + int sts = 0; + struct meson_secure_pwrc_domain *pwrc_domain = + container_of(domain, struct meson_secure_pwrc_domain, base); + + if (meson_sm_call(pwrc_domain->pwrc->fw, SM_PWRC_SET, NULL, + pwrc_domain->index, PWRC_ON, 0, 0, 0) < 0) { + pr_err("failed to set power domain on\n"); + sts = -EINVAL; + } + + return sts; +} + +#define SEC_PD(__name, __flag) \ +{ \ + .name = #__name,\ + .index = PWRC_##__name##_ID,\ + .is_off = pwrc_secure_is_off, \ + .flags = __flag,\ +} + +static struct meson_secure_pwrc_domain_desc a1_pwrc_domains[] = { + SEC_PD(DSPA,0), + SEC_PD(DSPB,0), + /* UART should keep working in ATF after suspend and before resume */ + SEC_PD(UART,GENPD_FLAG_ALWAYS_ON), + /* DMC is for DDR PHY ana/dig and DMC, and should be always on */ + SEC_PD(DMC, GENPD_FLAG_ALWAYS_ON), + SEC_PD(I2C, 0), +
[PATCH v2 2/4] firmware: meson_sm: Add secure power domain support
The Amlogic Meson A1/C1 Secure Monitor implements calls to control power domain. Signed-off-by: Jianxin Pan --- drivers/firmware/meson/meson_sm.c | 2 ++ include/linux/firmware/meson/meson_sm.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/firmware/meson/meson_sm.c b/drivers/firmware/meson/meson_sm.c index 1d5b4d7..7ec09f5 100644 --- a/drivers/firmware/meson/meson_sm.c +++ b/drivers/firmware/meson/meson_sm.c @@ -44,6 +44,8 @@ static const struct meson_sm_chip gxbb_chip = { CMD(SM_EFUSE_WRITE, 0x8231), CMD(SM_EFUSE_USER_MAX, 0x8233), CMD(SM_GET_CHIP_ID, 0x8244), + CMD(SM_PWRC_SET,0x8293), + CMD(SM_PWRC_GET,0x8295), { /* sentinel */ }, }, }; diff --git a/include/linux/firmware/meson/meson_sm.h b/include/linux/firmware/meson/meson_sm.h index 6669e2a..4ed3989 100644 --- a/include/linux/firmware/meson/meson_sm.h +++ b/include/linux/firmware/meson/meson_sm.h @@ -12,6 +12,8 @@ enum { SM_EFUSE_WRITE, SM_EFUSE_USER_MAX, SM_GET_CHIP_ID, + SM_PWRC_SET, + SM_PWRC_GET, }; struct meson_sm_firmware; -- 2.7.4
[PATCH v2 4/4] arm64: dts: meson: a1: add secure power domain controller
Enable power domain controller for Meson A1 SoC. Signed-off-by: Jianxin Pan --- arch/arm64/boot/dts/amlogic/meson-a1.dtsi | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/arm64/boot/dts/amlogic/meson-a1.dtsi b/arch/arm64/boot/dts/amlogic/meson-a1.dtsi index 7210ad0..5547913 100644 --- a/arch/arm64/boot/dts/amlogic/meson-a1.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-a1.dtsi @@ -93,6 +93,13 @@ clock-names = "xtal", "pclk", "baud"; status = "disabled"; }; + + pwrc: power-controller { + compatible = "amlogic,meson-a1-pwrc"; + #power-domain-cells = <1>; + secure-monitor = <&sm>; + status = "okay"; + }; }; gic: interrupt-controller@ff901000 { -- 2.7.4
[PATCH v2 0/4] arm64: meson: add support for A1 Power Domains
This patchset introduces a "Secure Power Doamin Controller". In A1/C1, power controller registers such as PWRCTRL_FOCRSTN, PWRCTRL_PWR_OFF, PWRCTRL_MEM_PD and PWRCTRL_ISO_EN, are in the secure domain, and should be accessed from ATF by smc. Changes since v1 at [0]: - use APIs from sm driver - rename pwrc_secure_get_power as Kevin suggested - add comments for always on domains - replace arch_initcall_sync with builtin_platform_driver - fix coding style [0] https://lore.kernel.org/linux-amlogic/1568895064-4116-1-git-send-email-jianxin@amlogic.com Jianxin Pan (4): dt-bindings: power: add Amlogic secure power domains bindings firmware: meson_sm: Add secure power domain support soc: amlogic: Add support for Secure power domains controller arm64: dts: meson: a1: add secure power domain controller .../bindings/power/amlogic,meson-sec-pwrc.yaml | 42 + arch/arm64/boot/dts/amlogic/meson-a1.dtsi | 7 + drivers/firmware/meson/meson_sm.c | 2 + drivers/soc/amlogic/Kconfig| 13 ++ drivers/soc/amlogic/Makefile | 1 + drivers/soc/amlogic/meson-secure-pwrc.c| 203 + include/dt-bindings/power/meson-a1-power.h | 32 include/linux/firmware/meson/meson_sm.h| 2 + 8 files changed, 302 insertions(+) create mode 100644 Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml create mode 100644 drivers/soc/amlogic/meson-secure-pwrc.c create mode 100644 include/dt-bindings/power/meson-a1-power.h -- 2.7.4
[PATCH v2 1/4] dt-bindings: power: add Amlogic secure power domains bindings
Add the bindings for the Amlogic Secure power domains, controlling the secure power domains. The bindings targets the Amlogic A1 and C1 compatible SoCs, in which the power domain registers are in secure world. Signed-off-by: Jianxin Pan --- .../bindings/power/amlogic,meson-sec-pwrc.yaml | 42 ++ include/dt-bindings/power/meson-a1-power.h | 32 + 2 files changed, 74 insertions(+) create mode 100644 Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml create mode 100644 include/dt-bindings/power/meson-a1-power.h diff --git a/Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml b/Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml new file mode 100644 index ..88d8261 --- /dev/null +++ b/Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml @@ -0,0 +1,42 @@ +# SPDX-License-Identifier: (GPL-2.0+ OR MIT) +# Copyright (c) 2019 Amlogic, Inc +# Author: Jianxin Pan +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/power/amlogic,meson-sec-pwrc.yaml#"; +$schema: "http://devicetree.org/meta-schemas/core.yaml#"; + +title: Amlogic Meson Secure Power Domains + +maintainers: + - Jianxin Pan + +description: |+ + Meson Secure Power Domains used in A1/C1 SoCs. + +properties: + compatible: +enum: + - amlogic,meson-a1-pwrc + + "#power-domain-cells": +const: 1 + + secure-monitor: +description: phandle to the secure-monitor node +$ref: /schemas/types.yaml#/definitions/phandle + +required: + - compatible + - "#power-domain-cells" + - secure-monitor + +examples: + - | +pwrc: power-controller { + compatible = "amlogic,meson-a1-pwrc"; + #power-domain-cells = <1>; + secure-monitor = <&sm>; +}; + + diff --git a/include/dt-bindings/power/meson-a1-power.h b/include/dt-bindings/power/meson-a1-power.h new file mode 100644 index ..6cf50bf --- /dev/null +++ b/include/dt-bindings/power/meson-a1-power.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: (GPL-2.0+ or MIT) */ +/* + * Copyright (c) 2019 Amlogic, Inc. + * Author: Jianxin Pan + */ + +#ifndef _DT_BINDINGS_MESON_A1_POWER_H +#define _DT_BINDINGS_MESON_A1_POWER_H + +#define PWRC_DSPA_ID 8 +#define PWRC_DSPB_ID 9 +#define PWRC_UART_ID 10 +#define PWRC_DMC_ID11 +#define PWRC_I2C_ID12 +#define PWRC_PSRAM_ID 13 +#define PWRC_ACODEC_ID 14 +#define PWRC_AUDIO_ID 15 +#define PWRC_OTP_ID16 +#define PWRC_DMA_ID17 +#define PWRC_SD_EMMC_ID18 +#define PWRC_RAMA_ID 19 +#define PWRC_RAMB_ID 20 +#define PWRC_IR_ID 21 +#define PWRC_SPICC_ID 22 +#define PWRC_SPIFC_ID 23 +#define PWRC_USB_ID24 +#define PWRC_NIC_ID25 +#define PWRC_PDMIN_ID 26 +#define PWRC_RSA_ID27 +#define PWRC_MAX_ID28 + +#endif -- 2.7.4
Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
On 09/10/19 14:12, Scott Wood wrote: > On Wed, 2019-10-09 at 09:27 +0200, Juri Lelli wrote: > > On 09/10/19 01:25, Scott Wood wrote: > > > On Tue, 2019-10-01 at 10:52 +0200, Juri Lelli wrote: > > > > On 30/09/19 11:24, Scott Wood wrote: > > > > > On Mon, 2019-09-30 at 09:12 +0200, Juri Lelli wrote: > > > > > > > > [...] > > > > > > > > > > Hummm, I was actually more worried about the fact that we call > > > > > > free_old_ > > > > > > cpuset_bw_dl() only if p->state != TASK_WAKING. > > > > > > > > > > Oh, right. :-P Not sure what I had in mind there; we want to call > > > > > it > > > > > regardless. > > > > > > > > > > I assume we need rq->lock in free_old_cpuset_bw_dl()? So something > > > > > like > > > > > > > > I think we can do with rcu_read_lock_sched() (see > > > > dl_task_can_attach()). > > > > > > RCU will keep dl_bw from being freed under us (we're implicitly in an > > > RCU > > > sched read section due to atomic context). It won't stop rq->rd from > > > changing, but that could have happened before we took rq->lock. If the > > > cpu > > > the task was running on was removed from the cpuset, and that raced with > > > the > > > task being moved to a different cpuset, couldn't we end up erroneously > > > subtracting from the cpu's new root domain (or failing to subtract at > > > all if > > > the old cpu's new cpuset happens to be the task's new cpuset)? I don't > > > see > > > anything that forces tasks off of the cpu when a cpu is removed from a > > > cpuset (though maybe I'm not looking in the right place), so the race > > > window > > > could be quite large. In any case, that's an existing problem that's > > > not > > > going to get solved in this patchset. > > > > OK. So, mainline has got cpuset_read_lock() which should be enough to > > guard against changes to rd(s). > > > > I agree that rq->lock is needed here. > > My point was that rq->lock isn't actually helping, because rq->rd could have > changed before rq->lock is acquired (and it's still the old rd that needs > the bandwidth subtraction). cpuset_mutex/cpuset_rwsem helps somewhat, > though there's still a problem due to the subtraction not happening until > the task is woken up (by which time cpuset_mutex may have been released and > further reconfiguration could have happened). This would be an issue even > without lazy migrate, since in that case ->set_cpus_allowed() can get > deferred, but this patch makes the window much bigger. > > The right solution is probably to explicitly track the rd for which a task > has a pending bandwidth subtraction (if any), and to continue doing it from > set_cpus_allowed() if the task is not migrate-disabled. In the meantime, I > think we should drop this patch from the patchset -- without it, lazy > migrate disable doesn't really make the race situation any worse than it > already was. I'm OK with dropping it for now (as we also have other possible issues as you point out below), but I really wonder what would be a solution here. Problem is that if a domain(s) reconfiguration happened while the task was migrate disabled, and we let the reconf destroy/rebuild domains, the old rd could be gone by the time the task gets migrate enabled again and the task could continue running, w/o its bandwidth been accounted for, in a new rd since the migrate enable instant, no? :-/ > BTW, what happens to the bw addition in dl_task_can_attach() if a subsequent > can_attach fails and the whole operation is cancelled? Oh, yeah, that doesn't look good. 
:-( Maybe we can use cancel_attach() to fix things up? Thanks, Juri
Re: [PATCH v3 0/8] sched/fair: rework the CFS load balance
On Wed, 9 Oct 2019 at 21:33, Phil Auld wrote: > > On Tue, Oct 08, 2019 at 05:53:11PM +0200 Vincent Guittot wrote: > > Hi Phil, > > > > ... > > > While preparing v4, I have noticed that I have probably oversimplified > > the end of find_idlest_group() in patch "sched/fair: optimize > > find_idlest_group" when it compares local vs the idlest other group. > > Especially, there were a NUMA specific test that I removed in v3 and > > re added in v4. > > > > Then, I'm also preparing a full rework that find_idlest_group() which > > will behave more closely to load_balance; I mean : collect statistics, > > classify group then selects the idlest > > > > Okay, I'll watch for V4 and restest. It really seems to be limited to > the 8-node system. None of the other systems are showing this. Thanks for the information. This system might generate statistics at boundaries between 2 behaviors and task placement misbehave > > > > What is the behavior of lu.C Thread ? are they waking up a lot ?and > > could trigger the slow wake path ? > > Yes, probably a fair bit of waking. It's an interative equation solving > code. It's fairly CPU intensive but requires communication for dependent > calculations. That's part of why having them mis-balanced causes such a > large slow down. I think at times everyone else waits for the slow guys. > > > > > > > > > > The initial fixes I made for this issue did not exhibit this behavior. > > > They > > > would have had other issues dealing with overload cases though. In this > > > case > > > however there are only 154 or 158 threads on 160 CPUs so not overloaded. > > > > > > I'll try to get my hands on this system and poke into it. I just wanted > > > to get > > > your thoughts and let you know where we are. > > > > Thanks for testing > > > > Sure! 
> > > Thanks, > Phil > > > > > > Vincent > > > > > > > > > > > > > > Thanks, > > > Phil > > > > > > > > > System details: > > > > > > Architecture:x86_64 > > > CPU op-mode(s): 32-bit, 64-bit > > > Byte Order: Little Endian > > > CPU(s): 160 > > > On-line CPU(s) list: 0-159 > > > Thread(s) per core: 2 > > > Core(s) per socket: 10 > > > Socket(s): 8 > > > NUMA node(s):8 > > > Vendor ID: GenuineIntel > > > CPU family: 6 > > > Model: 47 > > > Model name: Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz > > > Stepping:2 > > > CPU MHz: 1063.934 > > > BogoMIPS:4787.73 > > > Virtualization: VT-x > > > L1d cache: 32K > > > L1i cache: 32K > > > L2 cache:256K > > > L3 cache:30720K > > > NUMA node0 CPU(s): 0-9,80-89 > > > NUMA node1 CPU(s): 10-19,90-99 > > > NUMA node2 CPU(s): 20-29,100-109 > > > NUMA node3 CPU(s): 30-39,110-119 > > > NUMA node4 CPU(s): 40-49,120-129 > > > NUMA node5 CPU(s): 50-59,130-139 > > > NUMA node6 CPU(s): 60-69,140-149 > > > NUMA node7 CPU(s): 70-79,150-159 > > > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > > > mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe > > > syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good > > > nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor > > > ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt > > > aes lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid dtherm ida arat > > > > > > $ numactl --hardware > > > available: 8 nodes (0-7) > > > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 80 81 82 83 84 85 86 87 88 89 > > > node 0 size: 64177 MB > > > node 0 free: 60866 MB > > > node 1 cpus: 10 11 12 13 14 15 16 17 18 19 90 91 92 93 94 95 96 97 98 99 > > > node 1 size: 64507 MB > > > node 1 free: 61167 MB > > > node 2 cpus: 20 21 22 23 24 25 26 27 28 29 100 101 102 103 104 105 106 > > > 107 108 > > > 109 > > > node 2 size: 64507 MB > > > node 2 free: 61250 MB > > > node 3 cpus: 30 31 32 33 34 35 36 37 38 39 110 111 112 113 114 115 116 > > > 117 118 > > > 119 > > > node 3 size: 64507 MB > > > node 3 free: 61327 MB > > > node 4 cpus: 40 41 42 43 44 45 46 47 48 49 120 121 122 123 124 125 126 > > > 127 128 > > > 129 > > > node 4 size: 64507 MB > > > node 4 free: 60993 MB > > > node 5 cpus: 50 51 52 53 54 55 56 57 58 59 130 131 132 133 134 135 136 > > > 137 138 > > > 139 > > > node 5 size: 64507 MB > > > node 5 free: 60892 MB > > > node 6 cpus: 60 61 62 63 64 65 66 67 68 69 140 141 142 143 144 145 146 > > > 147 148 > > > 149 > > > node 6 size: 64507 MB > > > node 6 free: 61139 MB > > > node 7 cpus: 70 71 72 73 74 75 76 77 78 79 150 151 152 153 154 155 156 > > > 157 158 > > > 159 > > > node 7 size: 64480 MB > > > node 7 free: 61188 MB > > > node distances: > > > node 0 1 2 3 4 5 6 7 > > > 0: 10 12 17 17 19 19 19 19 > > > 1: 12 10 17 17 19 19 19 19 > > > 2: 17 17 10 12 19 19 19 19 > > > 3: 17 17 12 10 19 19 19 19 > > > 4: 19 19 19 19 10 12 17 17 > >
Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()
On Thu 2019-10-10 14:12:01, Sergey Senozhatsky wrote: > On (10/09/19 16:26), Michal Hocko wrote: > > On Wed 09-10-19 15:56:32, Peter Oberparleiter wrote: > > [...] > > > A generic solution would be preferable from my point of view though, > > > because otherwise each console driver owner would need to ensure that any > > > lock taken in their console.write implementation is never held while > > > memory is allocated/released. > > > > Considering that console.write is called from essentially arbitrary code > > path IIUC then all the locks used in this path should be pretty much > > tail locks or console internal ones without external dependencies. > > That's a good expectation, but I guess it's not always the case. > > One example might be NET console - net subsystem locks, net device > drivers locks, maybe even some MM locks (skb allocations?). > > But even more "commonly used" consoles sometimes break that > expectation. E.g. 8250 > > serial8250_console_write() > serial8250_modem_status() > wake_up_interruptible() > > And so on. I think that the only maintainable solution is to call the console drivers in a well defined context (kthread). We have finally opened doors to do this change. Using printk_deferred() or removing the problematic printk() is a short term workaround. I am not going to block such patches. But the final decision will be on maintainers of the affected subsystems. Deferring console work should prevent the deadlocks. Another story is allocating memory from console->write() callback. It makes the console less reliable when there is a memory contention. Preventing this would be very valuable. Best Regards, Petr
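For illustration, this is what the printk_deferred() workaround Petr mentions looks like at a call site; the helper and message below are made up for the example, not taken from a posted patch:

#include <linux/printk.h>

/*
 * Illustrative only: printk_deferred() stores the message in the log
 * buffer and pushes it to the consoles from irq_work, so no console
 * driver locks are taken while the caller still holds e.g. zone->lock.
 * The helper name and the "pfn" argument are invented for this sketch.
 */
static void report_busy_page(unsigned long pfn)
{
	printk_deferred(KERN_WARNING "page_isolation: pfn %#lx is busy\n", pfn);
}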
[PATCH RESEND v2 2/4] firmware: meson_sm: Add secure power domain support
The Amlogic Meson A1/C1 Secure Monitor implements calls to control power domain. Signed-off-by: Jianxin Pan --- drivers/firmware/meson/meson_sm.c | 2 ++ include/linux/firmware/meson/meson_sm.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/firmware/meson/meson_sm.c b/drivers/firmware/meson/meson_sm.c index 1d5b4d7..7ec09f5 100644 --- a/drivers/firmware/meson/meson_sm.c +++ b/drivers/firmware/meson/meson_sm.c @@ -44,6 +44,8 @@ static const struct meson_sm_chip gxbb_chip = { CMD(SM_EFUSE_WRITE, 0x8231), CMD(SM_EFUSE_USER_MAX, 0x8233), CMD(SM_GET_CHIP_ID, 0x8244), + CMD(SM_PWRC_SET,0x8293), + CMD(SM_PWRC_GET,0x8295), { /* sentinel */ }, }, }; diff --git a/include/linux/firmware/meson/meson_sm.h b/include/linux/firmware/meson/meson_sm.h index 6669e2a..4ed3989 100644 --- a/include/linux/firmware/meson/meson_sm.h +++ b/include/linux/firmware/meson/meson_sm.h @@ -12,6 +12,8 @@ enum { SM_EFUSE_WRITE, SM_EFUSE_USER_MAX, SM_GET_CHIP_ID, + SM_PWRC_SET, + SM_PWRC_GET, }; struct meson_sm_firmware; -- 2.7.4
[PATCH RESEND v2 1/4] dt-bindings: power: add Amlogic secure power domains bindings
Add the bindings for the Amlogic Secure power domains, controlling the secure power domains. The bindings targets the Amlogic A1 and C1 compatible SoCs, in which the power domain registers are in secure world. Signed-off-by: Jianxin Pan --- .../bindings/power/amlogic,meson-sec-pwrc.yaml | 42 ++ include/dt-bindings/power/meson-a1-power.h | 32 + 2 files changed, 74 insertions(+) create mode 100644 Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml create mode 100644 include/dt-bindings/power/meson-a1-power.h diff --git a/Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml b/Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml new file mode 100644 index ..88d8261 --- /dev/null +++ b/Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml @@ -0,0 +1,42 @@ +# SPDX-License-Identifier: (GPL-2.0+ OR MIT) +# Copyright (c) 2019 Amlogic, Inc +# Author: Jianxin Pan +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/power/amlogic,meson-sec-pwrc.yaml#"; +$schema: "http://devicetree.org/meta-schemas/core.yaml#"; + +title: Amlogic Meson Secure Power Domains + +maintainers: + - Jianxin Pan + +description: |+ + Meson Secure Power Domains used in A1/C1 SoCs. + +properties: + compatible: +enum: + - amlogic,meson-a1-pwrc + + "#power-domain-cells": +const: 1 + + secure-monitor: +description: phandle to the secure-monitor node +$ref: /schemas/types.yaml#/definitions/phandle + +required: + - compatible + - "#power-domain-cells" + - secure-monitor + +examples: + - | +pwrc: power-controller { + compatible = "amlogic,meson-a1-pwrc"; + #power-domain-cells = <1>; + secure-monitor = <&sm>; +}; + + diff --git a/include/dt-bindings/power/meson-a1-power.h b/include/dt-bindings/power/meson-a1-power.h new file mode 100644 index ..6cf50bf --- /dev/null +++ b/include/dt-bindings/power/meson-a1-power.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: (GPL-2.0+ or MIT) */ +/* + * Copyright (c) 2019 Amlogic, Inc. + * Author: Jianxin Pan + */ + +#ifndef _DT_BINDINGS_MESON_A1_POWER_H +#define _DT_BINDINGS_MESON_A1_POWER_H + +#define PWRC_DSPA_ID 8 +#define PWRC_DSPB_ID 9 +#define PWRC_UART_ID 10 +#define PWRC_DMC_ID11 +#define PWRC_I2C_ID12 +#define PWRC_PSRAM_ID 13 +#define PWRC_ACODEC_ID 14 +#define PWRC_AUDIO_ID 15 +#define PWRC_OTP_ID16 +#define PWRC_DMA_ID17 +#define PWRC_SD_EMMC_ID18 +#define PWRC_RAMA_ID 19 +#define PWRC_RAMB_ID 20 +#define PWRC_IR_ID 21 +#define PWRC_SPICC_ID 22 +#define PWRC_SPIFC_ID 23 +#define PWRC_USB_ID24 +#define PWRC_NIC_ID25 +#define PWRC_PDMIN_ID 26 +#define PWRC_RSA_ID27 +#define PWRC_MAX_ID28 + +#endif -- 2.7.4
[PATCH RESEND v2 3/4] soc: amlogic: Add support for Secure power domains controller
Add support for the Amlogic Secure Power controller. In A1/C1 series, power control registers are in secure domain, and should be accessed by smc. Signed-off-by: Jianxin Pan --- drivers/soc/amlogic/Kconfig | 13 ++ drivers/soc/amlogic/Makefile| 1 + drivers/soc/amlogic/meson-secure-pwrc.c | 203 3 files changed, 217 insertions(+) create mode 100644 drivers/soc/amlogic/meson-secure-pwrc.c diff --git a/drivers/soc/amlogic/Kconfig b/drivers/soc/amlogic/Kconfig index bc2c912..6cb06e7 100644 --- a/drivers/soc/amlogic/Kconfig +++ b/drivers/soc/amlogic/Kconfig @@ -48,6 +48,19 @@ config MESON_EE_PM_DOMAINS Say yes to expose Amlogic Meson Everything-Else Power Domains as Generic Power Domains. +config MESON_SECURE_PM_DOMAINS + bool "Amlogic Meson Secure Power Domains driver" + depends on ARCH_MESON || COMPILE_TEST + depends on PM && OF + depends on HAVE_ARM_SMCCC + default ARCH_MESON + select PM_GENERIC_DOMAINS + select PM_GENERIC_DOMAINS_OF + help + Support for the power controller on Amlogic A1/C1 series. + Say yes to expose Amlogic Meson Secure Power Domains as Generic + Power Domains. + config MESON_MX_SOCINFO bool "Amlogic Meson MX SoC Information driver" depends on ARCH_MESON || COMPILE_TEST diff --git a/drivers/soc/amlogic/Makefile b/drivers/soc/amlogic/Makefile index de79d044..7b8c5d3 100644 --- a/drivers/soc/amlogic/Makefile +++ b/drivers/soc/amlogic/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_MESON_GX_SOCINFO) += meson-gx-socinfo.o obj-$(CONFIG_MESON_GX_PM_DOMAINS) += meson-gx-pwrc-vpu.o obj-$(CONFIG_MESON_MX_SOCINFO) += meson-mx-socinfo.o obj-$(CONFIG_MESON_EE_PM_DOMAINS) += meson-ee-pwrc.o +obj-$(CONFIG_MESON_SECURE_PM_DOMAINS) += meson-secure-pwrc.o diff --git a/drivers/soc/amlogic/meson-secure-pwrc.c b/drivers/soc/amlogic/meson-secure-pwrc.c new file mode 100644 index ..25951cb --- /dev/null +++ b/drivers/soc/amlogic/meson-secure-pwrc.c @@ -0,0 +1,203 @@ +// SPDX-License-Identifier: (GPL-2.0+ OR MIT) +/* + * Copyright (c) 2019 Amlogic, Inc. 
+ * Author: Jianxin Pan + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include + +#define PWRC_ON1 +#define PWRC_OFF 0 + +struct meson_secure_pwrc_domain { + struct generic_pm_domain base; + unsigned int index; + struct meson_secure_pwrc *pwrc; +}; + +struct meson_secure_pwrc { + struct meson_secure_pwrc_domain *domains; + struct genpd_onecell_data xlate; + struct meson_sm_firmware *fw; +}; + +struct meson_secure_pwrc_domain_desc { + unsigned int index; + unsigned int flags; + char *name; + bool (*is_off)(struct meson_secure_pwrc_domain *pwrc_domain); +}; + +struct meson_secure_pwrc_domain_data { + unsigned int count; + struct meson_secure_pwrc_domain_desc *domains; +}; + +static bool pwrc_secure_is_off(struct meson_secure_pwrc_domain *pwrc_domain) +{ + int sts = 1; + + if (meson_sm_call(pwrc_domain->pwrc->fw, SM_PWRC_GET, &sts, + pwrc_domain->index, 0, 0, 0, 0) < 0) + pr_err("failed to get power domain status\n"); + + return !!sts; +} + +static int meson_secure_pwrc_off(struct generic_pm_domain *domain) +{ + int sts = 0; + struct meson_secure_pwrc_domain *pwrc_domain = + container_of(domain, struct meson_secure_pwrc_domain, base); + + if (meson_sm_call(pwrc_domain->pwrc->fw, SM_PWRC_SET, NULL, + pwrc_domain->index, PWRC_OFF, 0, 0, 0) < 0) { + pr_err("failed to set power domain off\n"); + sts = -EINVAL; + } + + return sts; +} + +static int meson_secure_pwrc_on(struct generic_pm_domain *domain) +{ + int sts = 0; + struct meson_secure_pwrc_domain *pwrc_domain = + container_of(domain, struct meson_secure_pwrc_domain, base); + + if (meson_sm_call(pwrc_domain->pwrc->fw, SM_PWRC_SET, NULL, + pwrc_domain->index, PWRC_ON, 0, 0, 0) < 0) { + pr_err("failed to set power domain on\n"); + sts = -EINVAL; + } + + return sts; +} + +#define SEC_PD(__name, __flag) \ +{ \ + .name = #__name,\ + .index = PWRC_##__name##_ID,\ + .is_off = pwrc_secure_is_off, \ + .flags = __flag,\ +} + +static struct meson_secure_pwrc_domain_desc a1_pwrc_domains[] = { + SEC_PD(DSPA,0), + SEC_PD(DSPB,0), + /* UART should keep working in ATF after suspend and before resume */ + SEC_PD(UART,GENPD_FLAG_ALWAYS_ON), + /* DMC is for DDR PHY ana/dig and DMC, and should be always on */ + SEC_PD(DMC, GENPD_FLAG_ALWAYS_ON), + SEC_PD(I2C, 0), +
[PATCH RESEND v2 0/4] arm64: meson: add support for A1 Power Domains
This patchset introduces a "Secure Power Doamin Controller". In A1/C1, power controller registers such as PWRCTRL_FOCRSTN, PWRCTRL_PWR_OFF, PWRCTRL_MEM_PD and PWRCTRL_ISO_EN, are in the secure domain, and should be accessed from ATF by smc. Changes since v1 at [0]: - use APIs from sm driver - rename pwrc_secure_get_power as Kevin suggested - add comments for always on domains - replace arch_initcall_sync with builtin_platform_driver - fix coding style [0] https://lore.kernel.org/linux-amlogic/1568895064-4116-1-git-send-email-jianxin@amlogic.com Jianxin Pan (4): dt-bindings: power: add Amlogic secure power domains bindings firmware: meson_sm: Add secure power domain support soc: amlogic: Add support for Secure power domains controller arm64: dts: meson: a1: add secure power domain controller .../bindings/power/amlogic,meson-sec-pwrc.yaml | 42 + arch/arm64/boot/dts/amlogic/meson-a1.dtsi | 7 + drivers/firmware/meson/meson_sm.c | 2 + drivers/soc/amlogic/Kconfig| 13 ++ drivers/soc/amlogic/Makefile | 1 + drivers/soc/amlogic/meson-secure-pwrc.c| 203 + include/dt-bindings/power/meson-a1-power.h | 32 include/linux/firmware/meson/meson_sm.h| 2 + 8 files changed, 302 insertions(+) create mode 100644 Documentation/devicetree/bindings/power/amlogic,meson-sec-pwrc.yaml create mode 100644 drivers/soc/amlogic/meson-secure-pwrc.c create mode 100644 include/dt-bindings/power/meson-a1-power.h -- 2.7.4
[PATCH RESEND v2 4/4] arm64: dts: meson: a1: add secure power domain controller
Enable power domain controller for Meson A1 SoC. Signed-off-by: Jianxin Pan --- arch/arm64/boot/dts/amlogic/meson-a1.dtsi | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/arm64/boot/dts/amlogic/meson-a1.dtsi b/arch/arm64/boot/dts/amlogic/meson-a1.dtsi index 7210ad0..5547913 100644 --- a/arch/arm64/boot/dts/amlogic/meson-a1.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-a1.dtsi @@ -93,6 +93,13 @@ clock-names = "xtal", "pclk", "baud"; status = "disabled"; }; + + pwrc: power-controller { + compatible = "amlogic,meson-a1-pwrc"; + #power-domain-cells = <1>; + secure-monitor = <&sm>; + status = "okay"; + }; }; gic: interrupt-controller@ff901000 { -- 2.7.4
Re: [PATCH v17 01/14] bitops: Introduce the for_each_set_clump8 macro
Hi Andy, On Thu, Oct 10, 2019 at 10:08 AM Andy Shevchenko wrote: > On Thu, Oct 10, 2019 at 09:49:51AM +0200, Geert Uytterhoeven wrote: > > On Thu, Oct 10, 2019 at 9:42 AM Andy Shevchenko > > wrote: > > > On Thu, Oct 10, 2019 at 9:29 AM Geert Uytterhoeven > > > wrote: > > > > On Thu, Oct 10, 2019 at 7:49 AM Andy Shevchenko > > > > wrote: > > > > > On Thu, Oct 10, 2019 at 5:31 AM Masahiro Yamada > > > > > wrote: > > > > > > On Thu, Oct 10, 2019 at 3:54 AM Geert Uytterhoeven > > > > > > wrote: > > > > > > > On Wed, Oct 9, 2019 at 7:09 PM Andy Shevchenko > > > > > > > wrote: > > > > > > > > On Thu, Oct 10, 2019 at 01:28:08AM +0900, Masahiro Yamada wrote: > > > > > > > > > On Thu, Oct 10, 2019 at 12:27 AM William Breathitt Gray > > > > > > > > > wrote: > > > > > > > > > > Why is the return type "unsigned long" where you know > > > > > > > > > it return the 8-bit value ? > > > > > > > > > > > > > > > > Because bitmap API operates on unsigned long type. This is not > > > > > > > > only > > > > > > > > consistency, but for sake of flexibility in case we would like > > > > > > > > to introduce > > > > > > > > more calls like clump16 or so. > > > > > > > > > > > > > > TBH, that doesn't convince me: those functions explicitly > > > > > > > take/return an > > > > > > > 8-bit value, and have "8" in their name. The 8-bit value is never > > > > > > > really related to, retrieved from, or stored in a full "unsigned > > > > > > > long" > > > > > > > element of a bitmap, only to/from/in a part (byte) of it. > > > > > > > > > > > > > > Following your rationale, all of iowrite{8,16,32,64}*() should > > > > > > > take an > > > > > > > "unsigned long" value, too. > > > > > > > > > > > > Using u8/u16/u32/u64 looks more consistent with other bitmap > > > > > > helpers. > > > > > > > > > > > > void bitmap_from_arr32(unsigned long *bitmap, const u32 *buf, > > > > > > unsigned > > > > > > int nbits); > > > > > > void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, > > > > > > unsigned int nbits); > > > > > > static inline void bitmap_from_u64(unsigned long *dst, u64 mask); > > > > > > > > > > > > If you want to see more examples from other parts, > > > > > > > > > > Geert's and yours examples both are not related. They are about > > > > > fixed-width properies when we know that is the part of protocol. > > > > > Here we have no protocol which stricts us to the mentioned > > > > > fixed-width types. > > > > > > > > Yes you have: they are functions to store/retrieve an 8-bit value from > > > > the middle of the bitmap, which is reflected in their names ("clump8", > > > > "value8"). > > > > The input/output value is clearly separated from the actual bitmap, > > > > which is referenced by the "unsigned long *". > > > > > > > > If you add new "value16" functions, they will be intended to > > > > store/retrieve > > > > 16-bit values. > > > > > > And if I add 4-bit, 12-bit or 24-bit values, what should I use? > > > > Whatever is needed to store that? > > I agree "unsigned long" is appropriate for a generic function to extract a > > bit field of 1 to BITS_PER_LONG bits. > > > > > > Besides, if retrieving an 8-bit value requires passing an > > > > "unsigned long *", the caller needs two variables: one unsigned long to > > > > pass the address of, and one u8 to copy the returned value into. > > > > > > Why do you need a temporary variable? In some cases it might make > > > sense, but in general simple cases I don't see what you may achieve > > > with it. 
> > > > Because find_next_clump8() takes a pointer to store the output value. > > So does regmap_read(). I believe that one is different, as it is a generic function, and the width of the returned value depends on the regmap config. > 8 appeared there during review when it has been proposed to optimize to 8-bit > clumps as most of the current users utilize it. The initial idea was to be > bit-width agnostic. And with current API it's possible to easy convert to > other > formats later if we need. "optimized for 8-bit clumps" and "out-of-line function that takes an unsigned long pointer for an output parameter" don't match well, IMHO. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH 3/4] media: af9035: add support for Logilink VG0022A
Hi! I rebased Mauro's patch on top of mine and this patch [3/4] is the first bad commit. I believe these lines are the culprit: + if (le16_to_cpu(d->udev->descriptor.idVendor) == USB_VID_DEXATEK && + le16_to_cpu(d->udev->descriptor.idProduct) == 0x0100) + si2168_config.dont_load_firmware = true; > From: JP > Mauro just took the wrong firmware to skip. demod instead of tuner. > It would not be hard to fix that. It seems so. g
(RESEND) [PATCH] ocfs2: Fix error handling in ocfs2_setattr()
Should set transfer_to[USRQUOTA/GRPQUOTA] to NULL on error case before jump to do dqput(). Signed-off-by: Chengguang Xu --- fs/ocfs2/file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 2e982db3e1ae..53939bf9d7d2 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1230,6 +1230,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) transfer_to[USRQUOTA] = dqget(sb, make_kqid_uid(attr->ia_uid)); if (IS_ERR(transfer_to[USRQUOTA])) { status = PTR_ERR(transfer_to[USRQUOTA]); + transfer_to[USRQUOTA] = NULL; goto bail_unlock; } } @@ -1239,6 +1240,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) transfer_to[GRPQUOTA] = dqget(sb, make_kqid_gid(attr->ia_gid)); if (IS_ERR(transfer_to[GRPQUOTA])) { status = PTR_ERR(transfer_to[GRPQUOTA]); + transfer_to[GRPQUOTA] = NULL; goto bail_unlock; } } -- 2.20.1
Re: [PATCH] mtd: maps: l440gx: Avoid print address to dmesg
Hi Fuqian, Fuqian Huang wrote on Thu, 10 Oct 2019 16:01:30 +0800: > Avoid print the address of l440gx_map.virt every time l440gx init. > > Signed-off-by: Fuqian Huang > --- > drivers/mtd/maps/l440gx.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/mtd/maps/l440gx.c b/drivers/mtd/maps/l440gx.c > index 876f12f40018..e7e40bca82d1 100644 > --- a/drivers/mtd/maps/l440gx.c > +++ b/drivers/mtd/maps/l440gx.c > @@ -86,7 +86,6 @@ static int __init init_l440gx(void) > return -ENOMEM; > } > simple_map_init(&l440gx_map); > - printk(KERN_NOTICE "window_addr = 0x%08lx\n", (unsigned > long)l440gx_map.virt); > > /* Setup the pm iobase resource >* This code should move into some kind of generic bridge It looks more like a debug message, maybe turn it into a KERN_DEBUG? Usually people do not run their kernels with such a low trace limit. Thanks, Miquèl
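For reference, a sketch of the demotion Miquèl is suggesting (illustrative, not a submitted patch):

/*
 * Keep the message but print it at debug level only, so it no longer
 * lands in dmesg at the default console loglevel. pr_debug() would be
 * the more idiomatic spelling if dynamic debug / compile-out is fine.
 */
printk(KERN_DEBUG "window_addr = 0x%08lx\n", (unsigned long)l440gx_map.virt);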
Re: [PATCH 0/8] clocksource/drivers/timer-atmel-tcb: add sama5d2 support
On 2019-10-10 00:39:58 [+0200], Alexandre Belloni wrote: > This series mainly adds sama5d2 support where we need to avoid using > clock index 0 because that clock is never enabled by the driver. > > There is also a rework of the 32khz clock handling so it is not used for > clockevents on 32 bit counter because the increased rate improves the > resolution and doesn't have any drawback with that counter width. This > replaces a patch that has been carried in the linux-rt tree for a while. Thank you for doing this! Sebastian
Re: [RFC v4] zswap: Add CONFIG_ZSWAP_IO_SWITCH to handle swap IO issue
On Tue, Oct 8, 2019 at 4:07 AM Hui Zhu wrote: > > This is the fourth version of this patch. The perious versions > are in [1], [2] and [3]. > > The parameters read_in_flight_limit and write_in_flight_limit were > replaced by io_switch_enabled_enabled in this verion to make this > function more clear. > > Currently, I use a VM that has 1 CPU, 4G memory and 4G swap file. > I found that swap will affect the IO performance when it is running. > So I open zswap to handle it because it just use CPU cycles but not > disk IO. > > It work OK but I found that zswap is slower than normal swap in this > VM. zswap is about 300M/s and normal swap is about 500M/s. (The reason > is the swap disk device config is "cache=none,aio=native".) > So open zswap is make memory shrinker slower but good for IO performance > in this VM. > So I just want zswap work when the disk of the swap file is under high > IO load. I'm still not excited about this, I feel like this will only be useful in situations where zswap probably wasn't a good idea in the first place. And I'm still not clear on why you're using zswap *at all*, if your disk I/O is faster than zswap can compress pages - you clearly should have zswap disabled, if that's the case. Can you run some more tests and make sure this param really, actually, helps your workload? Please also check that you aren't filling up zswap as well; if your problem is you're filling up zswap and that's when you want to divert more pages into swap, then I think that would be much better handled by adding hysteresis logic instead of checking the swap device io load. > > This commit is designed for this idea. > When this function is enabled by the swap parameter > io_switch_enabled_enabled, zswap will just work when the swap disk has > outstanding I/O requests. > > [1] https://lkml.org/lkml/2019/9/11/935 > [2] https://lkml.org/lkml/2019/9/20/90 > [3] https://lkml.org/lkml/2019/9/22/927 > > Signed-off-by: Hui Zhu > --- > include/linux/swap.h | 3 +++ > mm/Kconfig | 14 ++ > mm/page_io.c | 16 > mm/zswap.c | 25 + > 4 files changed, 58 insertions(+) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index de2c67a..82b621f 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio); > extern int __swap_writepage(struct page *page, struct writeback_control *wbc, > bio_end_io_t end_write_func); > extern int swap_set_page_dirty(struct page *page); > +#ifdef CONFIG_ZSWAP_IO_SWITCH > +extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]); > +#endif > > int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, > unsigned long nr_pages, sector_t start_block); > diff --git a/mm/Kconfig b/mm/Kconfig > index 56cec63..f5740e3 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -546,6 +546,20 @@ config ZSWAP > they have not be fully explored on the large set of potential > configurations and workloads that exist. > > +config ZSWAP_IO_SWITCH let's drop this, if we're going to add this I don't think we need both a build time and runtime switch. Just defaulting the runtime switch to off should be fine. > + bool "Compressed cache for swap pages according to the IO status" > + depends on ZSWAP > + help > + This function helps the system that normal swap speed is higher > + than zswap speed to handle the swap IO issue. > + For example, a VM where the swap disk device with config > + "cache=none,aio=native". 
> + > + When this function is enabled by the swap parameter > + io_switch_enabled_enabled, zswap will just work when the swap disk > + has outstanding I/O requests. I think this doc should go into Documentation/vm/zswap.rst instead, please. > + If unsure, say "n". > + > config ZPOOL > tristate "Common API for compressed memory storage" > help > diff --git a/mm/page_io.c b/mm/page_io.c > index 24ee600..e66b050 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page) > return __set_page_dirty_no_writeback(page); > } > } > + > +#ifdef CONFIG_ZSWAP_IO_SWITCH > +void swap_io_in_flight(struct page *page, unsigned int inflight[2]) > +{ > + struct swap_info_struct *sis = page_swap_info(page); > + > + if (!sis->bdev) { > + inflight[0] = 0; > + inflight[1] = 0; I'm not quite sure when a swap_info_struct won't have a bdev (looks like if it's neither ISBLK nor ISREG, and I'm not sure what's left after those), but if you set both to 0 that means it will effectively disable zswap completely for this swap device, writing all pages to it. Is that really the right thing to do? > + return; > + } > + > + part_in_flight_rw(bdev_
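For reference, the hysteresis idea suggested above could look roughly like the sketch below: stop accepting pages into zswap once the pool crosses a high watermark and only resume once it has drained below a low watermark, independent of the backing device's I/O load. Everything here (zswap_should_accept(), the watermark values) is hypothetical, not existing zswap code:

#include <stdbool.h>
#include <stdio.h>

#define HIGH_WMARK_PCT 90        /* stop accepting above this fill level */
#define LOW_WMARK_PCT  70        /* start accepting again below this     */

static bool accept_pages = true; /* is zswap currently taking new pages? */

static bool zswap_should_accept(unsigned long pool_pages,
                                unsigned long pool_limit_pages)
{
        unsigned long pct = pool_pages * 100 / pool_limit_pages;

        if (accept_pages && pct >= HIGH_WMARK_PCT)
                accept_pages = false;   /* nearly full: divert to swap   */
        else if (!accept_pages && pct <= LOW_WMARK_PCT)
                accept_pages = true;    /* drained enough: take pages    */

        return accept_pages;
}

int main(void)
{
        unsigned long fill[] = { 50, 85, 91, 89, 75, 69, 80 };

        for (unsigned int i = 0; i < sizeof(fill) / sizeof(fill[0]); i++)
                printf("pool at %lu%% -> %s\n", fill[i],
                       zswap_should_accept(fill[i], 100) ? "zswap" : "swap");
        return 0;
}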
Re: [PATCH v4 1/3] tpm: migrate pubek_show to struct tpm_buf
On Thu, Oct 10, 2019 at 12:28:29AM +0300, Jarkko Sakkinen wrote: > commit da379f3c1db0c9a1fd27b11d24c9894b5edc7c75 upstream > > Migrated pubek_show to struct tpm_buf and cleaned up its implementation. > Previously the output parameter structure was declared but left > completely unused. Now it is used to refer to different fields of the > output. We can move it to tpm-sysfs.c as it does not have any use > outside of that file. > > Signed-off-by: Jarkko Sakkinen > --- > drivers/char/tpm/tpm-sysfs.c | 87 > drivers/char/tpm/tpm.h | 13 -- > 2 files changed, 48 insertions(+), 52 deletions(-) This is already in the 4.14.148 kernel release. Why do we need it again? Also, the mailing list is stable@vger, not linux-stable@vger. thanks, greg k-h
Re: [PATCH v4 0/3] tpm: Fix TPM 1.2 Shutdown sequence to prevent future TPM operations
On Thu, Oct 10, 2019 at 12:28:28AM +0300, Jarkko Sakkinen wrote: > commit db4d8cb9c9f2af71c4d087817160d866ed572cc9 upstream > > This backport is for v4.14 and v4.19. The backport requires non-racy > behaviour from TPM 1.x sysfs code. Thus, the dependencies for that > are included. > > NOTE: 1/3 is only needed for v4.14. How can a 0/X have a git commit id? :) greg k-h
Re: [PATCH v4 3/3] tpm: Fix TPM 1.2 Shutdown sequence to prevent future TPM operations
On Thu, Oct 10, 2019 at 12:28:31AM +0300, Jarkko Sakkinen wrote: > From: Vadim Sukhomlinov > > commit db4d8cb9c9f2af71c4d087817160d866ed572cc9 upstream > > TPM 2.0 Shutdown involves sending TPM2_Shutdown to TPM chip and disabling > future TPM operations. TPM 1.2 behavior was different, future TPM > operations weren't disabled, causing rare issues. This patch ensures > that future TPM operations are disabled. > > Fixes: d1bd4a792d39 ("tpm: Issue a TPM2_Shutdown for TPM2 devices.") > Cc: sta...@vger.kernel.org > Signed-off-by: Vadim Sukhomlinov > [dianders: resolved merge conflicts with mainline] > Signed-off-by: Douglas Anderson > Reviewed-by: Jarkko Sakkinen > Signed-off-by: Jarkko Sakkinen > --- > drivers/char/tpm/tpm-chip.c | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) This is already in the 4.14.148 4.19.78 5.1.18 5.2.1 5.3 releases. thanks, greg k-h
Re: [PATCH v3] ARC: mm: remove __ARCH_USE_5LEVEL_HACK
On Wed, Oct 09, 2019 at 06:57:31PM +, Vineet Gupta wrote: > Add the intermediate p4d accessors to make it 5 level compliant. > > This is a non-functional change anyways since ARC has software page walker > with 2 lookup levels (pgd -> pte) > > There is slight code bloat due to pulling in needless p*d_free_tlb() > macros which needs to be addressed seperately. > > | bloat-o-meter2 vmlinux-with-5LEVEL_HACK vmlinux-patched > | add/remove: 0/0 grow/shrink: 2/0 up/down: 128/0 (128) > | function old new delta > | free_pgd_range 546 656+110 > | p4d_clear_bad 2 20 +18 > | Total: Before=4137148, After=4137276, chg 0.00% > > Cc: Kirill A. Shutemov > Signed-off-by: Vineet Gupta Acked-by: Kirill A. Shutemov -- Kirill A. Shutemov
Re: [PATCH v4 2/3] tpm: use tpm_try_get_ops() in tpm-sysfs.c.
On Thu, Oct 10, 2019 at 12:28:30AM +0300, Jarkko Sakkinen wrote: > commit 2677ca98ae377517930c183248221f69f771c921 upstream > > Use tpm_try_get_ops() in tpm-sysfs.c so that we can consider moving > other decorations (locking, localities, power management for example) > inside it. This direction can be of course taken only after other call > sites for tpm_transmit() have been treated in the same way. > > Signed-off-by: Jarkko Sakkinen > Reviewed-by: Stefan Berger > Tested-by: Stefan Berger > Reviewed-by: Jerry Snitselaar > Reviewed-by: James Bottomley > Tested-by: Alexander Steffen > --- > drivers/char/tpm/tpm-sysfs.c | 134 ++- > 1 file changed, 83 insertions(+), 51 deletions(-) This is already in the 4.14.148 4.19.78 5.1 releases. greg k-h
Re: "reuse mergeable anon_vma as parent when fork" causes a crash on s390
On 10/10/2019 06.15, Wei Yang wrote: On Thu, Oct 10, 2019 at 10:36:01AM +0800, Wei Yang wrote: Hi, Qian, Shakeel Thanks for testing. Sounds like I missed some case to handle. anon_vma_clone() now would be called in vma_adjust, which is a different case from when it was introduced. Well, I have to correct my statement. The reason is we may have done something more in anon_vma_clone(). Here is a quick fix, while I need to go through all the cases carefully. Oops, I've overlooked this case too. You have to check src->anon_vma otherwise in __split_vma or copy_vma dst could pick a completely random anon_vma. Also checking prev will not hurt, just to be sure. So, something like this should work: if (!dst->anon_vma && src->anon_vma && prev && pprev && pprev->anon_vma == src->anon_vma) dst->anon_vma = prev->anon_vma; diff --git a/mm/rmap.c b/mm/rmap.c index 12f6c3d7fd9d..2844f442208d 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -271,7 +271,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src) * 1. Parent has vm_prev, which implies we have vm_prev. * 2. Parent and its vm_prev have the same anon_vma. */ - if (pprev && pprev->anon_vma == src->anon_vma) + if (!dst->anon_vma && pprev && pprev->anon_vma == src->anon_vma) dst->anon_vma = prev->anon_vma; list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) { BTW, do you have the specific test case? So that I could verify my change. The kernel build test doesn't trigger this. Thanks a lot :-) -- Wei Yang Help you, Help me
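There is no known reproducer in this thread, but a user-space exerciser of the paths named above (__split_vma via mprotect on a sub-range and copy_vma via mremap, in a child after fork so the parent's anon_vma is inherited) could look like the sketch below; whether it actually triggers the reported s390 crash is untested:

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        size_t len = 16 * 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED)
                return 1;
        memset(p, 1, len);                   /* fault in: allocate anon_vma */

        if (fork() == 0) {
                /* split the inherited VMA: read-only chunk in the middle */
                mprotect(p + 4 * 4096, 4 * 4096, PROT_READ);
                /* move the tail elsewhere: copy_vma + anon_vma_clone */
                mremap(p + 8 * 4096, 4 * 4096, 4 * 4096,
                       MREMAP_MAYMOVE | MREMAP_FIXED, p + 64 * 4096);
                memset(p, 2, 4 * 4096);
                _exit(0);
        }
        wait(NULL);
        munmap(p, len);
        return 0;
}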
Re: [RFC v2 1/2] kvm: Mechanism to copy host timekeeping parameters into guest.
Suleiman Souhlal writes: > This is used to synchronize time between host and guest. > The guest can request the (guest) physical address it wants the > data in through the MSR_KVM_TIMEKEEPER_EN MSR. > > It currently assumes the host timekeeper is "tsc". > > Signed-off-by: Suleiman Souhlal > --- > arch/x86/include/asm/kvm_host.h | 3 + > arch/x86/include/asm/pvclock-abi.h | 27 ++ > arch/x86/include/uapi/asm/kvm_para.h | 1 + > arch/x86/kvm/x86.c | 121 +++ > 4 files changed, 152 insertions(+) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 50eb430b0ad8..4d622450cb4a 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -659,7 +659,10 @@ struct kvm_vcpu_arch { > struct pvclock_vcpu_time_info hv_clock; > unsigned int hw_tsc_khz; > struct gfn_to_hva_cache pv_time; > + struct gfn_to_hva_cache pv_timekeeper_g2h; > + struct pvclock_timekeeper pv_timekeeper; > bool pv_time_enabled; > + bool pv_timekeeper_enabled; > /* set guest stopped flag in pvclock flags field */ > bool pvclock_set_guest_stopped_request; > > diff --git a/arch/x86/include/asm/pvclock-abi.h > b/arch/x86/include/asm/pvclock-abi.h > index 1436226efe3e..2809008b9b26 100644 > --- a/arch/x86/include/asm/pvclock-abi.h > +++ b/arch/x86/include/asm/pvclock-abi.h > @@ -40,6 +40,33 @@ struct pvclock_wall_clock { > u32 nsec; > } __attribute__((__packed__)); > > +struct pvclock_read_base { > + u64 mask; > + u64 cycle_last; > + u32 mult; > + u32 shift; > + u64 xtime_nsec; > + u64 base; > +} __attribute__((__packed__)); > + > +struct pvclock_timekeeper { > + u64 gen; > + u64 flags; > + struct pvclock_read_base tkr_mono; > + struct pvclock_read_base tkr_raw; > + u64 xtime_sec; > + u64 ktime_sec; > + u64 wall_to_monotonic_sec; > + u64 wall_to_monotonic_nsec; > + u64 offs_real; > + u64 offs_boot; > + u64 offs_tai; > + u64 raw_sec; > + u64 tsc_offset; > +} __attribute__((__packed__)); > + AFAIU these structures mirror struct tk_read_base and struct timekeeper but these are intenal to timekeeper so the risk I see is: we decide to change timekeeper internals and these structures become hard-to-mirror so I'd think about versioning them from the very beginning, e.g.: host reports supported timekeer versions in MSR_KVM_TIMEKEEPER_EN (renamed) and then guest asks for a particular version. This way we can deprecate old versions eventually. > +#define PVCLOCK_TIMEKEEPER_ENABLED (1 << 0) > + > #define PVCLOCK_TSC_STABLE_BIT (1 << 0) > #define PVCLOCK_GUEST_STOPPED(1 << 1) > /* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. 
*/ > diff --git a/arch/x86/include/uapi/asm/kvm_para.h > b/arch/x86/include/uapi/asm/kvm_para.h > index 2a8e0b6b9805..3ebb1d87db3a 100644 > --- a/arch/x86/include/uapi/asm/kvm_para.h > +++ b/arch/x86/include/uapi/asm/kvm_para.h > @@ -50,6 +50,7 @@ > #define MSR_KVM_STEAL_TIME 0x4b564d03 > #define MSR_KVM_PV_EOI_EN 0x4b564d04 > #define MSR_KVM_POLL_CONTROL 0x4b564d05 > +#define MSR_KVM_TIMEKEEPER_EN0x4b564d06 > > struct kvm_steal_time { > __u64 steal; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 661e2bf38526..937f83cdda4b 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -157,6 +157,8 @@ module_param(force_emulation_prefix, bool, S_IRUGO); > int __read_mostly pi_inject_timer = -1; > module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR); > > +static atomic_t pv_timekeepers_nr; > + > #define KVM_NR_SHARED_MSRS 16 > > struct kvm_shared_msrs_global { > @@ -2729,6 +2731,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct > msr_data *msr_info) > > break; > } > + case MSR_KVM_TIMEKEEPER_EN: > + if (kvm_gfn_to_hva_cache_init(vcpu->kvm, > + &vcpu->arch.pv_timekeeper_g2h, data, > + sizeof(struct pvclock_timekeeper))) > + vcpu->arch.pv_timekeeper_enabled = false; > + else { > + vcpu->arch.pv_timekeeper_enabled = true; > + atomic_inc(&pv_timekeepers_nr); > + } > + break; > case MSR_KVM_ASYNC_PF_EN: > if (kvm_pv_enable_async_pf(vcpu, data)) > return 1; > @@ -7097,6 +7109,109 @@ static struct perf_guest_info_callbacks kvm_guest_cbs > = { > }; > > #ifdef CONFIG_X86_64 > +static DEFINE_SPINLOCK(shadow_pvtk_lock); > +static struct pvclock_timekeeper shadow_pvtk; > + > +static void > +pvclock_copy_read_base(struct pvclock_read_base *pvtkr, > +struct tk_read_base *tkr) > +{ > + pvtkr->cycle_last = tkr->cycle_last; > + pvtkr->mult = tkr->mult; > + pvtkr->shift = tkr->shift; > + pvtkr->mask = tkr->mask; > + pvtkr->xtime_nsec = tkr->xtime_nsec; > +
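To make the versioning suggestion concrete, a negotiable layout could be shaped roughly as below. None of these names (PVTK_LAYOUT_*, struct pvclock_timekeeper_hdr, pvtk_negotiate()) are part of the proposed patch; they are purely illustrative:

#include <stdint.h>

#define PVTK_LAYOUT_V1 (1ULL << 0)
#define PVTK_LAYOUT_V2 (1ULL << 1)

struct pvclock_timekeeper_hdr {
        uint32_t version;      /* layout version the host fills in       */
        uint32_t size;         /* size of the structure for that version */
        /* version-specific payload follows */
} __attribute__((__packed__));

/* Guest side: pick the newest layout both ends understand. */
static uint64_t pvtk_negotiate(uint64_t host_supported_mask)
{
        if (host_supported_mask & PVTK_LAYOUT_V2)
                return PVTK_LAYOUT_V2;
        if (host_supported_mask & PVTK_LAYOUT_V1)
                return PVTK_LAYOUT_V1;
        return 0;              /* fall back to plain kvmclock */
}

int main(void)
{
        /* host advertises both layouts -> guest picks V2 */
        return pvtk_negotiate(PVTK_LAYOUT_V1 | PVTK_LAYOUT_V2) == PVTK_LAYOUT_V2 ? 0 : 1;
}

Deprecating an old layout would then only mean clearing its bit in the host's advertised mask.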
Re: [PATCH v2 0/2] extcon: axp288: Move to swnodes
On Tue, Oct 08, 2019 at 05:02:04PM +0300, Heikki Krogerus wrote: > On Tue, Oct 08, 2019 at 03:59:23PM +0200, Hans de Goede wrote: > > Hi, > > > > On 08-10-2019 14:25, Heikki Krogerus wrote: > > > Hi Hans, > > > > > > Fixed the compiler warning in this version. No other changes. > > > > > > The original cover letter: > > > > > > That AXP288 extcon driver is the last that uses build-in connection > > > description. I'm replacing it with a code that finds the role mux > > > software node instead. > > > > > > I'm proposing also here a little helper > > > usb_role_switch_find_by_fwnode() that uses > > > class_find_device_by_fwnode() to find the role switches. > > > > Both patches look good to me and I can confirm that things still > > work as they should on a CHT device with an AXP288 PMIC, so for both: > > > > Reviewed-by: Hans de Goede > > Tested-by: Hans de Goede > > > > Regards, > > > > Hans > > > > p.s. > > > > I guess this means we can remove the build-in connection ( > > device_connection_add / remove) stuff now? > > Yes. I'll prepare separate patches for that. Actually, maybe it would make sense to just remove it in this series. I'm attaching a patch that remove struct device_connection completely. Can you check if it makes sense to you? thanks, -- heikki >From 9444a9498dd22fd88208d19ab3454689b953eb7b Mon Sep 17 00:00:00 2001 From: Heikki Krogerus Date: Wed, 9 Oct 2019 16:49:24 +0300 Subject: [PATCH] device connection: Remove struct device_connection Now that we have software fwnodes to cover cases where a firmware node does not describe the connections, there is no need for dedicated device connection descriptors. This removes struct device_connection completely. From now on the processing is always done with the fwnode handle. Signed-off-by: Heikki Krogerus --- .../driver-api/device_connection.rst | 37 +- drivers/base/devcon.c | 116 -- drivers/usb/roles/class.c | 12 +- drivers/usb/typec/class.c | 12 +- drivers/usb/typec/mux.c | 40 ++ include/linux/device.h| 49 +--- 6 files changed, 48 insertions(+), 218 deletions(-) diff --git a/Documentation/driver-api/device_connection.rst b/Documentation/driver-api/device_connection.rst index ba364224c349..f1dfb426c0e8 100644 --- a/Documentation/driver-api/device_connection.rst +++ b/Documentation/driver-api/device_connection.rst @@ -5,39 +5,14 @@ Device connections Introduction -Devices often have connections to other devices that are outside of the direct -child/parent relationship. A serial or network communication controller, which -could be a PCI device, may need to be able to get a reference to its PHY -component, which could be attached for example to the I2C bus. Some device -drivers need to be able to control the clocks or the GPIOs for their devices, -and so on. - -Device connections are generic descriptions of any type of connection between -two separate devices. - -Device connections alone do not create a dependency between the two devices. -They are only descriptions which are not tied to either of the devices directly. -A dependency between the two devices exists only if one of the two endpoint -devices requests a reference to the other. The descriptions themselves can be -defined in firmware (not yet supported) or they can be built-in. - -Usage -- - -Device connections should exist before device ``->probe`` callback is called for -either endpoint device in the description. If the connections are defined in -firmware, this is not a problem. 
It should be considered if the connection -descriptions are "built-in", and need to be added separately. - -The connection description consists of the names of the two devices with the -connection, i.e. the endpoints, and unique identifier for the connection which -is needed if there are multiple connections between the two devices. - -After a description exists, the devices in it can request reference to the other -endpoint device, or they can request the description itself. +Device connection API allows drivers to acquire references to devices that are, +according to their firmware nodes, connected to the devices those drivers are +controlling. The API is in practice a wrapper that for now utilizes +``fwnode_graph*()`` family of functions and +:c:func:`fwnode_property_get_reference_arg` function. API --- .. kernel-doc:: drivers/base/devcon.c - :functions: device_connection_find_match device_connection_find device_connection_add device_connection_remove + :functions: device_connection_find_match fwnode_connection_find_match device_connection_find diff --git a/drivers/base/devcon.c b/drivers/base/devcon.c index 14e2178e09f8..f7698362f267 100644 --- a/drivers/base/devcon.c +++ b/drivers/base/devcon.c @@ -9,24 +9,20 @@ #include #include -static DEFINE_MUTEX(devcon_lock); -static LIST_HEAD(devco
[PATCH] xen/grant-table: remove unnecessary printing
xen_auto_xlat_grant_frames.vaddr is definitely NULL in this case. So the address printing is unnecessary. Signed-off-by: Fuqian Huang --- drivers/xen/grant-table.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index 7ea6fb6a2e5d..49b381e104ef 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -1363,8 +1363,7 @@ static int gnttab_setup(void) if (xen_feature(XENFEAT_auto_translated_physmap) && gnttab_shared.addr == NULL) { gnttab_shared.addr = xen_auto_xlat_grant_frames.vaddr; if (gnttab_shared.addr == NULL) { - pr_warn("gnttab share frames (addr=0x%08lx) is not mapped!\n", - (unsigned long)xen_auto_xlat_grant_frames.vaddr); + pr_warn("gnttab share frames is not mapped!\n"); return -ENOMEM; } } -- 2.11.0
[PATCH v2 0/2] scsi: ufs: Add driver for TI wrapper for Cadence UFS IP
This series adds DT bindings and a driver for the TI wrapper for Cadence UFS IP that is present on TI's J721e SoC. Vignesh Raghavendra (2): dt-bindings: ufs: ti,j721e-ufs.yaml: Add binding for TI UFS wrapper scsi: ufs: Add driver for TI wrapper for Cadence UFS IP .../devicetree/bindings/ufs/ti,j721e-ufs.yaml | 68 ++ drivers/scsi/ufs/Kconfig | 10 +++ drivers/scsi/ufs/Makefile | 1 + drivers/scsi/ufs/ti-j721e-ufs.c | 90 +++ 4 files changed, 169 insertions(+) create mode 100644 Documentation/devicetree/bindings/ufs/ti,j721e-ufs.yaml create mode 100644 drivers/scsi/ufs/ti-j721e-ufs.c -- 2.23.0
[PATCH v2 1/2] dt-bindings: ufs: ti,j721e-ufs.yaml: Add binding for TI UFS wrapper
Add binding documentation of TI wrapper for Cadence UFS Controller. Signed-off-by: Vignesh Raghavendra --- v2: Define Cadence UFS controller as child node of the wrapper as suggested by Rob H .../devicetree/bindings/ufs/ti,j721e-ufs.yaml | 68 +++ 1 file changed, 68 insertions(+) create mode 100644 Documentation/devicetree/bindings/ufs/ti,j721e-ufs.yaml diff --git a/Documentation/devicetree/bindings/ufs/ti,j721e-ufs.yaml b/Documentation/devicetree/bindings/ufs/ti,j721e-ufs.yaml new file mode 100644 index ..c8a2a92074df --- /dev/null +++ b/Documentation/devicetree/bindings/ufs/ti,j721e-ufs.yaml @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/ufs/ti,j721e-ufs.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: TI J721e UFS Host Controller Glue Driver + +maintainers: + - Vignesh Raghavendra + +properties: + compatible: +items: + - const: ti,j721e-ufs + + reg: +maxItems: 1 +description: address of TI UFS glue registers + + clocks: +maxItems: 1 +description: phandle to the M-PHY clock + + power-domains: +maxItems: 1 + +required: + - compatible + - reg + - clocks + - power-domains + +patternProperties: + "^ufs@[0-9a-f]+$": +type: object +description: | + Cadence UFS controller node must be the child node. Refer + Documentation/devicetree/bindings/ufs/cdns,ufshc.txt for binding + documentation of child node + +examples: + - | +#include +#include + +ufs_wrapper: ufs-wrapper@4e8 { + compatible = "ti,j721e-ufs"; + reg = <0x0 0x4e8 0x0 0x100>; + power-domains = <&k3_pds 277>; + clocks = <&k3_clks 277 1>; + assigned-clocks = <&k3_clks 277 1>; + assigned-clock-parents = <&k3_clks 277 4>; + #address-cells = <2>; + #size-cells = <2>; + + ufs@4e84000 { + compatible = "cdns,ufshc-m31-16nm", "jedec,ufs-2.0"; + reg = <0x0 0x4e84000 0x0 0x1>; + interrupts = ; + freq-table-hz = <1920 1920>; + power-domains = <&k3_pds 277>; + clocks = <&k3_clks 277 1>; + assigned-clocks = <&k3_clks 277 1>; + assigned-clock-parents = <&k3_clks 277 4>; + clock-names = "core_clk"; + }; +}; -- 2.23.0
Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user
On 10/10/2019 4:00 PM, Jiri Olsa wrote: On Thu, Oct 10, 2019 at 02:46:36PM +0800, Jin, Yao wrote: On 10/1/2019 10:17 AM, Andi Kleen wrote: I think it's useful. Makes it easy to do kernel/user break downs. perf record should support the same. Don't we have this already with: [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 This only works for simple cases. Try it for --topdown or multiple -M metrics. -Andi Hi Arnaldo, Jiri, We think it should be very useful if --all-user / --all-kernel can be specified together, so that we can get a break down between user and kernel easily. But yes, the patches for supporting this new semantics are more complicated than the patch which just follows original perf-record behavior. I fully understand this concern. So if this new semantics can be accepted, that would be very good. But if you think the new semantics is too complicated, I'm also fine with posting a new patch which just follows the perf-record behavior. I still need to think a bit more about this.. did you consider other options like cloning of the perf_evlist/perf_evsel and changing just the exclude* bits? might be even worse actually ;-) That should be another approach, but it might be a bit more complicated than just appending ":u"/":k" modifiers to the event name string. or maybe if we add modifier we could add extra events/groups within the parser.. like: "{cycles,instructions}:A,{cache-misses,cache-references}:A,cycles:A" but that might be still more complicated than what you did Yes agree. also please add the perf record changes so we have same code and logic for both if we are going to change it If this new semantics can be accepted, I'd like to add perf record support as well. :) Another difficulty for the new semantics is we need to create user and kernel stat type in runtime_stat rblist (see patch "perf stat: Support topdown with --all-kernel/--all-user"). That would bring extra complexity. Thanks Jin Yao jirka
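For context, the "exclude* bits" mentioned above are the per-event exclude_user/exclude_kernel flags in struct perf_event_attr; a cycles:u/cycles:k pair is simply the same event opened twice with those bits flipped. A minimal stand-alone illustration (plain perf_event_open usage, not perf-tool code):

#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_cycles(int exclude_user, int exclude_kernel)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.exclude_user = exclude_user;
        attr.exclude_kernel = exclude_kernel;

        /* count for the current thread on any CPU, enabled right away */
        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
        int fd_user = open_cycles(0, 1);        /* cycles:u */
        int fd_kern = open_cycles(1, 0);        /* cycles:k */
        uint64_t u = 0, k = 0;
        volatile unsigned long i, x = 0;

        if (fd_user < 0 || fd_kern < 0)
                return 1;
        for (i = 0; i < (1UL << 24); i++)       /* burn some user cycles */
                x += i;
        read(fd_user, &u, sizeof(u));
        read(fd_kern, &k, sizeof(k));
        printf("user=%llu kernel=%llu\n",
               (unsigned long long)u, (unsigned long long)k);
        return 0;
}

Cloning an evsel and flipping these two bits is essentially what Jiri's cloning suggestion amounts to for every requested event.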
[PATCH v2 2/2] scsi: ufs: Add driver for TI wrapper for Cadence UFS IP
TI's J721e SoC has a Cadence UFS IP with a TI specific wrapper. This is a minimal driver to configure the wrapper. It releases the UFS slave device out of reset and sets up registers to indicate PHY reference clock input frequency before probing child Cadence UFS driver. Signed-off-by: Vignesh Raghavendra --- v2: No change drivers/scsi/ufs/Kconfig| 10 drivers/scsi/ufs/Makefile | 1 + drivers/scsi/ufs/ti-j721e-ufs.c | 90 + 3 files changed, 101 insertions(+) create mode 100644 drivers/scsi/ufs/ti-j721e-ufs.c diff --git a/drivers/scsi/ufs/Kconfig b/drivers/scsi/ufs/Kconfig index 0b845ab7c3bf..d14c2243e02a 100644 --- a/drivers/scsi/ufs/Kconfig +++ b/drivers/scsi/ufs/Kconfig @@ -132,6 +132,16 @@ config SCSI_UFS_HISI Select this if you have UFS controller on Hisilicon chipset. If unsure, say N. +config SCSI_UFS_TI_J721E + tristate "TI glue layer for Cadence UFS Controller" + depends on OF && HAS_IOMEM && (ARCH_K3 || COMPILE_TEST) + help + This selects driver for TI glue layer for Cadence UFS Host + Controller IP. + + Selects this if you have TI platform with UFS controller. + If unsure, say N. + config SCSI_UFS_BSG bool "Universal Flash Storage BSG device node" depends on SCSI_UFSHCD diff --git a/drivers/scsi/ufs/Makefile b/drivers/scsi/ufs/Makefile index 2a9097939bcb..94c6c5d7334b 100644 --- a/drivers/scsi/ufs/Makefile +++ b/drivers/scsi/ufs/Makefile @@ -11,3 +11,4 @@ obj-$(CONFIG_SCSI_UFSHCD_PCI) += ufshcd-pci.o obj-$(CONFIG_SCSI_UFSHCD_PLATFORM) += ufshcd-pltfrm.o obj-$(CONFIG_SCSI_UFS_HISI) += ufs-hisi.o obj-$(CONFIG_SCSI_UFS_MEDIATEK) += ufs-mediatek.o +obj-$(CONFIG_SCSI_UFS_TI_J721E) += ti-j721e-ufs.o diff --git a/drivers/scsi/ufs/ti-j721e-ufs.c b/drivers/scsi/ufs/ti-j721e-ufs.c new file mode 100644 index ..a653bf1902f3 --- /dev/null +++ b/drivers/scsi/ufs/ti-j721e-ufs.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/ +// + +#include +#include +#include +#include +#include +#include +#include + +#define UFS_SS_CTRL0x4 +#define UFS_SS_RST_N_PCS BIT(0) +#define UFS_SS_CLK_26MHZ BIT(4) + +static int ti_j721e_ufs_probe(struct platform_device *pdev) +{ + struct device *dev = &pdev->dev; + unsigned long clk_rate; + void __iomem *regbase; + struct clk *clk; + u32 reg = 0; + int ret; + + regbase = devm_platform_ioremap_resource(pdev, 0); + if (IS_ERR(regbase)) + return PTR_ERR(regbase); + + /* Select MPHY refclk frequency */ + clk = devm_clk_get(dev, NULL); + if (IS_ERR(clk)) { + dev_err(dev, "Cannot claim MPHY clock.\n"); + return PTR_ERR(clk); + } + clk_rate = clk_get_rate(clk); + if (clk_rate == 2600) + reg |= UFS_SS_CLK_26MHZ; + devm_clk_put(dev, clk); + + pm_runtime_enable(dev); + ret = pm_runtime_get_sync(dev); + if (ret < 0) { + pm_runtime_put_noidle(dev); + return ret; + } + + /* Take UFS slave device out of reset */ + reg |= UFS_SS_RST_N_PCS; + writel(reg, regbase + UFS_SS_CTRL); + + ret = of_platform_populate(pdev->dev.of_node, NULL, NULL, + dev); + if (ret) { + dev_err(dev, "failed to populate child nodes %d\n", ret); + pm_runtime_put_sync(dev); + } + + return ret; +} + +static int ti_j721e_ufs_remove(struct platform_device *pdev) +{ + of_platform_depopulate(&pdev->dev); + pm_runtime_put_sync(&pdev->dev); + + return 0; +} + +static const struct of_device_id ti_j721e_ufs_of_match[] = { + { + .compatible = "ti,j721e-ufs", + }, + { }, +}; + +static struct platform_driver ti_j721e_ufs_driver = { + .probe = ti_j721e_ufs_probe, + .remove = ti_j721e_ufs_remove, + .driver = { + .name = "ti-j721e-ufs", + 
.of_match_table = ti_j721e_ufs_of_match, + }, +}; +module_platform_driver(ti_j721e_ufs_driver); + +MODULE_AUTHOR("Vignesh Raghavendra "); +MODULE_DESCRIPTION("TI UFS host controller glue driver"); +MODULE_LICENSE("GPL v2"); -- 2.23.0
[PATCH 5.3 001/148] s390/process: avoid potential reading of freed stack
From: Vasily Gorbik commit 8769f610fe6d473e5e8e221709c3ac402037da6c upstream. With THREAD_INFO_IN_TASK (which is selected on s390) task's stack usage is refcounted and should always be protected by get/put when touching other task's stack to avoid race conditions with task's destruction code. Fixes: d5c352cdd022 ("s390: move thread_info into task_struct") Cc: sta...@vger.kernel.org # v4.10+ Acked-by: Ilya Leoshkevich Signed-off-by: Vasily Gorbik Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/process.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) --- a/arch/s390/kernel/process.c +++ b/arch/s390/kernel/process.c @@ -184,20 +184,30 @@ unsigned long get_wchan(struct task_stru if (!p || p == current || p->state == TASK_RUNNING || !task_stack_page(p)) return 0; + + if (!try_get_task_stack(p)) + return 0; + low = task_stack_page(p); high = (struct stack_frame *) task_pt_regs(p); sf = (struct stack_frame *) p->thread.ksp; - if (sf <= low || sf > high) - return 0; + if (sf <= low || sf > high) { + return_address = 0; + goto out; + } for (count = 0; count < 16; count++) { sf = (struct stack_frame *) sf->back_chain; - if (sf <= low || sf > high) - return 0; + if (sf <= low || sf > high) { + return_address = 0; + goto out; + } return_address = sf->gprs[8]; if (!in_sched_functions(return_address)) - return return_address; + goto out; } - return 0; +out: + put_task_stack(p); + return return_address; } unsigned long arch_align_stack(unsigned long sp)
Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()
On Thu 10-10-19 17:16:29, Sergey Senozhatsky wrote: > On (10/10/19 09:40), Michal Hocko wrote: > [..] > > > > Considering that console.write is called from essentially arbitrary code > > > > path IIUC then all the locks used in this path should be pretty much > > > > tail locks or console internal ones without external dependencies. > > > > > > That's a good expectation, but I guess it's not always the case. > > > > > > One example might be NET console - net subsystem locks, net device > > > drivers locks, maybe even some MM locks (skb allocations?). > > > > I am not familiar with the netconsole code TBH. If there is absolutely > > no way around that then we might have to bite a bullet and consider some > > of the MM locks a land of no printk. > > So this is what netconsole does (before we pass on udp to net device > driver code, which *may be* can do more allocations, I don't know): > > write_msg() > netpoll_send_udp() > find_skb() >alloc_skb(len, GFP_ATOMIC) > kmem_cache_alloc_node() > > You are the right person to ask this question to - how many MM > locks are involved when we do GFP_ATOMIC kmem_cache allocation? > Is there anything to be concerned about? At least zone->lock might be involved. Maybe even more. -- Michal Hocko SUSE Labs
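The shape of the deadlock being discussed, and the usual way around it, is sketched below. This is only the general pattern (collect the information under zone->lock, report after dropping it), not the actual page_isolation fix; bad_example()/better_example() are made-up helpers:

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/printk.h>
#include <linux/spinlock.h>

/*
 * Deadlock-prone: printk() -> netconsole -> alloc_skb(GFP_ATOMIC) ->
 * page allocator -> zone->lock, which this context already holds.
 */
static void bad_example(struct zone *zone, struct page *page)
{
        unsigned long flags;

        spin_lock_irqsave(&zone->lock, flags);
        if (page_count(page))
                pr_warn("unexpected page state %px\n", page);   /* don't */
        spin_unlock_irqrestore(&zone->lock, flags);
}

/* Safer: remember what went wrong, print once the lock is released. */
static void better_example(struct zone *zone, struct page *page)
{
        struct page *bad = NULL;
        unsigned long flags;

        spin_lock_irqsave(&zone->lock, flags);
        if (page_count(page))
                bad = page;
        spin_unlock_irqrestore(&zone->lock, flags);

        if (bad)
                pr_warn("unexpected page state %px\n", bad);
}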
[PATCH 5.3 010/148] KVM: PPC: Book3S: Enable XIVE native capability only if OPAL has required functions
From: Paul Mackerras commit 2ad7a27deaf6d78545d97ab80874584f6990360e upstream. There are some POWER9 machines where the OPAL firmware does not support the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls. The impact of this is that a guest using XIVE natively will not be able to be migrated successfully. On the source side, the get_attr operation on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute will fail; on the destination side, the set_attr operation for the same attribute will fail. This adds tests for the existence of the OPAL get/set queue state functions, and if they are not supported, the XIVE-native KVM device is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false. Userspace can then either provide a software emulation of XIVE, or else tell the guest that it does not have a XIVE controller available to it. Cc: sta...@vger.kernel.org # v5.2+ Fixes: 3fab2d10588e ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode") Reviewed-by: David Gibson Reviewed-by: Cédric Le Goater Signed-off-by: Paul Mackerras Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/kvm_ppc.h|1 + arch/powerpc/include/asm/xive.h |1 + arch/powerpc/kvm/book3s.c |8 +--- arch/powerpc/kvm/book3s_xive_native.c |5 + arch/powerpc/kvm/powerpc.c|3 ++- arch/powerpc/sysdev/xive/native.c |7 +++ 6 files changed, 21 insertions(+), 4 deletions(-) --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -598,6 +598,7 @@ extern int kvmppc_xive_native_get_vp(str union kvmppc_one_reg *val); extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val); +extern bool kvmppc_xive_native_supported(void); #else static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server, --- a/arch/powerpc/include/asm/xive.h +++ b/arch/powerpc/include/asm/xive.h @@ -127,6 +127,7 @@ extern int xive_native_get_queue_state(u extern int xive_native_set_queue_state(u32 vp_id, uint32_t prio, u32 qtoggle, u32 qindex); extern int xive_native_get_vp_state(u32 vp_id, u64 *out_state); +extern bool xive_native_has_queue_state_support(void); #else --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1083,9 +1083,11 @@ static int kvmppc_book3s_init(void) if (xics_on_xive()) { kvmppc_xive_init_module(); kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS); - kvmppc_xive_native_init_module(); - kvm_register_device_ops(&kvm_xive_native_ops, - KVM_DEV_TYPE_XIVE); + if (kvmppc_xive_native_supported()) { + kvmppc_xive_native_init_module(); + kvm_register_device_ops(&kvm_xive_native_ops, + KVM_DEV_TYPE_XIVE); + } } else #endif kvm_register_device_ops(&kvm_xics_ops, KVM_DEV_TYPE_XICS); --- a/arch/powerpc/kvm/book3s_xive_native.c +++ b/arch/powerpc/kvm/book3s_xive_native.c @@ -1171,6 +1171,11 @@ int kvmppc_xive_native_set_vp(struct kvm return 0; } +bool kvmppc_xive_native_supported(void) +{ + return xive_native_has_queue_state_support(); +} + static int xive_native_debug_show(struct seq_file *m, void *private) { struct kvmppc_xive *xive = m->private; --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -561,7 +561,8 @@ int kvm_vm_ioctl_check_extension(struct * a POWER9 processor) and the PowerNV platform, as * nested is not yet supported. 
*/ - r = xive_enabled() && !!cpu_has_feature(CPU_FTR_HVMODE); + r = xive_enabled() && !!cpu_has_feature(CPU_FTR_HVMODE) && + kvmppc_xive_native_supported(); break; #endif --- a/arch/powerpc/sysdev/xive/native.c +++ b/arch/powerpc/sysdev/xive/native.c @@ -811,6 +811,13 @@ int xive_native_set_queue_state(u32 vp_i } EXPORT_SYMBOL_GPL(xive_native_set_queue_state); +bool xive_native_has_queue_state_support(void) +{ + return opal_check_token(OPAL_XIVE_GET_QUEUE_STATE) && + opal_check_token(OPAL_XIVE_SET_QUEUE_STATE); +} +EXPORT_SYMBOL_GPL(xive_native_has_queue_state_support); + int xive_native_get_vp_state(u32 vp_id, u64 *out_state) { __be64 state;
[PATCH 5.3 012/148] KVM: PPC: Book3S HV: Dont push XIVE context when not using XIVE device
From: Paul Mackerras commit 8d4ba9c931bc384bcc6889a43915aaaf19d3e499 upstream. At present, when running a guest on POWER9 using HV KVM but not using an in-kernel interrupt controller (XICS or XIVE), for example if QEMU is run with the kernel_irqchip=off option, the guest entry code goes ahead and tries to load the guest context into the XIVE hardware, even though no context has been set up. To fix this, we check that the "CAM word" is non-zero before pushing it to the hardware. The CAM word is initialized to a non-zero value in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(), and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: sta...@vger.kernel.org # v4.12+ Reported-by: Cédric Le Goater Signed-off-by: Paul Mackerras Reviewed-by: Cédric Le Goater Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190813100100.GC9567@blackberry Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kvm/book3s_hv_rmhandlers.S |2 ++ arch/powerpc/kvm/book3s_xive.c | 11 ++- arch/powerpc/kvm/book3s_xive_native.c |3 +++ 3 files changed, 15 insertions(+), 1 deletion(-) --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -942,6 +942,8 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_3 ld r11, VCPU_XIVE_SAVED_STATE(r4) li r9, TM_QW1_OS lwz r8, VCPU_XIVE_CAM_WORD(r4) + cmpwi r8, 0 + beq no_xive li r7, TM_QW1_OS + TM_WORD2 mfmsr r0 andi. r0, r0, MSR_DR /* in real mode? */ --- a/arch/powerpc/kvm/book3s_xive.c +++ b/arch/powerpc/kvm/book3s_xive.c @@ -67,8 +67,14 @@ void kvmppc_xive_push_vcpu(struct kvm_vc void __iomem *tima = local_paca->kvm_hstate.xive_tima_virt; u64 pq; - if (!tima) + /* +* Nothing to do if the platform doesn't have a XIVE +* or this vCPU doesn't have its own XIVE context +* (e.g. because it's not using an in-kernel interrupt controller). +*/ + if (!tima || !vcpu->arch.xive_cam_word) return; + eieio(); __raw_writeq(vcpu->arch.xive_saved_state.w01, tima + TM_QW1_OS); __raw_writel(vcpu->arch.xive_cam_word, tima + TM_QW1_OS + TM_WORD2); @@ -1146,6 +1152,9 @@ void kvmppc_xive_cleanup_vcpu(struct kvm /* Disable the VP */ xive_native_disable_vp(xc->vp_id); + /* Clear the cam word so guest entry won't try to push context */ + vcpu->arch.xive_cam_word = 0; + /* Free the queues */ for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { struct xive_q *q = &xc->queues[i]; --- a/arch/powerpc/kvm/book3s_xive_native.c +++ b/arch/powerpc/kvm/book3s_xive_native.c @@ -81,6 +81,9 @@ void kvmppc_xive_native_cleanup_vcpu(str /* Disable the VP */ xive_native_disable_vp(xc->vp_id); + /* Clear the cam word so guest entry won't try to push context */ + vcpu->arch.xive_cam_word = 0; + /* Free the queues */ for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { kvmppc_xive_native_cleanup_queue(vcpu, i);
[PATCH 5.3 014/148] KVM: PPC: Book3S HV: Check for MMU ready on piggybacked virtual cores
From: Paul Mackerras commit d28eafc5a64045c78136162af9d4ba42f8230080 upstream. When we are running multiple vcores on the same physical core, they could be from different VMs and so it is possible that one of the VMs could have its arch.mmu_ready flag cleared (for example by a concurrent HPT resize) when we go to run it on a physical core. We currently check the arch.mmu_ready flag for the primary vcore but not the flags for the other vcores that will be run alongside it. This adds that check, and also a check when we select the secondary vcores from the preempted vcores list. Cc: sta...@vger.kernel.org # v4.14+ Fixes: 38c53af85306 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates") Signed-off-by: Paul Mackerras Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kvm/book3s_hv.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -2860,7 +2860,7 @@ static void collect_piggybacks(struct co if (!spin_trylock(&pvc->lock)) continue; prepare_threads(pvc); - if (!pvc->n_runnable) { + if (!pvc->n_runnable || !pvc->kvm->arch.mmu_ready) { list_del_init(&pvc->preempt_list); if (pvc->runner == NULL) { pvc->vcore_state = VCORE_INACTIVE; @@ -2881,15 +2881,20 @@ static void collect_piggybacks(struct co spin_unlock(&lp->lock); } -static bool recheck_signals(struct core_info *cip) +static bool recheck_signals_and_mmu(struct core_info *cip) { int sub, i; struct kvm_vcpu *vcpu; + struct kvmppc_vcore *vc; - for (sub = 0; sub < cip->n_subcores; ++sub) - for_each_runnable_thread(i, vcpu, cip->vc[sub]) + for (sub = 0; sub < cip->n_subcores; ++sub) { + vc = cip->vc[sub]; + if (!vc->kvm->arch.mmu_ready) + return true; + for_each_runnable_thread(i, vcpu, vc) if (signal_pending(vcpu->arch.run_task)) return true; + } return false; } @@ -3119,7 +3124,7 @@ static noinline void kvmppc_run_core(str local_irq_disable(); hard_irq_disable(); if (lazy_irq_pending() || need_resched() || - recheck_signals(&core_info) || !vc->kvm->arch.mmu_ready) { + recheck_signals_and_mmu(&core_info)) { local_irq_enable(); vc->vcore_state = VCORE_INACTIVE; /* Unlock all except the primary vcore */
[PATCH 5.3 015/148] KVM: PPC: Book3S HV: Dont lose pending doorbell request on migration on P9
From: Paul Mackerras commit ff42df49e75f053a8a6b4c2533100cdcc23afe69 upstream. On POWER9, when userspace reads the value of the DPDES register on a vCPU, it is possible for 0 to be returned although there is a doorbell interrupt pending for the vCPU. This can lead to a doorbell interrupt being lost across migration. If the guest kernel uses doorbell interrupts for IPIs, then it could malfunction because of the lost interrupt. This happens because a newly-generated doorbell interrupt is signalled by setting vcpu->arch.doorbell_request to 1; the DPDES value in vcpu->arch.vcore->dpdes is not updated, because it can only be updated when holding the vcpu mutex, in order to avoid races. To fix this, we OR in vcpu->arch.doorbell_request when reading the DPDES value. Cc: sta...@vger.kernel.org # v4.13+ Fixes: 579006944e0d ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9") Signed-off-by: Paul Mackerras Tested-by: Alexey Kardashevskiy Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kvm/book3s_hv.c |9 - 1 file changed, 8 insertions(+), 1 deletion(-) --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1678,7 +1678,14 @@ static int kvmppc_get_one_reg_hv(struct *val = get_reg_val(id, vcpu->arch.pspb); break; case KVM_REG_PPC_DPDES: - *val = get_reg_val(id, vcpu->arch.vcore->dpdes); + /* +* On POWER9, where we are emulating msgsndp etc., +* we return 1 bit for each vcpu, which can come from +* either vcore->dpdes or doorbell_request. +* On POWER8, doorbell_request is 0. +*/ + *val = get_reg_val(id, vcpu->arch.vcore->dpdes | + vcpu->arch.doorbell_request); break; case KVM_REG_PPC_VTB: *val = get_reg_val(id, vcpu->arch.vcore->vtb);
[PATCH 5.3 013/148] KVM: PPC: Book3S HV: Fix race in re-enabling XIVE escalation interrupts
From: Paul Mackerras commit 959c5d5134786b4988b6fdd08e444aa67d1667ed upstream. Escalation interrupts are interrupts sent to the host by the XIVE hardware when it has an interrupt to deliver to a guest VCPU but that VCPU is not running anywhere in the system. Hence we disable the escalation interrupt for the VCPU being run when we enter the guest and re-enable it when the guest does an H_CEDE hypercall indicating it is idle. It is possible that an escalation interrupt gets generated just as we are entering the guest. In that case the escalation interrupt may be using a queue entry in one of the interrupt queues, and that queue entry may not have been processed when the guest exits with an H_CEDE. The existing entry code detects this situation and does not clear the vcpu->arch.xive_esc_on flag as an indication that there is a pending queue entry (if the queue entry gets processed, xive_esc_irq() will clear the flag). There is a comment in the code saying that if the flag is still set on H_CEDE, we have to abort the cede rather than re-enabling the escalation interrupt, lest we end up with two occurrences of the escalation interrupt in the interrupt queue. However, the exit code doesn't do that; it aborts the cede in the sense that vcpu->arch.ceded gets cleared, but it still enables the escalation interrupt by setting the source's PQ bits to 00. Instead we need to set the PQ bits to 10, indicating that an interrupt has been triggered. We also need to avoid setting vcpu->arch.xive_esc_on in this case (i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because xive_esc_irq() will run at some point and clear it, and if we race with that we may end up with an incorrect result (i.e. xive_esc_on set when the escalation interrupt has just been handled). It is extremely unlikely that having two queue entries would cause observable problems; theoretically it could cause queue overflow, but the CPU would have to have thousands of interrupts targetted to it for that to be possible. However, this fix will also make it possible to determine accurately whether there is an unhandled escalation interrupt in the queue, which will be needed by the following patch. Fixes: 9b9b13a6d153 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded") Cc: sta...@vger.kernel.org # v4.16+ Signed-off-by: Paul Mackerras Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberry Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 36 1 file changed, 23 insertions(+), 13 deletions(-) --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -2833,29 +2833,39 @@ kvm_cede_prodded: kvm_cede_exit: ld r9, HSTATE_KVM_VCPU(r13) #ifdef CONFIG_KVM_XICS - /* Abort if we still have a pending escalation */ + /* are we using XIVE with single escalation? */ + ld r10, VCPU_XIVE_ESC_VADDR(r9) + cmpdi r10, 0 + beq 3f + li r6, XIVE_ESB_SET_PQ_00 + /* +* If we still have a pending escalation, abort the cede, +* and we must set PQ to 10 rather than 00 so that we don't +* potentially end up with two entries for the escalation +* interrupt in the XIVE interrupt queue. In that case +* we also don't want to set xive_esc_on to 1 here in +* case we race with xive_esc_irq(). 
+*/ lbz r5, VCPU_XIVE_ESC_ON(r9) cmpwi r5, 0 - beq 1f + beq 4f li r0, 0 stb r0, VCPU_CEDED(r9) -1: /* Enable XIVE escalation */ - li r5, XIVE_ESB_SET_PQ_00 + li r6, XIVE_ESB_SET_PQ_10 + b 5f +4: li r0, 1 + stb r0, VCPU_XIVE_ESC_ON(r9) + /* make sure store to xive_esc_on is seen before xive_esc_irq runs */ + sync +5: /* Enable XIVE escalation */ mfmsr r0 andi. r0, r0, MSR_DR /* in real mode? */ beq 1f - ld r10, VCPU_XIVE_ESC_VADDR(r9) - cmpdi r10, 0 - beq 3f - ldx r0, r10, r5 + ldx r0, r10, r6 b 2f 1: ld r10, VCPU_XIVE_ESC_RADDR(r9) - cmpdi r10, 0 - beq 3f - ldcix r0, r10, r5 + ldcix r0, r10, r6 2: sync - li r0, 1 - stb r0, VCPU_XIVE_ESC_ON(r9) #endif /* CONFIG_KVM_XICS */ 3: b guest_exit_cont
[PATCH 5.3 011/148] KVM: PPC: Book3S HV: XIVE: Free escalation interrupts before disabling the VP
From: Cédric Le Goater commit 237aed48c642328ff0ab19b63423634340224a06 upstream. When a vCPU is brought done, the XIVE VP (Virtual Processor) is first disabled and then the event notification queues are freed. When freeing the queues, we check for possible escalation interrupts and free them also. But when a XIVE VP is disabled, the underlying XIVE ENDs also are disabled in OPAL. When an END (Event Notification Descriptor) is disabled, its ESB pages (ESn and ESe) are disabled and loads return all 1s. Which means that any access on the ESB page of the escalation interrupt will return invalid values. When an interrupt is freed, the shutdown handler computes a 'saved_p' field from the value returned by a load in xive_do_source_set_mask(). This value is incorrect for escalation interrupts for the reason described above. This has no impact on Linux/KVM today because we don't make use of it but we will introduce in future changes a xive_get_irqchip_state() handler. This handler will use the 'saved_p' field to return the state of an interrupt and 'saved_p' being incorrect, softlockup will occur. Fix the vCPU cleanup sequence by first freeing the escalation interrupts if any, then disable the XIVE VP and last free the queues. Fixes: 90c73795afa2 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode") Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: sta...@vger.kernel.org # v4.12+ Signed-off-by: Cédric Le Goater Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20190806172538.5087-1-...@kaod.org Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kvm/book3s_xive.c| 18 ++ arch/powerpc/kvm/book3s_xive_native.c | 12 +++- 2 files changed, 17 insertions(+), 13 deletions(-) --- a/arch/powerpc/kvm/book3s_xive.c +++ b/arch/powerpc/kvm/book3s_xive.c @@ -1134,20 +1134,22 @@ void kvmppc_xive_cleanup_vcpu(struct kvm /* Mask the VP IPI */ xive_vm_esb_load(&xc->vp_ipi_data, XIVE_ESB_SET_PQ_01); - /* Disable the VP */ - xive_native_disable_vp(xc->vp_id); - - /* Free the queues & associated interrupts */ + /* Free escalations */ for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { - struct xive_q *q = &xc->queues[i]; - - /* Free the escalation irq */ if (xc->esc_virq[i]) { free_irq(xc->esc_virq[i], vcpu); irq_dispose_mapping(xc->esc_virq[i]); kfree(xc->esc_virq_names[i]); } - /* Free the queue */ + } + + /* Disable the VP */ + xive_native_disable_vp(xc->vp_id); + + /* Free the queues */ + for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { + struct xive_q *q = &xc->queues[i]; + xive_native_disable_queue(xc->vp_id, q, i); if (q->qpage) { free_pages((unsigned long)q->qpage, --- a/arch/powerpc/kvm/book3s_xive_native.c +++ b/arch/powerpc/kvm/book3s_xive_native.c @@ -67,10 +67,7 @@ void kvmppc_xive_native_cleanup_vcpu(str xc->valid = false; kvmppc_xive_disable_vcpu_interrupts(vcpu); - /* Disable the VP */ - xive_native_disable_vp(xc->vp_id); - - /* Free the queues & associated interrupts */ + /* Free escalations */ for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { /* Free the escalation irq */ if (xc->esc_virq[i]) { @@ -79,8 +76,13 @@ void kvmppc_xive_native_cleanup_vcpu(str kfree(xc->esc_virq_names[i]); xc->esc_virq[i] = 0; } + } - /* Free the queue */ + /* Disable the VP */ + xive_native_disable_vp(xc->vp_id); + + /* Free the queues */ + for (i = 0; i < KVMPPC_XIVE_Q_COUNT; i++) { kvmppc_xive_native_cleanup_queue(vcpu, i); }
Re: [RFC v2 2/2] x86/kvmclock: Introduce kvm-hostclock clocksource.
Suleiman Souhlal writes: > When kvm-hostclock is selected, and the host supports it, update our > timekeeping parameters to be the same as the host. > This lets us have our time synchronized with the host's, > even in the presence of host NTP or suspend. > > Signed-off-by: Suleiman Souhlal > --- > arch/x86/Kconfig| 9 ++ > arch/x86/include/asm/kvmclock.h | 12 +++ > arch/x86/kernel/Makefile| 2 + > arch/x86/kernel/kvmclock.c | 5 +- > arch/x86/kernel/kvmhostclock.c | 130 > include/linux/timekeeper_internal.h | 8 ++ > kernel/time/timekeeping.c | 2 + > 7 files changed, 167 insertions(+), 1 deletion(-) > create mode 100644 arch/x86/kernel/kvmhostclock.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index d6e1faa28c58..c5b1257ea969 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -839,6 +839,15 @@ config PARAVIRT_TIME_ACCOUNTING > config PARAVIRT_CLOCK > bool > > +config KVM_HOSTCLOCK > + bool "kvmclock uses host timekeeping" > + depends on KVM_GUEST > + help > + Select this option to make the guest use the same timekeeping > + parameters as the host. This means that time will be almost > + exactly the same between the two. Only works if the host uses "tsc" > + clocksource. > + > config JAILHOUSE_GUEST > bool "Jailhouse non-root cell support" > depends on X86_64 && PCI > diff --git a/arch/x86/include/asm/kvmclock.h b/arch/x86/include/asm/kvmclock.h > index eceea9299097..de1a590ff97e 100644 > --- a/arch/x86/include/asm/kvmclock.h > +++ b/arch/x86/include/asm/kvmclock.h > @@ -2,6 +2,18 @@ > #ifndef _ASM_X86_KVM_CLOCK_H > #define _ASM_X86_KVM_CLOCK_H > > +#include > + > extern struct clocksource kvm_clock; > > +unsigned long kvm_get_tsc_khz(void); > + > +#ifdef CONFIG_KVM_HOSTCLOCK > +void kvm_hostclock_init(void); > +#else > +static inline void kvm_hostclock_init(void) > +{ > +} > +#endif /* KVM_HOSTCLOCK */ > + > #endif /* _ASM_X86_KVM_CLOCK_H */ > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile > index 3578ad248bc9..bc7be935fc5e 100644 > --- a/arch/x86/kernel/Makefile > +++ b/arch/x86/kernel/Makefile > @@ -17,6 +17,7 @@ CFLAGS_REMOVE_tsc.o = -pg > CFLAGS_REMOVE_paravirt-spinlocks.o = -pg > CFLAGS_REMOVE_pvclock.o = -pg > CFLAGS_REMOVE_kvmclock.o = -pg > +CFLAGS_REMOVE_kvmhostclock.o = -pg > CFLAGS_REMOVE_ftrace.o = -pg > CFLAGS_REMOVE_early_printk.o = -pg > CFLAGS_REMOVE_head64.o = -pg > @@ -112,6 +113,7 @@ obj-$(CONFIG_AMD_NB) += amd_nb.o > obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o > > obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o > +obj-$(CONFIG_KVM_HOSTCLOCK) += kvmhostclock.o > obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o > obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o > obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > index 904494b924c1..4ab862de9777 100644 > --- a/arch/x86/kernel/kvmclock.c > +++ b/arch/x86/kernel/kvmclock.c > @@ -125,7 +125,7 @@ static inline void kvm_sched_clock_init(bool stable) > * poll of guests can be running and trouble each other. 
So we preset > * lpj here > */ > -static unsigned long kvm_get_tsc_khz(void) > +unsigned long kvm_get_tsc_khz(void) > { > setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); > return pvclock_tsc_khz(this_cpu_pvti()); > @@ -366,5 +366,8 @@ void __init kvmclock_init(void) > kvm_clock.rating = 299; > > clocksource_register_hz(&kvm_clock, NSEC_PER_SEC); > + > + kvm_hostclock_init(); > + > pv_info.name = "KVM"; > } > diff --git a/arch/x86/kernel/kvmhostclock.c b/arch/x86/kernel/kvmhostclock.c > new file mode 100644 > index ..9971343c2bed > --- /dev/null > +++ b/arch/x86/kernel/kvmhostclock.c > @@ -0,0 +1,130 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * KVM clocksource that uses host timekeeping. > + * Copyright (c) 2019 Suleiman Souhlal, Google LLC > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct pvclock_timekeeper pv_timekeeper; > + > +static bool pv_timekeeper_enabled; > +static bool pv_timekeeper_present; > +static int old_vclock_mode; > + > +static u64 > +kvm_hostclock_get_cycles(struct clocksource *cs) > +{ > + return rdtsc_ordered(); > +} > + > +static int > +kvm_hostclock_enable(struct clocksource *cs) > +{ > + pv_timekeeper_enabled = 1; > + > + old_vclock_mode = kvm_clock.archdata.vclock_mode; > + kvm_clock.archdata.vclock_mode = VCLOCK_TSC; > + return 0; > +} > + > +static void > +kvm_hostclock_disable(struct clocksource *cs) > +{ > + pv_timekeeper_enabled = 0; > + kvm_clock.archdata.vclock_mode = old_vclock_mode; > +} > + > +struct clocksource kvm_hostclock = { > + .name = "kvm-hostclock", > + .read = kvm_hostclock_get_cycles, > +
[PATCH 5.3 017/148] nbd: fix max number of supported devs
From: Mike Christie commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 upstream. This fixes a bug added in 4.10 with commit: commit 9561a7ade0c205bc2ee035a2ac880478dcc1a024 Author: Josef Bacik Date: Tue Nov 22 14:04:40 2016 -0500 nbd: add multi-connection support that limited the number of devices to 256. Before the patch we could create 1000s of devices, but the patch switched us from using our own thread to using a work queue which has a default limit of 256 active works. The problem is that our recv_work function sits in a loop until disconnection but only handles IO for one connection. The work is started when the connection is started/restarted, but if we end up creating 257 or more connections, the queue_work call just queues connection257+'s recv_work and that waits for connection 1 - 256's recv_work to be disconnected and that work instance completing. Instead of reverting back to kthreads, this has us allocate a workqueue_struct per device, so we can block in the work. Cc: sta...@vger.kernel.org Reviewed-by: Josef Bacik Signed-off-by: Mike Christie Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/block/nbd.c | 39 +-- 1 file changed, 25 insertions(+), 14 deletions(-) --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -108,6 +108,7 @@ struct nbd_device { struct nbd_config *config; struct mutex config_lock; struct gendisk *disk; + struct workqueue_struct *recv_workq; struct list_head list; struct task_struct *task_recv; @@ -138,7 +139,6 @@ static struct dentry *nbd_dbg_dir; static unsigned int nbds_max = 16; static int max_part = 16; -static struct workqueue_struct *recv_workqueue; static int part_shift; static int nbd_dev_dbg_init(struct nbd_device *nbd); @@ -1038,7 +1038,7 @@ static int nbd_reconnect_socket(struct n /* We take the tx_mutex in an error path in the recv_work, so we * need to queue_work outside of the tx_mutex. 
*/ - queue_work(recv_workqueue, &args->work); + queue_work(nbd->recv_workq, &args->work); atomic_inc(&config->live_connections); wake_up(&config->conn_wait); @@ -1139,6 +1139,10 @@ static void nbd_config_put(struct nbd_de kfree(nbd->config); nbd->config = NULL; + if (nbd->recv_workq) + destroy_workqueue(nbd->recv_workq); + nbd->recv_workq = NULL; + nbd->tag_set.timeout = 0; nbd->disk->queue->limits.discard_granularity = 0; nbd->disk->queue->limits.discard_alignment = 0; @@ -1167,6 +1171,14 @@ static int nbd_start_device(struct nbd_d return -EINVAL; } + nbd->recv_workq = alloc_workqueue("knbd%d-recv", + WQ_MEM_RECLAIM | WQ_HIGHPRI | + WQ_UNBOUND, 0, nbd->index); + if (!nbd->recv_workq) { + dev_err(disk_to_dev(nbd->disk), "Could not allocate knbd recv work queue.\n"); + return -ENOMEM; + } + blk_mq_update_nr_hw_queues(&nbd->tag_set, config->num_connections); nbd->task_recv = current; @@ -1197,7 +1209,7 @@ static int nbd_start_device(struct nbd_d INIT_WORK(&args->work, recv_work); args->nbd = nbd; args->index = i; - queue_work(recv_workqueue, &args->work); + queue_work(nbd->recv_workq, &args->work); } nbd_size_update(nbd); return error; @@ -1217,8 +1229,10 @@ static int nbd_start_device_ioctl(struct mutex_unlock(&nbd->config_lock); ret = wait_event_interruptible(config->recv_wq, atomic_read(&config->recv_threads) == 0); - if (ret) + if (ret) { sock_shutdown(nbd); + flush_workqueue(nbd->recv_workq); + } mutex_lock(&nbd->config_lock); nbd_bdev_reset(bdev); /* user requested, ignore socket errors */ @@ -1877,6 +1891,12 @@ static void nbd_disconnect_and_put(struc nbd_disconnect(nbd); nbd_clear_sock(nbd); mutex_unlock(&nbd->config_lock); + /* +* Make sure recv thread has finished, so it does not drop the last +* config ref and try to destroy the workqueue from inside the work +* queue. +*/ + flush_workqueue(nbd->recv_workq); if (test_and_clear_bit(NBD_HAS_CONFIG_REF, &nbd->config->runtime_flags)) nbd_config_put(nbd); @@ -2263,20 +2283,12 @@ static int __init nbd_init(void) if (nbds_max > 1UL << (MINORBITS - part_shift)) return -EINVAL; - recv_workqueue = alloc_workqueue("knbd-recv", -WQ_MEM_RECLAIM | WQ_HIGHPRI | -
[PATCH 5.3 020/148] ASoC: sgtl5000: Improve VAG power and mute control
From: Oleksandr Suvorov

commit b1f373a11d25fc9a5f7679c9b85799fe09b0dc4a upstream.

VAG power control is improved to fit the manual [1]. This patch fixes at
least one bug: if a customer muxes Headphone to Line-In right after boot,
the VAG power remains off, which leads to poor sound quality from Line-In.

I.e. after boot:
- Connect a sound source to the Line-In jack;
- Connect headphones to the HP jack;
- Run the following commands:
  $ amixer set 'Headphone' 80%
  $ amixer set 'Headphone Mux' LINE_IN

Change the VAG power on/off control according to the following algorithm:
- turn VAG power ON on the first incoming event;
- keep it ON if there is any active VAG consumer (ADC/DAC/HP/Line-In);
- turn VAG power OFF when the last consumer's pre-down event comes;
- always delay after VAG power OFF to avoid a pop;
- delay after VAG power ON if the initiating consumer is Line-In; this
  prevents a pop during Line-In muxing.

According to the data sheet [1], to avoid any pops/clicks, the outputs
should be muted during input/output routing changes.

[1] https://www.nxp.com/docs/en/data-sheet/SGTL5000.pdf

Cc: sta...@vger.kernel.org
Fixes: 9b34e6cc3bc2 ("ASoC: Add Freescale SGTL5000 codec support")
Signed-off-by: Oleksandr Suvorov
Reviewed-by: Marcel Ziswiler
Reviewed-by: Fabio Estevam
Reviewed-by: Cezary Rojewski
Link: https://lore.kernel.org/r/20190719100524.23300-3-oleksandr.suvo...@toradex.com
Signed-off-by: Mark Brown
Signed-off-by: Greg Kroah-Hartman

---
 sound/soc/codecs/sgtl5000.c | 224 ++--
 1 file changed, 194 insertions(+), 30 deletions(-)

--- a/sound/soc/codecs/sgtl5000.c
+++ b/sound/soc/codecs/sgtl5000.c
@@ -31,6 +31,13 @@
 #define SGTL5000_DAP_REG_OFFSET        0x0100
 #define SGTL5000_MAX_REG_OFFSET        0x013A
 
+/* Delay for the VAG ramp up */
+#define SGTL5000_VAG_POWERUP_DELAY     500 /* ms */
+/* Delay for the VAG ramp down */
+#define SGTL5000_VAG_POWERDOWN_DELAY   500 /* ms */
+
+#define SGTL5000_OUTPUTS_MUTE (SGTL5000_HP_MUTE | SGTL5000_LINE_OUT_MUTE)
+
 /* default value of sgtl5000 registers */
 static const struct reg_default sgtl5000_reg_defaults[] = {
         { SGTL5000_CHIP_DIG_POWER,              0x0000 },
@@ -123,6 +130,13 @@ enum {
         I2S_SCLK_STRENGTH_HIGH,
 };
 
+enum {
+        HP_POWER_EVENT,
+        DAC_POWER_EVENT,
+        ADC_POWER_EVENT,
+        LAST_POWER_EVENT = ADC_POWER_EVENT
+};
+
 /* sgtl5000 private structure in codec */
 struct sgtl5000_priv {
         int sysclk;     /* sysclk rate */
@@ -137,8 +151,109 @@ struct sgtl5000_priv {
         u8 micbias_voltage;
         u8 lrclk_strength;
         u8 sclk_strength;
+        u16 mute_state[LAST_POWER_EVENT + 1];
 };
 
+static inline int hp_sel_input(struct snd_soc_component *component)
+{
+        return (snd_soc_component_read32(component, SGTL5000_CHIP_ANA_CTRL) &
+                SGTL5000_HP_SEL_MASK) >> SGTL5000_HP_SEL_SHIFT;
+}
+
+static inline u16 mute_output(struct snd_soc_component *component,
+                              u16 mute_mask)
+{
+        u16 mute_reg = snd_soc_component_read32(component,
+                                                SGTL5000_CHIP_ANA_CTRL);
+
+        snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_CTRL,
+                                      mute_mask, mute_mask);
+        return mute_reg;
+}
+
+static inline void restore_output(struct snd_soc_component *component,
+                                  u16 mute_mask, u16 mute_reg)
+{
+        snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_CTRL,
+                                      mute_mask, mute_reg);
+}
+
+static void vag_power_on(struct snd_soc_component *component, u32 source)
+{
+        if (snd_soc_component_read32(component, SGTL5000_CHIP_ANA_POWER) &
+            SGTL5000_VAG_POWERUP)
+                return;
+
+        snd_soc_component_update_bits(component, SGTL5000_CHIP_ANA_POWER,
+                                      SGTL5000_VAG_POWERUP, SGTL5000_VAG_POWERUP);
+
+        /* When VAG powering on to get local loop from Line-In, the sleep
+         * is required to avoid loud pop.
+         */
+        if (hp_sel_input(component) == SGTL5000_HP_SEL_LINE_IN &&
+            source == HP_POWER_EVENT)
+                msleep(SGTL5000_VAG_POWERUP_DELAY);
+}
+
+static int vag_power_consumers(struct snd_soc_component *component,
+                               u16 ana_pwr_reg, u32 source)
+{
+        int consumers = 0;
+
+        /* count dac/adc consumers unconditional */
+        if (ana_pwr_reg & SGTL5000_DAC_POWERUP)
+                consumers++;
+        if (ana_pwr_reg & SGTL5000_ADC_POWERUP)
+                consumers++;
+
+        /*
+         * If the event comes from HP and Line-In is selected,
+         * current action is 'DAC to be powered down'.
+         * As HP_POWERUP is not set when HP muxed to line-in,
+         * we need to keep VAG power ON.
+         */
+        if (source == HP_POWER_EVENT) {
+                if (hp_sel_input(component) == SGTL5000_HP_SEL_LINE_IN)
+                        consumers++;
+        }
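The consumer-counting rule from the commit message can be modeled outside the
driver. The snippet below is only an illustration, not driver code; the type
and helper names (codec_state, vag_consumers) are invented for the example.
It shows why VAG must stay on while Line-In is looped to the headphone output
even though neither the ADC nor the DAC is powered:

/* Illustrative model only -- mirrors the rule "keep VAG on while any
 * of ADC/DAC/HP/Line-In still needs it", not the sgtl5000 code. */
#include <stdbool.h>
#include <stdio.h>

struct codec_state {
        bool dac_powered;
        bool adc_powered;
        bool hp_muxed_to_line_in;       /* HP mux selects LINE_IN */
};

static int vag_consumers(const struct codec_state *s)
{
        int consumers = 0;

        if (s->dac_powered)
                consumers++;
        if (s->adc_powered)
                consumers++;
        /* Line-In looped to HP keeps VAG busy even with the DAC off. */
        if (s->hp_muxed_to_line_in)
                consumers++;
        return consumers;
}

int main(void)
{
        struct codec_state s = { .dac_powered = false,
                                 .adc_powered = false,
                                 .hp_muxed_to_line_in = true };

        /* With Line-In muxed to HP, VAG must stay on although neither
         * the ADC nor the DAC is powered. */
        printf("VAG stays %s\n", vag_consumers(&s) ? "on" : "off");
        return 0;
}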
[PATCH 5.3 022/148] powerpc/mce: Fix MCE handling for huge pages
From: Balbir Singh

commit 99ead78afd1128bfcebe7f88f3b102fb2da09aee upstream.

The current code would fail on huge page addresses, since the shift would
be incorrect. Use the correct page shift value returned by
__find_linux_pte() to get the correct physical address. The code is now
more generic and can handle both regular and compound pages.

Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
Signed-off-by: Balbir Singh
[ar...@linux.ibm.com: Fixup pseries_do_memory_failure()]
Signed-off-by: Reza Arbab
Tested-by: Mahesh Salgaonkar
Signed-off-by: Santosh Sivaraj
Cc: sta...@vger.kernel.org # v4.15+
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190820081352.8641-3-sant...@fossix.org
Signed-off-by: Greg Kroah-Hartman

---
 arch/powerpc/kernel/mce_power.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -26,6 +26,7 @@
 unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
 {
         pte_t *ptep;
+        unsigned int shift;
         unsigned long flags;
         struct mm_struct *mm;
 
@@ -35,13 +36,18 @@ unsigned long addr_to_pfn(struct pt_regs
                 mm = &init_mm;
 
         local_irq_save(flags);
-        if (mm == current->mm)
-                ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL);
-        else
-                ptep = find_init_mm_pte(addr, NULL);
+        ptep = __find_linux_pte(mm->pgd, addr, NULL, &shift);
         local_irq_restore(flags);
+
         if (!ptep || pte_special(*ptep))
                 return ULONG_MAX;
+
+        if (shift > PAGE_SHIFT) {
+                unsigned long rpnmask = (1ul << shift) - PAGE_SIZE;
+
+                return pte_pfn(__pte(pte_val(*ptep) | (addr & rpnmask)));
+        }
+
         return pte_pfn(*ptep);
 }
 
@@ -344,7 +350,7 @@ static const struct mce_derror_table mce
   MCE_INITIATOR_CPU,  MCE_SEV_SEVERE, true },
 { 0, false, 0, 0, 0, 0, 0 } };
 
-static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr,
+static int mce_find_instr_ea_and_phys(struct pt_regs *regs, uint64_t *addr,
                                         uint64_t *phys_addr)
 {
         /*
@@ -541,7 +547,8 @@ static int mce_handle_derror(struct pt_r
                          * kernel/exception-64s.h
                          */
                         if (get_paca()->in_mce < MAX_MCE_DEPTH)
-                                mce_find_instr_ea_and_pfn(regs, addr, phys_addr);
+                                mce_find_instr_ea_and_phys(regs, addr,
+                                                           phys_addr);
                 }
                 found = 1;
         }
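The interesting part of the fix is the mask arithmetic: for a mapping of size
1 << shift, the PTE supplies the huge-page base, and the low bits of the
faulting address select the small page inside it. The stand-alone snippet
below only demonstrates that arithmetic with made-up example numbers (64K
base pages, a 16M huge mapping); it is not kernel code:

#include <stdio.h>

#define PAGE_SHIFT      16UL                    /* 64K base pages, as on ppc64 */
#define PAGE_SIZE       (1UL << PAGE_SHIFT)

int main(void)
{
        unsigned long shift = 24;                       /* 16M huge mapping */
        unsigned long huge_base_phys = 0x43000000UL;    /* base taken from the PTE */
        unsigned long addr = 0x10245678UL;              /* faulting address */

        /* Keeps exactly the bits that select a PAGE_SIZE page inside
         * the huge mapping, as the patch above does with rpnmask. */
        unsigned long rpnmask = (1UL << shift) - PAGE_SIZE;

        unsigned long phys = huge_base_phys | (addr & rpnmask);

        printf("rpnmask = 0x%lx, pfn = 0x%lx\n", rpnmask, phys >> PAGE_SHIFT);
        return 0;
}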
[PATCH 5.3 023/148] powerpc/mce: Schedule work from irq_work
From: Santosh Sivaraj

commit b5bda6263cad9a927e1a4edb7493d542da0c1410 upstream.

schedule_work() cannot be called from MCE exception context as MCE can
interrupt even in interrupt disabled context.

Fixes: 733e4a4c4467 ("powerpc/mce: hookup memory_failure for UE errors")
Cc: sta...@vger.kernel.org # v4.15+
Reviewed-by: Mahesh Salgaonkar
Reviewed-by: Nicholas Piggin
Acked-by: Balbir Singh
Signed-off-by: Santosh Sivaraj
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190820081352.8641-2-sant...@fossix.org
Signed-off-by: Greg Kroah-Hartman

---
 arch/powerpc/kernel/mce.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -33,6 +33,7 @@ static DEFINE_PER_CPU(struct machine_che
                                         mce_ue_event_queue);
 
 static void machine_check_process_queued_event(struct irq_work *work);
+static void machine_check_ue_irq_work(struct irq_work *work);
 void machine_check_ue_event(struct machine_check_event *evt);
 static void machine_process_ue_event(struct work_struct *work);
 
@@ -40,6 +41,10 @@ static struct irq_work mce_event_process
         .func = machine_check_process_queued_event,
 };
 
+static struct irq_work mce_ue_event_irq_work = {
+        .func = machine_check_ue_irq_work,
+};
+
 DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);
 
 static void mce_set_error_info(struct machine_check_event *mce,
@@ -199,6 +204,10 @@ void release_mce_event(void)
         get_mce_event(NULL, true);
 }
 
+static void machine_check_ue_irq_work(struct irq_work *work)
+{
+        schedule_work(&mce_ue_event_work);
+}
 
 /*
  * Queue up the MCE event which then can be handled later.
@@ -216,7 +225,7 @@ void machine_check_ue_event(struct machi
         memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt));
 
         /* Queue work to process this event later. */
-        schedule_work(&mce_ue_event_work);
+        irq_work_queue(&mce_ue_event_irq_work);
 }
 
 /*
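The pattern here is a two-stage deferral: the machine check path may only use
the NMI-safe irq_work_queue(), and the irq_work handler, which runs later with
interrupts enabled, is what finally calls schedule_work(). The skeleton below
is a generic illustration of that pattern, not the powerpc MCE code;
irq_work_queue(), schedule_work() and DECLARE_WORK() are real kernel APIs,
while the module itself and report_event_from_nmi() are invented for the
sketch:

/* Sketch of "NMI-like context -> irq_work -> workqueue" deferral. */
#include <linux/module.h>
#include <linux/irq_work.h>
#include <linux/workqueue.h>

static void heavy_work_fn(struct work_struct *work)
{
        pr_info("process-context handling runs here\n");
}
static DECLARE_WORK(heavy_work, heavy_work_fn);

static void bridge_irq_work_fn(struct irq_work *work)
{
        /* Safe here: no longer in the exception/NMI path. */
        schedule_work(&heavy_work);
}
static struct irq_work bridge = { .func = bridge_irq_work_fn };

/* Stand-in for the exception-context caller (invented for the sketch). */
static void report_event_from_nmi(void)
{
        irq_work_queue(&bridge);        /* NMI-safe, unlike schedule_work() */
}

static int __init demo_init(void)
{
        report_event_from_nmi();
        return 0;
}
static void __exit demo_exit(void)
{
        flush_work(&heavy_work);
}
module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");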
[PATCH 5.3 018/148] PM / devfreq: tegra: Fix kHz to Hz conversion
From: Dmitry Osipenko

commit 62bacb06b9f08965c4ef10e17875450490c948c0 upstream.

The kHz to Hz conversion is done incorrectly in a few places in the code.
This results in a wrong frequency being calculated, because the devfreq
core uses OPP frequencies that are given in Hz to clamp the rate, while
tegra-devfreq gives the core a value in kHz and then also expects to
receive a value in kHz back from the core. As a result, the memory
frequency is always set to a value close to ULONG_MAX because of the bug.
Hence the EMC frequency is always capped to the maximum and the driver
doesn't do anything useful.

This patch was tested on Tegra30 and Tegra124 SoCs; EMC frequency scaling
works properly now.

Cc: # 4.14+
Tested-by: Steev Klimaszewski
Reviewed-by: Chanwoo Choi
Signed-off-by: Dmitry Osipenko
Acked-by: Thierry Reding
Signed-off-by: MyungJoo Ham
Signed-off-by: Greg Kroah-Hartman

---
 drivers/devfreq/tegra-devfreq.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

--- a/drivers/devfreq/tegra-devfreq.c
+++ b/drivers/devfreq/tegra-devfreq.c
@@ -474,11 +474,11 @@ static int tegra_devfreq_target(struct d
 {
         struct tegra_devfreq *tegra = dev_get_drvdata(dev);
         struct dev_pm_opp *opp;
-        unsigned long rate = *freq * KHZ;
+        unsigned long rate;
 
-        opp = devfreq_recommended_opp(dev, &rate, flags);
+        opp = devfreq_recommended_opp(dev, freq, flags);
         if (IS_ERR(opp)) {
-                dev_err(dev, "Failed to find opp for %lu KHz\n", *freq);
+                dev_err(dev, "Failed to find opp for %lu Hz\n", *freq);
                 return PTR_ERR(opp);
         }
         rate = dev_pm_opp_get_freq(opp);
@@ -487,8 +487,6 @@ static int tegra_devfreq_target(struct d
         clk_set_min_rate(tegra->emc_clock, rate);
         clk_set_rate(tegra->emc_clock, 0);
 
-        *freq = rate;
-
         return 0;
 }
 
@@ -498,7 +496,7 @@ static int tegra_devfreq_get_dev_status(
         struct tegra_devfreq *tegra = dev_get_drvdata(dev);
         struct tegra_devfreq_device *actmon_dev;
 
-        stat->current_frequency = tegra->cur_freq;
+        stat->current_frequency = tegra->cur_freq * KHZ;
 
         /* To be used by the tegra governor */
         stat->private_data = tegra;
@@ -553,7 +551,7 @@ static int tegra_governor_get_target(str
                 target_freq = max(target_freq, dev->target_freq);
         }
 
-        *freq = target_freq;
+        *freq = target_freq * KHZ;
 
         return 0;
 }
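The underlying rule is that the devfreq-core side of the API (the OPP table,
the *freq pointer, current_frequency) is in Hz, while this driver keeps its
own bookkeeping in kHz, so the conversion has to happen exactly once at the
boundary. The toy program below only illustrates how passing a kHz number
where Hz is expected makes OPP clamping pick a bogus frequency; the helper
name recommended_opp_hz and the OPP values are invented:

#include <stdio.h>

#define KHZ 1000UL

/* Pretend OPP table in Hz, as the devfreq core sees it. */
static const unsigned long opp_hz[] = { 12750000, 51000000, 204000000 };

static unsigned long recommended_opp_hz(unsigned long req_hz)
{
        /* Pick the highest OPP not above the request, else the lowest. */
        unsigned long best = opp_hz[0];

        for (unsigned int i = 0; i < sizeof(opp_hz) / sizeof(opp_hz[0]); i++)
                if (opp_hz[i] <= req_hz)
                        best = opp_hz[i];
        return best;
}

int main(void)
{
        unsigned long driver_khz = 204000;      /* driver-side bookkeeping */

        /* Wrong: passing kHz where Hz is expected is off by a factor of 1000. */
        printf("buggy: %lu Hz\n", recommended_opp_hz(driver_khz));

        /* Right: convert at the boundary, exactly once. */
        printf("fixed: %lu Hz\n", recommended_opp_hz(driver_khz * KHZ));
        return 0;
}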
[PATCH 5.3 027/148] powerpc/powernv: Restrict OPAL symbol map to only be readable by root
From: Andrew Donnellan

commit e7de4f7b64c23e503a8c42af98d56f2a7462bd6d upstream.

Currently the OPAL symbol map is globally readable, which seems bad as it
contains physical addresses.

Restrict it to root.

Fixes: c8742f85125d ("powerpc/powernv: Expose OPAL firmware symbol map")
Cc: sta...@vger.kernel.org # v3.19+
Suggested-by: Michael Ellerman
Signed-off-by: Andrew Donnellan
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190503075253.22798-1-...@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman

---
 arch/powerpc/platforms/powernv/opal.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -705,7 +705,10 @@ static ssize_t symbol_map_read(struct fi
                                        bin_attr->size);
 }
 
-static BIN_ATTR_RO(symbol_map, 0);
+static struct bin_attribute symbol_map_attr = {
+        .attr = {.name = "symbol_map", .mode = 0400},
+        .read = symbol_map_read
+};
 
 static void opal_export_symmap(void)
 {
@@ -722,10 +725,10 @@ static void opal_export_symmap(void)
                 return;
 
         /* Setup attributes */
-        bin_attr_symbol_map.private = __va(be64_to_cpu(syms[0]));
-        bin_attr_symbol_map.size = be64_to_cpu(syms[1]);
+        symbol_map_attr.private = __va(be64_to_cpu(syms[0]));
+        symbol_map_attr.size = be64_to_cpu(syms[1]);
 
-        rc = sysfs_create_bin_file(opal_kobj, &bin_attr_symbol_map);
+        rc = sysfs_create_bin_file(opal_kobj, &symbol_map_attr);
         if (rc)
                 pr_warn("Error %d creating OPAL symbols file\n", rc);
 }
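For context (not part of the patch): BIN_ATTR_RO() hard-codes mode 0444,
i.e. world-readable, which is why a root-only sysfs binary file has to be
spelled out as an open-coded struct bin_attribute with mode 0400, as the
patch does. A minimal sketch of the two forms, using an invented attribute
name:

#include <linux/sysfs.h>

static ssize_t example_read(struct file *filp, struct kobject *kobj,
                            struct bin_attribute *attr, char *buf,
                            loff_t off, size_t count)
{
        return 0;       /* placeholder body for the sketch */
}

/* Convenience macro: always mode 0444 (readable by everyone). */
static BIN_ATTR_RO(example, 0);

/* Open-coded equivalent, restricted to root (0400). */
static struct bin_attribute example_root_only = {
        .attr = { .name = "example", .mode = 0400 },
        .read = example_read,
};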
[PATCH 5.3 029/148] powerpc/powernv/ioda: Fix race in TCE level allocation
From: Alexey Kardashevskiy

commit 56090a3902c80c296e822d11acdb6a101b322c52 upstream.

pnv_tce() returns a pointer to a TCE entry and originally a TCE table
would be pre-allocated. For the default case of a 2GB window the table
needs only a single level and that is fine. However, if more levels are
requested, it is possible to get a race when 2 threads want a pointer to
a TCE entry from the same page of TCEs.

This adds cmpxchg to handle the race. Note that once a TCE is non-zero,
it cannot become zero again.

Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
CC: sta...@vger.kernel.org # v4.19+
Signed-off-by: Alexey Kardashevskiy
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/20190718051139.74787-2-...@ozlabs.ru
Signed-off-by: Greg Kroah-Hartman

---
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -49,6 +49,9 @@ static __be64 *pnv_alloc_tce_level(int n
         return addr;
 }
 
+static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+                unsigned long size, unsigned int levels);
+
 static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
 {
         __be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
@@ -58,9 +61,9 @@ static __be64 *pnv_tce(struct iommu_tabl
 
         while (level) {
                 int n = (idx & mask) >> (level * shift);
-                unsigned long tce;
+                unsigned long oldtce, tce = be64_to_cpu(READ_ONCE(tmp[n]));
 
-                if (tmp[n] == 0) {
+                if (!tce) {
                         __be64 *tmp2;
 
                         if (!alloc)
@@ -71,10 +74,15 @@ static __be64 *pnv_tce(struct iommu_tabl
                         if (!tmp2)
                                 return NULL;
 
-                        tmp[n] = cpu_to_be64(__pa(tmp2) |
-                                        TCE_PCI_READ | TCE_PCI_WRITE);
+                        tce = __pa(tmp2) | TCE_PCI_READ | TCE_PCI_WRITE;
+                        oldtce = be64_to_cpu(cmpxchg(&tmp[n], 0,
+                                        cpu_to_be64(tce)));
+                        if (oldtce) {
+                                pnv_pci_ioda2_table_do_free_pages(tmp2,
+                                        ilog2(tbl->it_level_size) + 3, 1);
+                                tce = oldtce;
+                        }
                 }
-                tce = be64_to_cpu(tmp[n]);
 
                 tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
                 idx &= ~mask;
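The fix is an instance of the lock-free "allocate, try to publish with
cmpxchg, and free the allocation if another thread won" pattern. The
user-space sketch below shows the same idea with C11 atomics; it has nothing
powerpc- or IOMMU-specific and the names (slot, get_or_alloc) are invented:

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

static _Atomic(unsigned long *) slot;   /* shared pointer, starts NULL */

static unsigned long *get_or_alloc(void)
{
        unsigned long *cur = atomic_load(&slot);

        if (!cur) {
                unsigned long *mine = calloc(1, sizeof(*mine));
                unsigned long *expected = NULL;

                if (!mine)
                        return NULL;
                /* Try to publish; only one thread can win the race. */
                if (atomic_compare_exchange_strong(&slot, &expected, mine)) {
                        cur = mine;             /* we won */
                } else {
                        free(mine);             /* somebody beat us to it */
                        cur = expected;         /* use the winner's block */
                }
        }
        return cur;
}

int main(void)
{
        /* Single-threaded demo; with several threads every caller still
         * ends up with the same published pointer. */
        printf("%p\n", (void *)get_or_alloc());
        return 0;
}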