Re: [RESEND PATCH] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt

2019-05-14 Thread Michael Ellerman
"Gautham R. Shenoy"  writes:
> From: "Gautham R. Shenoy" 
>
> During a memory hotplug operation involving resizing of the HPT, we
> invoke a stop_machine() to perform the resizing. In this code path, we
> end up recursively taking the cpu_hotplug_lock, first in
> memory_hotplug_begin() and then subsequently in stop_machine(). This
> causes the system to hang.

This implies we have never tested a memory hotplug that resized the HPT.
Is that really true? Or did something change?

> With lockdep enabled we get the following
> error message before the hang.
>
>   swapper/0/1 is trying to acquire lock:
>   (ptrval) (cpu_hotplug_lock.rw_sem){}, at: stop_machine+0x2c/0x60
>
>   but task is already holding lock:
>   (ptrval) (cpu_hotplug_lock.rw_sem){}, at: 
> mem_hotplug_begin+0x20/0x50

Do we have the full stack trace?

>   other info that might help us debug this:
>Possible unsafe locking scenario:
>
>  CPU0
>  
> lock(cpu_hotplug_lock.rw_sem);
> lock(cpu_hotplug_lock.rw_sem);
>
>*** DEADLOCK ***
>
> Fix this issue by
>   1) Requiring all the calls to pseries_lpar_resize_hpt() be made
>  with cpu_hotplug_lock held.
>
>   2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
>  as a consequence of 1)
>
>   3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
>  with cpu_hotplug_lock held.
>
> Reported-by: Aneesh Kumar K.V 
> Signed-off-by: Gautham R. Shenoy 
> ---
>
> Rebased this one against powerpc/next instead of linux/master.
>
>  arch/powerpc/mm/book3s64/hash_utils.c | 9 ++++++++-
>  arch/powerpc/platforms/pseries/lpar.c | 8 ++++++--
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 919a861..d07fcafd 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1928,10 +1929,16 @@ static int hpt_order_get(void *data, u64 *val)
>  
>  static int hpt_order_set(void *data, u64 val)
>  {
> + int ret;
> +
>   if (!mmu_hash_ops.resize_hpt)
>   return -ENODEV;
>  
> - return mmu_hash_ops.resize_hpt(val);
> + cpus_read_lock();
> + ret = mmu_hash_ops.resize_hpt(val);
> + cpus_read_unlock();
> +
> + return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, 
> "%llu\n");
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index 1034ef1..2fc9756 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -859,7 +859,10 @@ static int pseries_lpar_resize_hpt_commit(void *data)
>   return 0;
>  }
>  
> -/* Must be called in user context */
> +/*
> + * Must be called in user context. The caller should hold the

I realise you're just copying that comment, but it seems wrong. "user
context" means userspace. I think it means "process context" doesn't it?

Also "should" should be "must" :)

> + * cpus_lock.
> + */
>  static int pseries_lpar_resize_hpt(unsigned long shift)
>  {
>   struct hpt_resize_state state = {
> @@ -913,7 +916,8 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>  
>   t1 = ktime_get();
>  
> - rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL);
> + rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit,
> +  &state, NULL);
>  
>   t2 = ktime_get();

cheers
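
For readers skimming the thread, this is the locking pattern the patch adopts:
take cpu_hotplug_lock exactly once in the caller and use the _cpuslocked
variant of stop_machine() underneath.  A minimal sketch (illustrative only;
the function names here are made up, the real code is in the diff above):

        #include <linux/cpu.h>
        #include <linux/stop_machine.h>

        static int resize_commit(void *data)
        {
                /* runs with all other CPUs parked by stop_machine */
                return 0;
        }

        static int resize_with_lock_held(void)
        {
                int rc;

                cpus_read_lock();       /* take cpu_hotplug_lock once */
                rc = stop_machine_cpuslocked(resize_commit, NULL, NULL);
                cpus_read_unlock();

                return rc;
        }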


Re: [RESEND PATCH] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt

2019-05-14 Thread Michael Ellerman
"Gautham R. Shenoy"  writes:
> From: "Gautham R. Shenoy" 
>
> Subject: Re: [RESEND PATCH] powerpc/pseries: Fix cpu_hotplug_lock acquisition 
> in resize_hpt

ps. A "RESEND" implies the patch is unchanged and you're just resending
it because it was ignored.

In this case it should have just been "PATCH v2", with a note below the "---"
saying "v2: Rebased onto powerpc/next ..."

cheers

> During a memory hotplug operation involving resizing of the HPT, we
> invoke a stop_machine() to perform the resizing. In this code path, we
> end up recursively taking the cpu_hotplug_lock, first in
> memory_hotplug_begin() and then subsequently in stop_machine(). This
> causes the system to hang. With lockdep enabled we get the following
> error message before the hang.
>
>   swapper/0/1 is trying to acquire lock:
>   (ptrval) (cpu_hotplug_lock.rw_sem){}, at: stop_machine+0x2c/0x60
>
>   but task is already holding lock:
>   (ptrval) (cpu_hotplug_lock.rw_sem){}, at: 
> mem_hotplug_begin+0x20/0x50
>
>   other info that might help us debug this:
>Possible unsafe locking scenario:
>
>  CPU0
>  
> lock(cpu_hotplug_lock.rw_sem);
> lock(cpu_hotplug_lock.rw_sem);
>
>*** DEADLOCK ***
>
> Fix this issue by
>   1) Requiring all the calls to pseries_lpar_resize_hpt() be made
>  with cpu_hotplug_lock held.
>
>   2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
>  as a consequence of 1)
>
>   3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
>  with cpu_hotplug_lock held.
>
> Reported-by: Aneesh Kumar K.V 
> Signed-off-by: Gautham R. Shenoy 
> ---
>
> Rebased this one against powerpc/next instead of linux/master.
>
>  arch/powerpc/mm/book3s64/hash_utils.c | 9 ++++++++-
>  arch/powerpc/platforms/pseries/lpar.c | 8 ++++++--
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
> b/arch/powerpc/mm/book3s64/hash_utils.c
> index 919a861..d07fcafd 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -1928,10 +1929,16 @@ static int hpt_order_get(void *data, u64 *val)
>  
>  static int hpt_order_set(void *data, u64 val)
>  {
> + int ret;
> +
>   if (!mmu_hash_ops.resize_hpt)
>   return -ENODEV;
>  
> - return mmu_hash_ops.resize_hpt(val);
> + cpus_read_lock();
> + ret = mmu_hash_ops.resize_hpt(val);
> + cpus_read_unlock();
> +
> + return ret;
>  }
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, 
> "%llu\n");
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index 1034ef1..2fc9756 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -859,7 +859,10 @@ static int pseries_lpar_resize_hpt_commit(void *data)
>   return 0;
>  }
>  
> -/* Must be called in user context */
> +/*
> + * Must be called in user context. The caller should hold the
> + * cpus_lock.
> + */
>  static int pseries_lpar_resize_hpt(unsigned long shift)
>  {
>   struct hpt_resize_state state = {
> @@ -913,7 +916,8 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>  
>   t1 = ktime_get();
>  
> - rc = stop_machine(pseries_lpar_resize_hpt_commit, &state, NULL);
> + rc = stop_machine_cpuslocked(pseries_lpar_resize_hpt_commit,
> +  &state, NULL);
>  
>   t2 = ktime_get();
>  
> -- 
> 1.9.4


Re: [RFC KVM 06/27] KVM: x86: Exit KVM isolation on IRQ entry

2019-05-14 Thread Peter Zijlstra
On Mon, May 13, 2019 at 11:13:34AM -0700, Andy Lutomirski wrote:
> On Mon, May 13, 2019 at 9:28 AM Alexandre Chartre
>  wrote:

> > Actually, I am not sure this is effectively useful because the IRQ
> > handler is probably faulting before it tries to exit isolation, so
> > the isolation exit will be done by the kvm page fault handler. I need
> > to check that.
> >
> 
> The whole idea of having #PF exit with a different CR3 than was loaded
> on entry seems questionable to me.  I'd be a lot more comfortable with
> the whole idea if a page fault due to accessing the wrong data was an
> OOPS and the code instead just did the right thing directly.

So I've run into this idea before; it basically allows a lazy approach
to things.

I'm somewhat conflicted on things, on the one hand, changing CR3 from
#PF is a natural extension in that #PF already changes page-tables (for
userspace / vmalloc etc..), on the other hand, there's a thin line
between being lazy and being sloppy.

If we're going down this route; I think we need a very coherent design
and strong rules.


Re: [PATCH] tty: serial: uartlite: avoid null pointer dereference during rmmod

2019-05-14 Thread Johan Hovold
On Tue, May 14, 2019 at 11:32:19AM +0800, Kefeng Wang wrote:
> After commit 415b43bdb008 "tty: serial: uartlite: Move uart register to
> probe", calling uart_unregister_driver unconditionally will trigger a
> null pointer dereference because ulite_uart_driver may not have been registered.
> 
>   CPU: 1 PID: 3755 Comm: syz-executor.0 Not tainted 5.1.0+ #28
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 
> 04/01/2014
>   Call Trace:
>__dump_stack lib/dump_stack.c:77 [inline]
>dump_stack+0xa9/0x10e lib/dump_stack.c:113
>__kasan_report+0x171/0x18d mm/kasan/report.c:321
>kasan_report+0xe/0x20 mm/kasan/common.c:614
>tty_unregister_driver+0x19/0x100 drivers/tty/tty_io.c:3383
>uart_unregister_driver+0x30/0xc0 drivers/tty/serial/serial_core.c:2579
>__do_sys_delete_module kernel/module.c:1027 [inline]
>__se_sys_delete_module kernel/module.c:970 [inline]
>__x64_sys_delete_module+0x244/0x330 kernel/module.c:970
>do_syscall_64+0x72/0x2a0 arch/x86/entry/common.c:298
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Call uart_unregister_driver only if ulite_uart_driver.state is not null to
> fix it.
> 
> Cc: Peter Korsgaard 
> Cc: Shubhrajyoti Datta 
> Cc: Greg Kroah-Hartman 
> Reported-by: Hulk Robot 
> Fixes: 415b43bdb008 ("tty: serial: uartlite: Move uart register to probe")
> Signed-off-by: Kefeng Wang 
> ---
>  drivers/tty/serial/uartlite.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/serial/uartlite.c b/drivers/tty/serial/uartlite.c
> index b8b912b5a8b9..06e79c11141d 100644
> --- a/drivers/tty/serial/uartlite.c
> +++ b/drivers/tty/serial/uartlite.c
> @@ -897,7 +897,8 @@ static int __init ulite_init(void)
>  static void __exit ulite_exit(void)
>  {
>   platform_driver_unregister(&ulite_platform_driver);
> - uart_unregister_driver(&ulite_uart_driver);
> + if (ulite_uart_driver.state)
> + uart_unregister_driver(&ulite_uart_driver);
>  }
>  
>  module_init(ulite_init);

This looks like you're just papering over the real issue, which is the
crazy idea of ultimately registering one driver per port:


https://lkml.kernel.org/r/1539685088-13465-1-git-send-email-shubhrajyoti.da...@gmail.com

It appears only the preparatory patches from that series were applied,
and I think whoever is responsible should consider reverting those
instead.

If the statically allocated port state is that big of an issue, you
need to make serial core support dynamic allocation.

Johan


Re: [RFC KVM 18/27] kvm/isolation: function to copy page table entries for percpu buffer

2019-05-14 Thread Peter Zijlstra
On Mon, May 13, 2019 at 11:18:41AM -0700, Andy Lutomirski wrote:
> On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
>  wrote:
> >
> > pcpu_base_addr is already mapped to the KVM address space, but this
> > represents the first percpu chunk. To access a per-cpu buffer not
> > allocated in the first chunk, add a function which maps all cpu
> > buffers corresponding to that per-cpu buffer.
> >
> > Also add function to clear page table entries for a percpu buffer.
> >
> 
> This needs some kind of clarification so that readers can tell whether
> you're trying to map all percpu memory or just map a specific
> variable.  In either case, you're making a dubious assumption that
> percpu memory contains no secrets.

I'm thinking the per-cpu random pool is a secrit. IOW, it demonstrably
does contain secrits, invalidating that premise.


[tip:perf/urgent] perf/x86/intel: Allow PEBS multi-entry in watermark mode

2019-05-14 Thread tip-bot for Stephane Eranian
Commit-ID:  c7a286577d7592720c2f179aadfb325a1ff48c95
Gitweb: https://git.kernel.org/tip/c7a286577d7592720c2f179aadfb325a1ff48c95
Author: Stephane Eranian 
AuthorDate: Mon, 13 May 2019 17:34:00 -0700
Committer:  Ingo Molnar 
CommitDate: Tue, 14 May 2019 09:07:58 +0200

perf/x86/intel: Allow PEBS multi-entry in watermark mode

This patch fixes a restriction/bug introduced by:

   583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry 
PEBS")

The original patch prevented using multi-entry PEBS when wakeup_events != 0.
However, given that wakeup_events is part of a union with wakeup_watermark, it
means that in watermark mode, PEBS multi-entry is also disabled, which is not the
intent. This patch fixes this by checking if watermark mode is enabled.

Signed-off-by: Stephane Eranian 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: jo...@redhat.com
Cc: kan.li...@intel.com
Cc: vincent.wea...@maine.edu
Fixes: 583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for 
multi-entry PEBS")
Link: http://lkml.kernel.org/r/20190514003400.224340-1-eran...@google.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ef763f535e3a..12ec402f4114 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3265,7 +3265,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
return ret;
 
if (event->attr.precise_ip) {
-   if (!(event->attr.freq || event->attr.wakeup_events)) {
+   if (!(event->attr.freq || (event->attr.wakeup_events && 
!event->attr.watermark))) {
event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;
if (!(event->attr.sample_type &
  ~intel_pmu_large_pebs_flags(event)))


Re: [v2 PATCH] mm: mmu_gather: remove __tlb_reset_range() for force flush

2019-05-14 Thread Jan Stancek


- Original Message -
> 
> 
> On May 13, 2019 4:01 PM, Yang Shi  wrote:
> 
> 
> On 5/13/19 9:38 AM, Will Deacon wrote:
> > On Fri, May 10, 2019 at 07:26:54AM +0800, Yang Shi wrote:
> >> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> >> index 99740e1..469492d 100644
> >> --- a/mm/mmu_gather.c
> >> +++ b/mm/mmu_gather.c
> >> @@ -245,14 +245,39 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
> >>   {
> >>   /*
> >>* If there are parallel threads are doing PTE changes on same range
> >> - * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
> >> - * flush by batching, a thread has stable TLB entry can fail to flush
> >> - * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
> >> - * forcefully if we detect parallel PTE batching threads.
> >> + * under non-exclusive lock (e.g., mmap_sem read-side) but defer TLB
> >> + * flush by batching, one thread may end up seeing inconsistent PTEs
> >> + * and result in having stale TLB entries.  So flush TLB forcefully
> >> + * if we detect parallel PTE batching threads.
> >> + *
> >> + * However, some syscalls, e.g. munmap(), may free page tables, this
> >> + * needs force flush everything in the given range. Otherwise this
> >> + * may result in having stale TLB entries for some architectures,
> >> + * e.g. aarch64, that could specify flush what level TLB.
> >>*/
> >> -if (mm_tlb_flush_nested(tlb->mm)) {
> >> -__tlb_reset_range(tlb);
> >> -__tlb_adjust_range(tlb, start, end - start);
> >> +if (mm_tlb_flush_nested(tlb->mm) && !tlb->fullmm) {
> >> +/*
> >> + * Since we can't tell what we actually should have
> >> + * flushed, flush everything in the given range.
> >> + */
> >> +tlb->freed_tables = 1;
> >> +tlb->cleared_ptes = 1;
> >> +tlb->cleared_pmds = 1;
> >> +tlb->cleared_puds = 1;
> >> +tlb->cleared_p4ds = 1;
> >> +
> >> +/*
> >> + * Some architectures, e.g. ARM, that have range invalidation
> >> + * and care about VM_EXEC for I-Cache invalidation, need
> >> force
> >> + * vma_exec set.
> >> + */
> >> +tlb->vma_exec = 1;
> >> +
> >> +/* Force vma_huge clear to guarantee safer flush */
> >> +tlb->vma_huge = 0;
> >> +
> >> +tlb->start = start;
> >> +tlb->end = end;
> >>   }
> > Whilst I think this is correct, it would be interesting to see whether
> > or not it's actually faster than just nuking the whole mm, as I mentioned
> > before.
> >
> > At least in terms of getting a short-term fix, I'd prefer the diff below
> > if it's not measurably worse.
> 
> I did a quick test with ebizzy (96 threads with 5 iterations) on my x86
> VM, it shows slightly slowdown on records/s but much more sys time spent
> with fullmm flush, the below is the data.
> 
>                  nofullmm    fullmm
> ops (records/s)    225606    225119
> sys (s)              0.69      1.14
> 
> It looks the slight reduction of records/s is caused by the increase of
> sys time.
> 
> >
> > Will
> >
> > --->8
> >
> > diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> > index 99740e1dd273..cc251422d307 100644
> > --- a/mm/mmu_gather.c
> > +++ b/mm/mmu_gather.c
> > @@ -251,8 +251,9 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
> > * forcefully if we detect parallel PTE batching threads.
> > */
> >if (mm_tlb_flush_nested(tlb->mm)) {
> > + tlb->fullmm = 1;
> >__tlb_reset_range(tlb);
> > - __tlb_adjust_range(tlb, start, end - start);
> > + tlb->freed_tables = 1;
> >}
> >
> >tlb_flush_mmu(tlb);
> 
> 
> I think that this should have set need_flush_all and not fullmm.
> 

Wouldn't that skip the flush?

If fullmm == 0, then __tlb_reset_range() sets tlb->end = 0.
  tlb_flush_mmu
tlb_flush_mmu_tlbonly
  if (!tlb->end)
 return

Replacing fullmm with need_flush_all, brings the problem back / reproducer 
hangs.


Re: [PATCH v3 3/3] thermal: cpu_cooling: Migrate to using the EM framework

2019-05-14 Thread Quentin Perret
Hi Eduardo,

On Monday 13 May 2019 at 20:40:59 (-0700), Eduardo Valentin wrote:
> On Fri, May 03, 2019 at 10:44:09AM +0100, Quentin Perret wrote:
> > The newly introduced Energy Model framework manages power cost tables in
> > a generic way. Moreover, it supports several types of models since the
> > tables can come from DT or firmware (through SCMI) for example. On the
> > other hand, the cpu_cooling subsystem manages its own power cost tables
> > using only DT data.
> > 
> > In order to avoid the duplication of data in the kernel, and in order to
> > enable IPA with EMs coming from more than just DT, remove the private
> > tables from cpu_cooling.c and migrate it to using the centralized EM
> > framework.
> > 
> > The case where the thermal subsystem is used without an Energy Model
> > (cpufreq_cooling_ops) is handled by looking directly at CPUFreq's
> > frequency table which is already a dependency for cpu_cooling.c anyway.
> > Since the thermal framework expects the cooling states in a particular
> > order, bail out whenever the CPUFreq table is unsorted, since that is
> > fairly uncommon in general, and there are currently no users of
> > cpu_cooling for this use-case.
> 
> Will this break DT in any way? After this change, are the existing DTs
> still compatible with this cpu cooling?

Yes, all existing DTs stay compatible with this CPU cooling. The EM can
still be built using the 'dynamic-power-coefficient' DT property thanks
to the recently introduced dev_pm_opp_of_register_em() helper, see
a4f342b9607d ("PM / OPP: Introduce a power estimation helper"). And all
relevant cpufreq drivers have already been updated to use that function.

So, this patch should cause no functional change for all existing users.
It's really just plumbing. I can probably explain that better in this
commit message rather than the cover letter if you feel it is necessary.

Thanks,
Quentin


Re: [RFC PATCH] ARM: mach-shmobile: Parse DT to get ARCH timer memory region

2019-05-14 Thread Geert Uytterhoeven
Hi Oleksandr,

On Mon, May 13, 2019 at 6:00 PM Oleksandr  wrote:
> On 13.05.19 18:13, Geert Uytterhoeven wrote:
> >> So, if the DT bindings for the counter module is not an option (if I
> >> correctly understood a discussion pointed by Geert in another letter),
> >> we should probably prevent all timer code here from being executed if
> >> PSCI is in use.
> >> What I mean is to return to [2], but with the modification to use
> >> psci_smp_available() helper as an indicator of PSCI usage.
> >>
> >> Julien, Geert, what do you think?
> > Yes, that sounds good to me.
> >
> > Note that psci_smp_available() seems to return false if CONFIG_SMP=n,
> > so checking for that is not sufficient to avoid crashes when running a
> > uniprocessor kernel on a PSCI-enabled system.
>
> Indeed, you are right.
>
>
> Nothing other than checking for psci_ops.cpu_on == NULL directly comes to
> mind...
>
> Have already checked with CONFIG_SMP=n, it works.
>
> Sounds ok?

Fine for me, thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
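
A sketch of the check being proposed above (illustrative only; the surrounding
function is a made-up placeholder, the real patch may look different):

        #include <linux/psci.h>

        static void __init shmobile_timer_init_sketch(void)    /* hypothetical name */
        {
                /*
                 * If firmware implements CPU_ON, PSCI owns secondary CPU
                 * bring-up, so skip the legacy ARCH timer/counter setup that
                 * crashes on PSCI-enabled systems.
                 */
                if (psci_ops.cpu_on)
                        return;

                /* ... legacy timer setup ... */
        }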


Re: [v2 PATCH] mm: mmu_gather: remove __tlb_reset_range() for force flush

2019-05-14 Thread Nadav Amit
> On May 14, 2019, at 12:15 AM, Jan Stancek  wrote:
> 
> 
> - Original Message -
>> On May 13, 2019 4:01 PM, Yang Shi  wrote:
>> 
>> 
>> On 5/13/19 9:38 AM, Will Deacon wrote:
>>> On Fri, May 10, 2019 at 07:26:54AM +0800, Yang Shi wrote:
 diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
 index 99740e1..469492d 100644
 --- a/mm/mmu_gather.c
 +++ b/mm/mmu_gather.c
 @@ -245,14 +245,39 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
  {
  /*
   * If there are parallel threads are doing PTE changes on same range
 - * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
 - * flush by batching, a thread has stable TLB entry can fail to flush
 - * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
 - * forcefully if we detect parallel PTE batching threads.
 + * under non-exclusive lock (e.g., mmap_sem read-side) but defer TLB
 + * flush by batching, one thread may end up seeing inconsistent PTEs
 + * and result in having stale TLB entries.  So flush TLB forcefully
 + * if we detect parallel PTE batching threads.
 + *
 + * However, some syscalls, e.g. munmap(), may free page tables, this
 + * needs force flush everything in the given range. Otherwise this
 + * may result in having stale TLB entries for some architectures,
 + * e.g. aarch64, that could specify flush what level TLB.
   */
 -if (mm_tlb_flush_nested(tlb->mm)) {
 -__tlb_reset_range(tlb);
 -__tlb_adjust_range(tlb, start, end - start);
 +if (mm_tlb_flush_nested(tlb->mm) && !tlb->fullmm) {
 +/*
 + * Since we can't tell what we actually should have
 + * flushed, flush everything in the given range.
 + */
 +tlb->freed_tables = 1;
 +tlb->cleared_ptes = 1;
 +tlb->cleared_pmds = 1;
 +tlb->cleared_puds = 1;
 +tlb->cleared_p4ds = 1;
 +
 +/*
 + * Some architectures, e.g. ARM, that have range invalidation
 + * and care about VM_EXEC for I-Cache invalidation, need
 force
 + * vma_exec set.
 + */
 +tlb->vma_exec = 1;
 +
 +/* Force vma_huge clear to guarantee safer flush */
 +tlb->vma_huge = 0;
 +
 +tlb->start = start;
 +tlb->end = end;
  }
>>> Whilst I think this is correct, it would be interesting to see whether
>>> or not it's actually faster than just nuking the whole mm, as I mentioned
>>> before.
>>> 
>>> At least in terms of getting a short-term fix, I'd prefer the diff below
>>> if it's not measurably worse.
>> 
>> I did a quick test with ebizzy (96 threads with 5 iterations) on my x86
>> VM, it shows slightly slowdown on records/s but much more sys time spent
>> with fullmm flush, the below is the data.
>> 
>>                  nofullmm    fullmm
>> ops (records/s)    225606    225119
>> sys (s)              0.69      1.14
>> 
>> It looks the slight reduction of records/s is caused by the increase of
>> sys time.
>> 
>>> Will
>>> 
>>> --->8
>>> 
>>> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
>>> index 99740e1dd273..cc251422d307 100644
>>> --- a/mm/mmu_gather.c
>>> +++ b/mm/mmu_gather.c
>>> @@ -251,8 +251,9 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
>>>* forcefully if we detect parallel PTE batching threads.
>>>*/
>>>   if (mm_tlb_flush_nested(tlb->mm)) {
>>> + tlb->fullmm = 1;
>>>   __tlb_reset_range(tlb);
>>> - __tlb_adjust_range(tlb, start, end - start);
>>> + tlb->freed_tables = 1;
>>>   }
>>> 
>>>   tlb_flush_mmu(tlb);
>> 
>> 
>> I think that this should have set need_flush_all and not fullmm.
> 
> Wouldn't that skip the flush?
> 
> If fullmm == 0, then __tlb_reset_range() sets tlb->end = 0.
>  tlb_flush_mmu
>tlb_flush_mmu_tlbonly
>  if (!tlb->end)
> return
> 
> Replacing fullmm with need_flush_all, brings the problem back / reproducer 
> hangs.

Maybe setting need_flush_all does not have the right effect, but setting
fullmm and then calling __tlb_reset_range() when the PTEs were already
zapped seems strange.

fullmm is described as:

/*
 * we are in the middle of an operation to clear
 * a full mm and can make some optimizations
 */

And this is not the case.



Re: [RFC KVM 24/27] kvm/isolation: KVM page fault handler

2019-05-14 Thread Peter Zijlstra
On Mon, May 13, 2019 at 07:02:30PM -0700, Andy Lutomirski wrote:

> This sounds like a great use case for static_call().  PeterZ, do you
> suppose we could wire up static_call() with the module infrastructure
> to make it easy to do "static_call to such-and-such GPL module symbol
> if that symbol is in a loaded module, else nop"?

You're basically asking it to do dynamic linking. And I suppose that is
technically possible.

However, I'm really starting to think kvm (or at least these parts of it
that want to play these games) had better not be a module anymore.




Re: [PATCH] serial: 8250: Add support for using platform_device resources

2019-05-14 Thread Esben Haabendal
Andy Shevchenko  writes:

> On Tue, May 07, 2019 at 02:22:18PM +0200, Esben Haabendal wrote:
>> Andy Shevchenko  writes:
>> > On Tue, May 07, 2019 at 01:35:58PM +0200, Esben Haabendal wrote:
>> >> Lee Jones  writes:
>> >> > On Thu, 02 May 2019, Esben Haabendal wrote:
>> >> >
>> >> >> Could you help clarify whether or not this patch is trying to do
>> >> >> something odd/wrong?
>> >> >> 
>> >> >> I might be misunderstanding Andy (probably is), but the discussion
>> >> >> revolves around the changes I propose where I change the serial8250
>> >> >> driver to use platform_get_resource() in favour of
>> >> >> request_mem_region()/release_mem_region().
>> >> >
>> >> > Since 'serial8250' is registered as a platform device, I don't see any
>> >> > reason why it shouldn't have the capability to obtain its memory
>> >> > regions from the platform_get_*() helpers.
>> >> 
>> >> Good to hear.  That is exactly what I am trying do with this patch.
>> >> 
>> >> @Andy: If you still don't like my approach, could you please advice an
>> >> acceptable method for improving the serial8250 driver to allow the use
>> >> of platform_get_*() helpers?
>> >
>> > I still don't get why you need this.
>> 
>> Because platform_get_resource() is a generally available and useful
>> helper function for working with platform_device resources, that the
>> current standard serial8250 driver does not support.
>> 
>> I am uncertain if I still haven't convinced you that current serial8250
>> driver does not work with platform_get_resource(), or if you believe
>> that it really should not support it.
>
> I believe there is no need to do this support.
>
> Most of the platform code that uses it is quite legacy,

So all code that uses/supports platform_get_resource() is legacy code?

commit 7945f929f1a77a1c8887a97ca07f87626858ff42
Author: Bartosz Golaszewski 
Date:   Wed Feb 20 11:12:39 2019 +

drivers: provide devm_platform_ioremap_resource()

There are currently 1200+ instances of using platform_get_resource()
and devm_ioremap_resource() together in the kernel tree.

This patch wraps these two calls in a single helper. Thanks to that
we don't have to declare a local variable for struct resource * and can
omit the redundant argument for resource type. We also have one
function call less.

Signed-off-by: Bartosz Golaszewski 
Acked-by: Greg Kroah-Hartman 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Linus Walleij 

It does not looks quite dead to me.

> and all under arch/
> ideally should be converted to use Device Tree.

When do you expect arch/x86 to be converted to device tree?

>> > If it's MFD, you may use "serial8250" with a given platform data like
>> > dozens of current users do.
>> 
>> There is only one in-tree mfd driver using "serial8250", the sm501.c
>> driver.  And that driver predates the mfd framework (mfd-core.c) by a
>> year, and does not use any of the mfd-core functionality.
>
> So, does it have an issue?

I don't have hardware so I can't test it, but I assume that it is
working.

It is ignoring framework code (mfd-core) that was implemented to avoid
re-inventing the wheel for each and every mfd driver.  If that is an
issue, then yes, sm501.c does have an issue and could be improved/fixed.

>> I want to use the mfd-core provided handling of resource splitting,
>> because it makes it easier to handle splitting of a single memory
>> resource as defined by a PCI BAR in this case.  And the other drivers I
>> need to use all support/use platform_get_resource(), so it would even
>> have an impact on the integration of that if I cannot use mfd resource
>> splitting with serial8250.
>
> I tired to repeat, that is OKAY! You *may* split and supply resources to the
> drivers, nothing prevents you to do that with current code base.
>
> Do you see any problem with that? What is that problem?
>
> If you would like utilize serial8250, just provide a platform data for
> it.

I fear we are coming to an end here.

I don't seem to be able to break through to you, to get you to
understand the issue here.

I want to write a simple and elegant mfd driver, using mfd-core
framework (the mfd_add_devices() function call to be specific).  I don't
want to reimplement similar functionality in the mfd driver.

The other drivers I need all work fine with this, but serial8250 does
not.

As I understand Lee Jones, he seems to agree with me, so could you
please, please consider that I might not be totally on crack, and might
actually have brought forward a valid proposition.

>> > Another approach is to use 8250 library, thus, creating a specific
>> > glue driver (like all 8250_* do).
>> 
>> As mentioned, I think this is a bad approach, and I would prefer to
>> improve the "serial8250" driver instead.  But if you insist, what should
>> I call such a driver?  It needs a platform_driver name, for use when
>> matching with platform_device devices.  And it would support exactly the
>> same hardware as the current "se

Re: [PATCH -tip v8 3/6] tracing/probe: Add ustring type for user-space string

2019-05-14 Thread Ingo Molnar


* Masami Hiramatsu  wrote:

> +/* Return the length of string -- including null terminal byte */
> +static nokprobe_inline int
> +fetch_store_strlen_user(unsigned long addr)
> +{
> + return strnlen_unsafe_user((__force const void __user *)addr,
> +MAX_STRING_SIZE);

Pointless line break that doesn't improve readability.

> +/*
> + * Fetch a null-terminated string from user. Caller MUST set *(u32 *)buf
> + * with max length and relative data location.
> + */
> +static nokprobe_inline int
> +fetch_store_string_user(unsigned long addr, void *dest, void *base)
> +{
> + const void __user *uaddr =  (__force const void __user *)addr;
> + int maxlen = get_loc_len(*(u32 *)dest);
> + u8 *dst = get_loc_data(dest, base);
> + long ret;
> +
> + if (unlikely(!maxlen))
> + return -ENOMEM;
> + ret = strncpy_from_unsafe_user(dst, uaddr, maxlen);
> +
> + if (ret >= 0)
> + *(u32 *)dest = make_data_loc(ret, (void *)dst - base);
> +
>   return ret;

Firstly, why is there a 'dest' and a 'dst' variable name as well - the 
two are very similar and the difference is not explained at all.

Secondly, a style nit: if you group statements then please group 
statements based on the usual logic - which is to group them by the flow
of logic. In the above case you grouped the 'maxlen' check with the 
strncpy_from_unsafe_user() call, while the grouping should be the other 
way around:

if (unlikely(!maxlen))
return -ENOMEM;

ret = strncpy_from_unsafe_user(dst, uaddr, maxlen);
if (ret >= 0)
*(u32 *)dest = make_data_loc(ret, (void *)dst - base);

return ret;

Third, hiding the get_loc_data() call within variable initialization is 
bad style - we usually only put 'trivial' (constant) initializations 
there.

Fourth, 'dst' is independent of 'maxlen', so it should probably 
be calculated *after* maxlen.

I.e. the whole sequence should be:


maxlen = get_loc_len(*(u32 *)dest);
if (unlikely(!maxlen))
return -ENOMEM;

dst = get_loc_data(dest, base);

ret = strncpy_from_unsafe_user(dst, uaddr, maxlen);
if (ret >= 0)
*(u32 *)dest = make_data_loc(ret, (void *)dst - base);

return ret;

Fifth, we don't actually dereference 'dst', do we? So the whole type 
casting to 'void *' could be avoided by declaring 'dst' (or whatever its 
new, clearer name is) not as u8 *, but as void *.

I.e. these are five problems in a short sequence of code, which is sad to
see in a v8 submission. :-/

Please review the other patches and the whole code base for similar 
mishaps and small details as well.

Thanks,

Ingo
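
Pulling the five points above together, the fetch routine could end up roughly
as follows (a sketch derived only from the code quoted in this review, keeping
the original parameter names; not the final submission):

        static nokprobe_inline int
        fetch_store_string_user(unsigned long addr, void *dest, void *base)
        {
                const void __user *uaddr = (__force const void __user *)addr;
                void *dst;
                int maxlen;
                long ret;

                maxlen = get_loc_len(*(u32 *)dest);
                if (unlikely(!maxlen))
                        return -ENOMEM;

                dst = get_loc_data(dest, base);

                ret = strncpy_from_unsafe_user(dst, uaddr, maxlen);
                if (ret >= 0)
                        *(u32 *)dest = make_data_loc(ret, dst - base);

                return ret;
        }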


[PATCH] EDAC, mc: Fix edac_mc_find() in case no device is found

2019-05-14 Thread Robert Richter
The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. This patch fixes this.

I did some analysis of why this did not raise any issues for about 3
years and the reason is that edac_mc_find() is mostly used to search
for existing devices. Thus, the bug is not triggered.

Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter 
---
 drivers/edac/edac_mc.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 13594ffadcb3..aeeaaf30b38a 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -688,10 +688,9 @@ struct mem_ctl_info *edac_mc_find(int idx)
mci = list_entry(item, struct mem_ctl_info, link);
 
if (mci->mc_idx >= idx) {
-   if (mci->mc_idx == idx) {
-   goto unlock;
-   }
-   break;
+   if (mci->mc_idx != idx)
+   mci = NULL;
+   goto unlock;
}
}
 
-- 
2.20.1



Re: [RFC KVM 00/27] KVM Address Space Isolation

2019-05-14 Thread Peter Zijlstra


(please, wrap your emails at 78 chars)

On Tue, May 14, 2019 at 12:08:23AM +0300, Liran Alon wrote:

> 3) From (2), we should have theoretically deduced that for every
> #VMExit, there is a need to kick the sibling hyperthread also outside
> of guest until the #VMExit is completed.

That's not in fact quite true; all you have to do is send the IPI.
Having one sibling IPI the other sibling carries enough guarantees that
the receiving sibling will not execute any further guest instructions.

That is, you don't have to wait on the VMExit to complete; you can just
IPI and get on with things. Now, this is still expensive, but it is
heaps better than doing a full sync up between siblings.
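
A sketch of such a fire-and-forget kick from ordinary process context
(illustrative only; sibling_kick_noop() and kick_sibling() are made-up names,
and NMI/MCE context needs the extra care discussed elsewhere in the thread):

        #include <linux/smp.h>

        static void sibling_kick_noop(void *info)
        {
                /*
                 * The IPI itself forces the sibling out of guest mode;
                 * nothing more to do in the handler.
                 */
        }

        static void kick_sibling(int sibling_cpu)
        {
                /* wait == 0: fire the IPI and return without waiting */
                smp_call_function_single(sibling_cpu, sibling_kick_noop, NULL, 0);
        }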




Re: [PATCH] tty: serial_core: Fix the incorrect configuration of baud rate and data length at the console serial port resume

2019-05-14 Thread Johan Hovold
On Thu, May 09, 2019 at 01:42:39PM +0800, Lanqing Liu wrote:
> When userspace opens a serial port for console, uart_port_startup()
> is called. This function assigns the uport->cons->cflag value to
> TTY->termios.c_cflag, then it is cleared to 0. When the user space
> closes this serial port, the TTY structure will be released, and at
> this time uport->cons->cflag has also been cleared.
> 
> On the Spreadtrum platform, in some special scenarios, like charging mode,
> userspace needs to close the console, which means the uport->cons->cflag
> has also been cleared. But printing logs is still needed in the kernel. So
> when the system enters suspend and resume, the console needs to configure
> the baud rate and data length of the serial port according to its own cflag
> when resuming the console port. At this time, the cflag is 0, which will
> cause serial port to produce configuration errors that do not meet user
> expectations.

This is actually yet another regression due to 761ed4a94582 ("tty:
serial_core: convert uart_close to use tty_port_close") which
incidentally removed the call to uart_shutdown() where the cflag was
being saved precisely to avoid the problem you're describing:

ae84db9661ca ("serial: core: Preserve termios c_cflag for console 
resume")

Judging from a quick look it seems the xmit buf, which is released in
that function, may now be leaking too.

> To fix this, assign the TTY->termios.c_cflag value to uport->cons->cflag
> before userspace closes this console serial port. This ensures that the
> correct cflag value can be obtained when the console serial port is
> resumed.

Not sure this is the right fix, but I don't have time to look at this
right now.

Johan


Re: [PATCH] serial: 8250: Add support for using platform_device resources

2019-05-14 Thread Esben Haabendal
Andy Shevchenko  writes:

> On Tue, May 07, 2019 at 02:22:18PM +0200, Esben Haabendal wrote:
>> Andy Shevchenko  writes:
>> > On Tue, May 07, 2019 at 01:35:58PM +0200, Esben Haabendal wrote:
>> >> Lee Jones  writes:
>> >> > On Thu, 02 May 2019, Esben Haabendal wrote:
>> >> >
>> >> >> Could you help clarify whether or not this patch is trying to do
>> >> >> something odd/wrong?
>> >> >> 
>> >> >> I might be misunderstanding Andy (probably is), but the discussion
>> >> >> revolves around the changes I propose where I change the serial8250
>> >> >> driver to use platform_get_resource() in favour of
>> >> >> request_mem_region()/release_mem_region().
>> >> >
>> >> > Since 'serial8250' is registered as a platform device, I don't see any
>> >> > reason why it shouldn't have the capability to obtain its memory
>> >> > regions from the platform_get_*() helpers.
>> >> 
>> >> Good to hear.  That is exactly what I am trying do with this patch.
>> >> 
>> >> @Andy: If you still don't like my approach, could you please advice an
>> >> acceptable method for improving the serial8250 driver to allow the use
>> >> of platform_get_*() helpers?
>> >
>> > I still don't get why you need this.
>> 
>> Because platform_get_resource() is a generally available and useful
>> helper function for working with platform_device resources, that the
>> current standard serial8250 driver does not support.
>> 
>> I am uncertain if I still haven't convinced you that current serial8250
>> driver does not work with platform_get_resource(), or if you believe
>> that it really should not support it.
>
> I believe there is no need to do this support.
>
> Most of the platform code that uses it is quite legacy, and all under arch/
> ideally should be converted to use Device Tree.

Please take a look at https://lkml.org/lkml/2019/4/9/576
("[PATCH v2 2/4] mfd: ioc3: Add driver for SGI IOC3 chip")

This is basically what I am trying to do.  I am just so unfortunate that
the serial devices I have are completely generic, so it does not make
sense for me to create a specialized 8250 driver.

Look at how the serial8250_ioc3_driver uses platform_get_resource() to
get the register memory, and how that works together with
mfd_add_devices() in the mfd driver.  Nice and elegant.  Standard
recommended approach for an mfd driver.

/Esben
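
The shape of that approach, condensed (an illustrative sketch, not the IOC3
code; the my_* names and the resource offsets are made up):

        #include <linux/kernel.h>
        #include <linux/ioport.h>
        #include <linux/mfd/core.h>
        #include <linux/platform_device.h>

        /* Parent MFD driver: carve a UART region out of a single PCI BAR. */
        static const struct resource my_uart_res[] = {
                DEFINE_RES_MEM(0x100, 8),       /* offset within the parent BAR */
                DEFINE_RES_IRQ(0),              /* relative to the parent's irq_base */
        };

        static const struct mfd_cell my_cells[] = {
                {
                        /* the argument in this thread is that this could
                         * simply be "serial8250" */
                        .name          = "my-uart",
                        .resources     = my_uart_res,
                        .num_resources = ARRAY_SIZE(my_uart_res),
                },
        };

        /*
         * In the parent's probe(), with 'bar' being the BAR's struct resource:
         *
         *      mfd_add_devices(&pdev->dev, 0, my_cells, ARRAY_SIZE(my_cells),
         *                      bar, irq_base, NULL);
         */

        /* Child side: the generic lookup that the serial8250 core would need
         * to support for this to work without a glue driver. */
        static int my_uart_probe(struct platform_device *pdev)
        {
                struct resource *mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);

                if (!mem)
                        return -ENODEV;
                /* ioremap 'mem' and register the port ... */
                return 0;
        }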


Re: [RFC KVM 00/27] KVM Address Space Isolation

2019-05-14 Thread Peter Zijlstra
On Mon, May 13, 2019 at 07:07:36PM -0700, Andy Lutomirski wrote:
> On Mon, May 13, 2019 at 2:09 PM Liran Alon  wrote:

> > The hope is that the very vast majority of #VMExit handlers will be
> > able to completely run without requiring to switch to full address
> > space. Therefore, avoiding the performance hit of (2).

> > However, for the very few #VMExits that does require to run in full
> > kernel address space, we must first kick the sibling hyperthread
> > outside of guest and only then switch to full kernel address space
> > and only once all hyperthreads return to KVM address space, then
> > allow then to enter into guest.
> 
> What exactly does "kick" mean in this context?  It sounds like you're
> going to need to be able to kick sibling VMs from extremely atomic
> contexts like NMI and MCE.

Yeah, doing the full synchronous thing from NMI/MCE context sounds
exceedingly dodgy, however..

Realistically they only need to send an IPI to the other sibling; they
don't need to wait for the VMExit to complete or anything else.

And that is something we can do from NMI context -- with a bit of care.
See also arch_irq_work_raise(); specifically we need to ensure we leave
the APIC in an idle state, such that if we interrupted an APIC sequence
it will not suddenly fail/violate the APIC write/state etc.



Re: [PATCH v1] mtd: rawnand: Add Macronix NAND read retry support

2019-05-14 Thread Thomas Petazzoni
Hello,

On Tue, 14 May 2019 09:53:16 +0800
masonccy...@mxic.com.tw wrote:

> > > ---
> > >  static void macronix_nand_onfi_init(struct nand_chip *chip)
> > >  {
> > >   struct nand_parameters *p = &chip->parameters;
> > >   struct nand_onfi_vendor_macronix *mxic = (void *)p->onfi->vendor;
> > 
> > Why cast to void*, instead of casting directly to struct
> > nand_onfi_vendor_macronix * ?  
> 
> Due to got a warning:
> 
>  warning: initialization from incompatible pointer type
>   struct nand_onfi_vendor_macronix *mxic = p->onfi->vendor;

You didn't look at my code, I suggested:

mxic = (struct nand_onfi_vendor_macronix *) p->onfi->vendor;

I.e., you indeed still need a cast, because p->onfi->vendor is a u8[].
But instead of casting to void*, and then implicitly casting to struct
nand_onfi_vendor_macronix *, I suggest to cast directly to struct
nand_onfi_vendor_macronix *.

> > >   if (!p->onfi ||
> > >   ((mxic->reliability_func & MACRONIX_READ_RETRY_BIT) == 0))
> > >   return;  
> > 
> > So, the code should be:
> > 
> >struct nand_onfi_vendor_macronix *mxic;
> > 
> >if (!p->onfi)
> >   return;
> > 
>mxic = (struct nand_onfi_vendor_macronix *) p->onfi->vendor;
> > 
> >if ((mxic->reliability_func & MACRONIX_READ_RETRY_BIT) == 0)
> >   return;  
> 
> Also got a warning:
> 
> warning: ISO C90 forbids mixed declarations and code 
> [-Wdeclaration-after-statement]

No, you don't get this warning if you use my code. You get this warning
if you declare and initialized the "mxic" variable at the same location.

>  static void macronix_nand_onfi_init(struct nand_chip *chip)
>  {
>  struct nand_parameters *p = &chip->parameters;
>  struct nand_onfi_vendor_macronix *mxic = (void *)p->onfi->vendor;

You are dereferencing p->onfi...

> 
>  if (!p->onfi)
>  return;

... before you check it is NULL. This is wrong.

Please check again the code I sent in my previous e-mail:

struct nand_onfi_vendor_macronix *mxic;
 
if (!p->onfi)
   return;
 
mxic = (struct nand_onfi_vendor_macronix *) p->onfi->vendor;
 
if ((mxic->reliability_func & MACRONIX_READ_RETRY_BIT) == 0)
   return;  

Best regards,

Thomas Petazzoni
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
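
Putting those corrections together, the function would look roughly like this
(a sketch assembled only from the code quoted in this thread; the actual
read-retry setup is elided):

        static void macronix_nand_onfi_init(struct nand_chip *chip)
        {
                struct nand_parameters *p = &chip->parameters;
                struct nand_onfi_vendor_macronix *mxic;

                if (!p->onfi)
                        return;

                mxic = (struct nand_onfi_vendor_macronix *)p->onfi->vendor;
                if ((mxic->reliability_func & MACRONIX_READ_RETRY_BIT) == 0)
                        return;

                /* ... set up the read-retry hooks ... */
        }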


[GIT PULL] fuse update for 5.2

2019-05-14 Thread Miklos Szeredi
Hi Linus,

Please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git 
tags/fuse-update-5.2

Add more caching controls for userspace filesystems to use, as well as bug
fixes and cleanups.

Thanks,
Miklos

---
Alan Somers (3):
  fuse: document fuse_fsync_in.fsync_flags
  fuse: fix changelog entry for protocol 7.12
  fuse: fix changelog entry for protocol 7.9

David Howells (1):
  fuse: Convert fusectl to use the new mount API

Ian Abbott (1):
  fuse: Add ioctl flag for x32 compat ioctl

Kirill Smelkov (5):
  fuse: convert printk -> pr_*
  fuse: allow filesystems to have precise control over data cache
  fuse: retrieve: cap requested size to negotiated max_write
  fuse: require /dev/fuse reads to have enough buffer capacity
  fuse: Add FOPEN_STREAM to use stream_open()

Liu Bo (1):
  fuse: honor RLIMIT_FSIZE in fuse_file_fallocate

Miklos Szeredi (1):
  fuse: fix writepages on 32bit

zhangliguang (1):
  fuse: clean up fuse_alloc_inode

---
 fs/fuse/control.c | 20 +++-
 fs/fuse/cuse.c| 13 +++--
 fs/fuse/dev.c | 16 +---
 fs/fuse/file.c| 22 ++
 fs/fuse/fuse_i.h  |  7 +++
 fs/fuse/inode.c   | 23 ---
 include/uapi/linux/fuse.h | 22 --
 7 files changed, 92 insertions(+), 31 deletions(-)


[GIT PULL] overlayfs update for 5.2

2019-05-14 Thread Miklos Szeredi
Hi Linus,

Please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git 
tags/ovl-update-5.2

Just bug fixes in this small update.

Thanks,
Miklos

---
Amir Goldstein (4):
  ovl: fix missing upper fs freeze protection on copy up for ioctl
  ovl: support stacked SEEK_HOLE/SEEK_DATA
  ovl: do not generate duplicate fsnotify events for "fake" path
  ovl: relax WARN_ON() for overlapping layers use case

Jiufei Xue (1):
  ovl: check the capability before cred overridden

---
 fs/overlayfs/copy_up.c   |   6 +--
 fs/overlayfs/dir.c   |   2 +-
 fs/overlayfs/file.c  | 133 +--
 fs/overlayfs/inode.c |   3 +-
 fs/overlayfs/overlayfs.h |   2 +-
 5 files changed, 113 insertions(+), 33 deletions(-)


Re: [RFC KVM 00/27] KVM Address Space Isolation

2019-05-14 Thread Liran Alon



> On 14 May 2019, at 10:29, Peter Zijlstra  wrote:
> 
> 
> (please, wrap your emails at 78 chars)
> 
> On Tue, May 14, 2019 at 12:08:23AM +0300, Liran Alon wrote:
> 
>> 3) From (2), we should have theoretically deduced that for every
>> #VMExit, there is a need to kick the sibling hyperthread also outside
>> of guest until the #VMExit is completed.
> 
> That's not in fact quite true; all you have to do is send the IPI.
> Having one sibling IPI the other sibling carries enough guarantees that
> the receiving sibling will not execute any further guest instructions.
> 
> That is, you don't have to wait on the VMExit to complete; you can just
> IPI and get on with things. Now, this is still expensive, but it is
> heaps better than doing a full sync up between siblings.
> 

I agree.

I didn’t say you need to do full sync. You just need to IPI the sibling
hyperthreads before switching to the full kernel address space.
But you need to make sure these sibling hyperthreads don’t get back into
the guest until all hyperthreads are running with KVM isolated address space.

It is still very expensive if done for every #VMExit. Which as I explained,
can be avoided in case we use the KVM isolated address space technique.

-Liran



Re: [PATCH 2/2] serial: 8250: Add support for 8250/16550 as MFD function

2019-05-14 Thread Esben Haabendal
Lee Jones  writes:

> On Tue, 07 May 2019, Esben Haabendal wrote:
>
>> Lee Jones  writes:
>> 
>> > On Fri, 26 Apr 2019, Esben Haabendal wrote:
>> >
>> >> The serial8250-mfd driver is for adding 8250/16550 UART ports as functions
>> >> to an MFD driver.
>> >> 
>> >> When calling mfd_add_device(), platform_data should be a pointer to a
>> >> struct plat_serial8250_port, with proper settings like .flags, .type,
>> >> .iotype, .regshift and .uartclk.  Memory (or ioport) and IRQ should be
>> >> passed as cell resources.
>> >
>> > What?  No, please!
>> >
>> > If you *must* create a whole driver just to be able to use
>> > platform_*() helpers (which I don't think you should), then please
>> > call it something else.  This doesn't have anything to do with MFD.
>> 
>> True.
>> 
>> I really don't think it is a good idea to create a whole driver just to
>> be able to use platform_get_*() helpers.  And if I am forced to do this,
>> because I am unable to convince Andy to improve the standard serial8250
>> driver to support that, it should be called MFD.  The driver would be
>
> I assume you mean "shouldn't"?

Of-course.

>> generally usable for all usecases where platform_get_*() works.
>> 
>> I don't have any idea what to call such a driver.  It really would just
>> be a fork of the current serial8250 driver, just allowing use of
>> platform_get_*(), supporting exactly the same hardware.
>> 
>> I am still hoping that we can find a way to improve serial8250 to be
>> usable in these cases.
>
> Me too.

Unfortunately, I don't seem to be able to convince Andy to accept
something like that.

I might have to do this out-of-tree :(

/Esben


Re: [RFC KVM 06/27] KVM: x86: Exit KVM isolation on IRQ entry

2019-05-14 Thread Alexandre Chartre



On 5/14/19 9:07 AM, Peter Zijlstra wrote:

> On Mon, May 13, 2019 at 11:13:34AM -0700, Andy Lutomirski wrote:
>> On Mon, May 13, 2019 at 9:28 AM Alexandre Chartre
>>  wrote:
>>>
>>> Actually, I am not sure this is effectively useful because the IRQ
>>> handler is probably faulting before it tries to exit isolation, so
>>> the isolation exit will be done by the kvm page fault handler. I need
>>> to check that.
>>
>> The whole idea of having #PF exit with a different CR3 than was loaded
>> on entry seems questionable to me.  I'd be a lot more comfortable with
>> the whole idea if a page fault due to accessing the wrong data was an
>> OOPS and the code instead just did the right thing directly.
>
> So I've run into this idea before; it basically allows a lazy approach
> to things.
>
> I'm somewhat conflicted on things, on the one hand, changing CR3 from
> #PF is a natural extension in that #PF already changes page-tables (for
> userspace / vmalloc etc..), on the other hand, there's a thin line
> between being lazy and being sloppy.
>
> If we're going down this route; I think we need a very coherent design
> and strong rules.



Right. We should particularly ensure that the KVM page-table remains a
subset of the kernel page-table, in particular page-table changes (e.g.
for vmalloc etc...) should happen in the kernel page-table and not in
the kvm page-table.

So we should probably enforce switching to the kernel page-table when
doing operation like vmalloc. The current code doesn't enforce it, but
I can see it faulting, when doing any allocation (because the kvm page
table doesn't have all structures used during an allocation).

alex.


Re: [PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

2019-05-14 Thread Brendan Higgins
On Sat, May 11, 2019 at 08:43:23AM +0200, Knut Omang wrote:
> On Fri, 2019-05-10 at 14:59 -0700, Frank Rowand wrote:
> > On 5/10/19 3:23 AM, Brendan Higgins wrote:
> > >> On Fri, May 10, 2019 at 7:49 AM Knut Omang  wrote:
> > >>>
> > >>> On Thu, 2019-05-09 at 22:18 -0700, Frank Rowand wrote:
> >  On 5/9/19 4:40 PM, Logan Gunthorpe wrote:
> > >
> > >
> > > On 2019-05-09 5:30 p.m., Theodore Ts'o wrote:
> > >> On Thu, May 09, 2019 at 04:20:05PM -0600, Logan Gunthorpe wrote:
> > >>>
> > >>> The second item, arguably, does have significant overlap with 
> > >>> kselftest.
> > >>> Whether you are running short tests in a light weight UML 
> > >>> environment or
> > >>> higher level tests in an heavier VM the two could be using the same
> > >>> framework for writing or defining in-kernel tests. It *may* also be 
> > >>> valuable
> > >>> for some people to be able to run all the UML tests in the heavy VM
> > >>> environment along side other higher level tests.
> > >>>
> > >>> Looking at the selftests tree in the repo, we already have similar 
> > >>> items to
> > >>> what Kunit is adding as I described in point (2) above. 
> > >>> kselftest_harness.h
> > >>> contains macros like EXPECT_* and ASSERT_* with very similar 
> > >>> intentions to
> > >>> the new KUNIT_EXECPT_* and KUNIT_ASSERT_* macros.
> > >>>
> > >>> However, the number of users of this harness appears to be quite 
> > >>> small. Most
> > >>> of the code in the selftests tree seems to be a random mishmash of 
> > >>> scripts
> > >>> and userspace code so it's not hard to see it as something 
> > >>> completely
> > >>> different from the new Kunit:
> > >>>
> > >>> $ git grep --files-with-matches kselftest_harness.h *
> > >>
> > >> To the extent that we can unify how tests are written, I agree that
> > >> this would be a good thing.  However, you should note that
> > >> kselftest_harness.h is currently assums that it will be included in
> > >> userspace programs.  This is most obviously seen if you look closely
> > >> at the functions defined in the header files which makes calls to
> > >> fork(), abort() and fprintf().
> > >
> > > Ah, yes. I obviously did not dig deep enough. Using kunit for
> > > in-kernel tests and kselftest_harness for userspace tests seems like
> > > a sensible line to draw to me. Trying to unify kernel and userspace
> > > here sounds like it could be difficult so it's probably not worth
> > > forcing the issue unless someone wants to do some really fancy work
> > > to get it done.
> > >
> > > Based on some of the other commenters, I was under the impression
> > > that kselftests had in-kernel tests but I'm not sure where or if they
> > > exist.
> > 
> >  YES, kselftest has in-kernel tests.  (Excuse the shouting...)
> > 
> >  Here is a likely list of them in the kernel source tree:
> > 
> >  $ grep module_init lib/test_*.c
> >  lib/test_bitfield.c:module_init(test_bitfields)
> >  lib/test_bitmap.c:module_init(test_bitmap_init);
> >  lib/test_bpf.c:module_init(test_bpf_init);
> >  lib/test_debug_virtual.c:module_init(test_debug_virtual_init);
> >  lib/test_firmware.c:module_init(test_firmware_init);
> >  lib/test_hash.c:module_init(test_hash_init);  /* Does everything */
> >  lib/test_hexdump.c:module_init(test_hexdump_init);
> >  lib/test_ida.c:module_init(ida_checks);
> >  lib/test_kasan.c:module_init(kmalloc_tests_init);
> >  lib/test_list_sort.c:module_init(list_sort_test);
> >  lib/test_memcat_p.c:module_init(test_memcat_p_init);
> >  lib/test_module.c:static int __init test_module_init(void)
> >  lib/test_module.c:module_init(test_module_init);
> >  lib/test_objagg.c:module_init(test_objagg_init);
> >  lib/test_overflow.c:static int __init test_module_init(void)
> >  lib/test_overflow.c:module_init(test_module_init);
> >  lib/test_parman.c:module_init(test_parman_init);
> >  lib/test_printf.c:module_init(test_printf_init);
> >  lib/test_rhashtable.c:module_init(test_rht_init);
> >  lib/test_siphash.c:module_init(siphash_test_init);
> >  lib/test_sort.c:module_init(test_sort_init);
> >  lib/test_stackinit.c:module_init(test_stackinit_init);
> >  lib/test_static_key_base.c:module_init(test_static_key_base_init);
> >  lib/test_static_keys.c:module_init(test_static_key_init);
> >  lib/test_string.c:module_init(string_selftest_init);
> >  lib/test_ubsan.c:module_init(test_ubsan_init);
> >  lib/test_user_copy.c:module_init(test_user_copy_init);
> >  lib/test_uuid.c:module_init(test_uuid_init);
> >  lib/test_vmalloc.c:module_init(vmalloc_test_init)
> >  lib/test_xarray.c:module_init(xarray_checks);
> > 
> > 
> > > If they do exists, it seems like it would make sense to
> > > co

Re: [RFC] x86: Speculative execution warnings

2019-05-14 Thread Paul Turner
From: Nadav Amit 
Date: Fri, May 10, 2019 at 7:45 PM
To: 
Cc: Borislav Petkov, , Nadav Amit, Andy
Lutomirsky, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Jann Horn

> It may be useful to check in runtime whether certain assertions are
> violated even during speculative execution. This can allow to avoid
> adding unnecessary memory fences and at the same time check that no data
> leak channels exist.
>
> For example, adding such checks can show that allocating zeroed pages
> can return speculatively non-zeroed pages (the first qword is not
> zero).  [This might be a problem when the page-fault handler performs
> software page-walk, for example.]
>
> Introduce SPEC_WARN_ON(), which checks in runtime whether a certain
> condition is violated during speculative execution. The condition should
> be computed without branches, e.g., using bitwise operators. The check
> will wait for the condition to be realized (i.e., not speculated), and
> if the assertion is violated, a warning will be thrown.
>
> Warnings can be provided in one of two modes: precise and imprecise.
> Both modes are not perfect. The precise mode does not always make it easy
> to understand which assertion was broken, but instead points to a point
> in the execution somewhere around the point in which the assertion was
> violated.  In addition, it prints a warning for each violation (unlike
> WARN_ONCE() like behavior).
>
> The imprecise mode, on the other hand, can sometimes throw the wrong
> indication, specifically if the control flow has changed between the
> speculative execution and the actual one. Note that it is not a
> false-positive, it just means that the output would mislead the user to
> think the wrong assertion was broken.
>
> There are some more limitations. Since the mechanism requires an
> indirect branch, it should not be used on production systems that are
> susceptible to Spectre v2. The mechanism requires TSX and performance
> counters that are only available on Skylake and later. There is a hidden
> assumption that TSX is not used in the kernel for anything other
> than this mechanism.
>

Nice trick!

Can you eliminate the indirect call by forcing an access fault to
abort the transaction instead, e.g. "cmove 0, $1"?

(If this works, it may also allow support on older architectures as
the RTM_RETIRED.ABORT* events go back further I believe?)

> The basic idea behind the implementation is to use a performance counter
> that also updates during speculative execution as an indication of
> assertion failure. By using a conditional move, which is not predicted,
> to affect the control flow, the condition is realized before the event
> that affects the PMU is triggered.
>
> Enable this feature by setting the "spec_warn=on" or "spec_warn=precise"
> kernel parameter. I did not measure performance, but I guess the
> overhead should not be too high.
>
> I did not run too many tests, but brief experiments suggest that it does
> work. Let me know if I missed anything and whether you think this can be
> useful. To be frank, the exact use cases are not super clear, and there
> are various possible extensions (e.g., ensuring the speculation window
> is long enough by adding data dependencies). I would appreciate your
> inputs.
>
> Cc: Andy Lutomirsky 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Jann Horn 
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/Kconfig |   4 +
>  arch/x86/include/asm/nospec-branch.h |  30 +
>  arch/x86/kernel/Makefile |   1 +
>  arch/x86/kernel/nospec.c | 185 +++
>  4 files changed, 220 insertions(+)
>  create mode 100644 arch/x86/kernel/nospec.c
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 62fc3fda1a05..2cc57c2172be 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2887,6 +2887,10 @@ config X86_DMA_REMAP
>  config HAVE_GENERIC_GUP
> def_bool y
>
> +config DEBUG_SPECULATIVE_EXECUTION
> +   bool "Debug speculative execution"
> +   depends on X86_64
> +
>  source "drivers/firmware/Kconfig"
>
>  source "arch/x86/kvm/Kconfig"
> diff --git a/arch/x86/include/asm/nospec-branch.h 
> b/arch/x86/include/asm/nospec-branch.h
> index dad12b767ba0..3f1af6378304 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -290,6 +290,36 @@ static inline void 
> indirect_branch_prediction_barrier(void)
>  /* The Intel SPEC CTRL MSR base value cache */
>  extern u64 x86_spec_ctrl_base;
>
> +#ifdef CONFIG_DEBUG_SPECULATIVE_EXECUTION
> +
> +extern bool spec_check(unsigned long cond);
> +
> +DECLARE_STATIC_KEY_FALSE(spec_test_key);
> +DECLARE_STATIC_KEY_FALSE(spec_test_precise_key);
> +
> +#define SPEC_WARN_ON(cond) \
> +do {   \
> +   bool _error;\
> + 

Re: [RFC KVM 00/27] KVM Address Space Isolation

2019-05-14 Thread Liran Alon



> On 14 May 2019, at 5:07, Andy Lutomirski  wrote:
> 
> On Mon, May 13, 2019 at 2:09 PM Liran Alon  wrote:
>> 
>> 
>> 
>>> On 13 May 2019, at 21:17, Andy Lutomirski  wrote:
>>> 
 I expect that the KVM address space can eventually be expanded to include
 the ioctl syscall entries. By doing so, and also adding the KVM page table
 to the process userland page table (which should be safe to do because the
 KVM address space doesn't have any secret), we could potentially handle the
 KVM ioctl without having to switch to the kernel pagetable (thus 
 effectively
 eliminating KPTI for KVM). Then the only overhead would be if a VM-Exit has
 to be handled using the full kernel address space.
 
>>> 
>>> In the hopefully common case where a VM exits and then gets re-entered
>>> without needing to load full page tables, what code actually runs?
>>> I'm trying to understand when the optimization of not switching is
>>> actually useful.
>>> 
>>> Allowing ioctl() without switching to kernel tables sounds...
>>> extremely complicated.  It also makes the dubious assumption that user
>>> memory contains no secrets.
>> 
>> Let me attempt to clarify what we were thinking when creating this patch 
>> series:
>> 
>> 1) It is never safe to execute one hyperthread inside guest while its 
>> sibling hyperthread runs in a virtual address space which contains secrets 
>> of host or other guests.
>> This is because we assume that using some speculative gadget (such as 
>> half-Spectrev2 gadget), it will be possible to populate *some* CPU core 
>> resource which could then be *somehow* leaked by the hyperthread running 
>> inside guest. In case of L1TF, this would be data populated to the L1D cache.
>> 
>> 2) Because of (1), every time a hyperthread runs inside host kernel, we must 
>> make sure it’s sibling is not running inside guest. i.e. We must kick the 
>> sibling hyperthread outside of guest using IPI.
>> 
>> 3) From (2), we should have theoretically deduced that for every #VMExit, 
>> there is a need to kick the sibling hyperthread also outside of guest until 
>> the #VMExit is completed. Such a patch series was implemented at some point 
>> but it had (obviously) significant performance hit.
>> 
>> 
> 4) The main goal of this patch series is to preserve (2), but to avoid
> the overhead specified in (3).
>> 
>> The way this patch series achieves (4) is by observing that during the run 
>> of a VM, most #VMExits can be handled rather quickly and locally inside KVM 
>> and doesn’t need to reference any data that is not relevant to this VM or 
>> KVM code. Therefore, if we will run these #VMExits in an isolated virtual 
>> address space (i.e. KVM isolated address space), there is no need to kick 
>> the sibling hyperthread from guest while these #VMExits handlers run.
> 
> Thanks!  This clarifies a lot of things.
> 
>> The hope is that the very vast majority of #VMExit handlers will be able to 
>> completely run without having to switch to the full address space. Therefore, 
>> avoiding the performance hit of (2).
>> However, for the very few #VMExits that do require running in the full kernel 
>> address space, we must first kick the sibling hyperthread outside of guest 
>> and only then switch to the full kernel address space, and only once all 
>> hyperthreads return to the KVM address space do we allow them to enter the 
>> guest.
> 
> What exactly does "kick" mean in this context?  It sounds like you're
> going to need to be able to kick sibling VMs from extremely atomic
> contexts like NMI and MCE.

Yes, that’s true.
“Kick” in this context will probably mean sending an IPI to all sibling 
hyperthreads.
This IPI will cause these sibling hyperthreads to exit from guest to host on 
EXTERNAL_INTERRUPT
and wait for a condition that again allows them to enter back into the guest.
This condition will be that all hyperthreads of the CPU core are again running 
only within the KVM isolated address space of this VM.
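
Purely as an illustration of that wait condition (none of the names below exist
in the actual series), the idea is roughly:

	#include <linux/atomic.h>

	static atomic_t threads_in_isolated_mm;	/* one such counter per core in practice */

	static void wait_for_core_in_isolated_mm(int nr_siblings)
	{
		/*
		 * Sibling hyperthreads spin here after the EXTERNAL_INTERRUPT
		 * exit, until every hyperthread of the core is back in the KVM
		 * isolated address space, and only then re-enter the guest.
		 */
		while (atomic_read(&threads_in_isolated_mm) < nr_siblings)
			cpu_relax();
	}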

-Liran





Re: [PATCH 00/18] ARM/ARM64: Support hierarchical CPU arrangement for PSCI

2019-05-14 Thread Rafael J. Wysocki
On Mon, May 13, 2019 at 9:23 PM Ulf Hansson  wrote:
>
> This series enables support for hierarchical CPU arrangement, managed by PSCI
> for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> recently was extended to manage devices belonging to CPUs.

ACK for the patches touching cpuidle in this series (from the
framework perspective), but I'm assuming it to be taken care of by
ARM/ARM64 maintainers.


Re: [PATCH v3 2/2] media: docs-rst: Document memory-to-memory video encoder interface

2019-05-14 Thread Tomasz Figa
Hi Michael,

On Tue, Apr 30, 2019 at 07:34:12PM +0200, Michael Tretter wrote:
> On Thu, 24 Jan 2019 19:04:19 +0900, Tomasz Figa wrote:

[snip]

> > +State machine
> > +=
> > +
> > +.. kernel-render:: DOT
> > +   :alt: DOT digraph of encoder state machine
> > +   :caption: Encoder state machine
> > +
> > +   digraph encoder_state_machine {
> > +   node [shape = doublecircle, label="Encoding"] Encoding;
> > +
> > +   node [shape = circle, label="Initialization"] Initialization;
> > +   node [shape = circle, label="Stopped"] Stopped;
> > +   node [shape = circle, label="Drain"] Drain;
> > +   node [shape = circle, label="Reset"] Reset;
> > +
> > +   node [shape = point]; qi
> > +   qi -> Initialization [ label = "open()" ];
> > +
> > +   Initialization -> Encoding [ label = "Both queues streaming" ];
> > +
> > +   Encoding -> Drain [ label = "V4L2_DEC_CMD_STOP" ];
> > +   Encoding -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
> > +   Encoding -> Stopped [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
> > +   Encoding -> Encoding;
> > +
> > +   Drain -> Stopped [ label = "All CAPTURE\nbuffers 
> > dequeued\nor\nVIDIOC_STREAMOFF(CAPTURE)" ];
> 
> Shouldn't this be
> 
>   Drain -> Stopped [ label = "All OUTPUT\nbuffers 
> dequeued\nor\nVIDIOC_STREAMOFF(OUTPUT)" ];
> 
> ? While draining, the encoder continues encoding until all source
> buffers, i.e., buffers in the OUTPUT queue, are encoded or STREAMOFF
> happens on the OUTPUT queue. At the same time, the client continues to
> queue and dequeue buffers on the CAPTURE queue and there might be
> buffers queued on the CAPTURE queue even if the driver returned the
> buffer with the FLAG_LAST set and returns -EPIPE on further DQBUF
> requests.
>

The STREAMOFF should be on OUTPUT indeed, because that immediately
removes any OUTPUT buffers from the queue, so there is nothing to be
encoded to wait for anymore.

The "All OUTPUT buffers dequeued" part is correct, though. The last
OUTPUT buffer in the flush sequence is considered encoded after the
application dequeues the corresponding CAPTURE buffer is dequeued and
that buffer is marked with the V4L2_BUF_FLAG_LAST flag.
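
For reference, a rough, untested userspace sketch of that CAPTURE-side drain
loop (assuming a single-planar MMAP queue on an already-open encoder fd, and
that the stop command has already been issued) would be:

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	/* Dequeue CAPTURE buffers until the one carrying V4L2_BUF_FLAG_LAST. */
	static void drain_capture(int fd)
	{
		struct v4l2_buffer buf;

		do {
			memset(&buf, 0, sizeof(buf));
			buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
			buf.memory = V4L2_MEMORY_MMAP;
			if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0)
				break;	/* -EPIPE once fully drained */
			/* ... consume the encoded bitstream in this buffer ... */
		} while (!(buf.flags & V4L2_BUF_FLAG_LAST));
	}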

Best regards,
Tomasz


Re: [PATCH] nvme/pci: Use host managed power state for suspend

2019-05-14 Thread Rafael J. Wysocki
On Mon, May 13, 2019 at 5:10 PM Keith Busch  wrote:
>
> On Mon, May 13, 2019 at 03:05:42PM +, mario.limoncie...@dell.com wrote:
> > This system power state - suspend to idle is going to freeze threads.
> > But we're talking a multi threaded kernel.  Can't there be a timing problem 
> > going
> > on then too?  With a disk flush being active in one task and the other task 
> > trying
> > to put the disk into the deepest power state.  If you don't freeze the 
> > queues how
> > can you guarantee that didn't happen?
>
> But if an active data flush task is running, then we're not idle and
> shouldn't go to low power.

To be entirely precise, system suspend prevents user space from
running while it is in progress.  It doesn't do that to kernel
threads, at least not by default, though, so if there is a kernel
thread flushing the data, it needs to be stopped or suspended somehow
directly in the system suspend path.  [And yes, system suspend (or
hibernation) may take place at any time so long as all user space can
be prevented from running then (by means of the tasks freezer).]

However, freezing the queues from a driver ->suspend callback doesn't
help in general and the reason why is hibernation.  Roughly speaking,
hibernation works in two steps, the first of which creates a snapshot
image of system memory and the second one writes that image to
persistent storage.  Devices are resumed between the two steps in
order to make it possible to do the write, but that would unfreeze the
queues and let the data flusher run.  If it runs, it may cause the
memory snapshot image that has just been created to become outdated
and restoring the system memory contents from that image going forward
may cause corruption to occur.

Thus freezing the queues from a driver ->suspend callback should not
be relied on for correctness if the same callback is used for system
suspend and hibernation, which is the case here.  If doing that
prevents the system from crashing, it is critical to find out why IMO,
as that may very well indicate a broader issue, not necessarily in the
driver itself.
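
For what it's worth, the usual way to avoid sharing one callback between system
suspend and hibernation is to populate the dedicated hibernation hooks in
struct dev_pm_ops. A bare sketch (every callback name below is a placeholder,
not one of the real nvme-pci functions):

	#include <linux/pm.h>

	static int nvme_suspend_cb(struct device *dev)  { return 0; }	/* s2idle / S3 */
	static int nvme_resume_cb(struct device *dev)   { return 0; }
	static int nvme_freeze_cb(struct device *dev)   { return 0; }	/* before the snapshot */
	static int nvme_thaw_cb(struct device *dev)     { return 0; }	/* snapshot taken */
	static int nvme_poweroff_cb(struct device *dev) { return 0; }	/* image written */
	static int nvme_restore_cb(struct device *dev)  { return 0; }	/* resume from image */

	static const struct dev_pm_ops nvme_dev_pm_ops = {
		.suspend  = nvme_suspend_cb,
		.resume   = nvme_resume_cb,
		.freeze   = nvme_freeze_cb,
		.thaw     = nvme_thaw_cb,
		.poweroff = nvme_poweroff_cb,
		.restore  = nvme_restore_cb,
	};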

But note that even if the device turns out to behave oddly, it still
needs to be handled, unless it may be prevented from shipping to users
in that shape.  If it ships, users will face the odd behavior anyway.


[PATCH] ppp: deflate: Fix possible crash in deflate_init

2019-05-14 Thread YueHaibing
BUG: unable to handle kernel paging request at a018f000
PGD 3270067 P4D 3270067 PUD 3271063 PMD 2307eb067 PTE 0
Oops:  [#1] PREEMPT SMP
CPU: 0 PID: 4138 Comm: modprobe Not tainted 5.1.0-rc7+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:ppp_register_compressor+0x3e/0xd0 [ppp_generic]
Code: 98 4a 3f e2 48 8b 15 c1 67 00 00 41 8b 0c 24 48 81 fa 40 f0 19 a0
75 0e eb 35 48 8b 12 48 81 fa 40 f0 19 a0 74
RSP: 0018:c9d93c68 EFLAGS: 00010287
RAX: a018f000 RBX: a01a3000 RCX: 001a
RDX: 888230c750a0 RSI:  RDI: a019f000
RBP: c9d93c80 R08: 0001 R09: 
R10:  R11:  R12: a0194080
R13: 88822ee1a700 R14:  R15: c9d93e78
FS:  7f2339557540() GS:888237a0()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: a018f000 CR3: 00022bde4000 CR4: 06f0
Call Trace:
 ? 0xa01a3000
 deflate_init+0x11/0x1000 [ppp_deflate]
 ? 0xa01a3000
 do_one_initcall+0x6c/0x3cc
 ? kmem_cache_alloc_trace+0x248/0x3b0
 do_init_module+0x5b/0x1f1
 load_module+0x1db1/0x2690
 ? m_show+0x1d0/0x1d0
 __do_sys_finit_module+0xc5/0xd0
 __x64_sys_finit_module+0x15/0x20
 do_syscall_64+0x6b/0x1d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

If ppp_deflate fails to register in deflate_init,
module initialization fails; however,
ppp_deflate_draft may have been registered and is not
unregistered before returning.
The second modprobe will then trigger a crash like this one.

Reported-by: Hulk Robot 
Signed-off-by: YueHaibing 
---
 drivers/net/ppp/ppp_deflate.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ppp/ppp_deflate.c b/drivers/net/ppp/ppp_deflate.c
index b5edc7f..2829efe 100644
--- a/drivers/net/ppp/ppp_deflate.c
+++ b/drivers/net/ppp/ppp_deflate.c
@@ -610,12 +610,16 @@ static void z_incomp(void *arg, unsigned char *ibuf, int 
icnt)
 
 static int __init deflate_init(void)
 {
-int answer = ppp_register_compressor(&ppp_deflate);
-if (answer == 0)
-printk(KERN_INFO
-  "PPP Deflate Compression module registered\n");
+   int answer;
+
+   answer = ppp_register_compressor(&ppp_deflate);
+   if (answer)
+   return answer;
+
+   pr_info("PPP Deflate Compression module registered\n");
ppp_register_compressor(&ppp_deflate_draft);
-return answer;
+
+   return 0;
 }
 
 static void __exit deflate_cleanup(void)
-- 
1.8.3.1




[PATCH v6 3/6] dt-bindings: pinctrl: meson: Add drive-strength-microamp property

2019-05-14 Thread Guillaume La Roque
Add optional drive-strength-microamp property

Signed-off-by: Guillaume La Roque 
Reviewed-by: Martin Blumenstingl 
Reviewed-by: Rob Herring 
---
 Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt 
b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt
index a47dd990a8d3..a7618605bf1e 100644
--- a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt
+++ b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt
@@ -51,6 +51,10 @@ Configuration nodes support the generic properties 
"bias-disable",
 "bias-pull-up" and "bias-pull-down", described in file
 pinctrl-bindings.txt
 
+Optional properties :
+ - drive-strength-microamp: Drive strength for the specified pins in uA.
+   This property is only valid for G12A and newer.
+
 === Example ===
 
pinctrl: pinctrl@c1109880 {
-- 
2.17.1



[PATCH v6 6/6] pinctrl: meson: g12a: add DS bank value

2019-05-14 Thread Guillaume La Roque
Add the drive-strength bank register and bit values for the G12A SoC.

Signed-off-by: Guillaume La Roque 
Reviewed-by: Martin Blumenstingl 
---
 drivers/pinctrl/meson/pinctrl-meson-g12a.c | 36 +++---
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/pinctrl/meson/pinctrl-meson-g12a.c 
b/drivers/pinctrl/meson/pinctrl-meson-g12a.c
index d494492e98e9..3475cd7bd2af 100644
--- a/drivers/pinctrl/meson/pinctrl-meson-g12a.c
+++ b/drivers/pinctrl/meson/pinctrl-meson-g12a.c
@@ -1304,28 +1304,28 @@ static struct meson_pmx_func 
meson_g12a_aobus_functions[] = {
 };
 
 static struct meson_bank meson_g12a_periphs_banks[] = {
-   /* name  first  last  irq  pullen  pull  dir  out  in */
-   BANK("Z",GPIOZ_0,GPIOZ_15, 12, 27,
-4,  0,  4,  0,  12,  0,  13, 0,  14, 0),
-   BANK("H",GPIOH_0,GPIOH_8, 28, 36,
-3,  0,  3,  0,  9,  0,  10,  0,  11,  0),
-   BANK("BOOT", BOOT_0, BOOT_15,  37, 52,
-0,  0,  0,  0,  0, 0,  1, 0,  2, 0),
-   BANK("C",GPIOC_0,GPIOC_7,  53, 60,
-1,  0,  1,  0,  3, 0,  4, 0,  5, 0),
-   BANK("A",GPIOA_0,GPIOA_15,  61, 76,
-5,  0,  5,  0,  16,  0,  17,  0,  18,  0),
-   BANK("X",GPIOX_0,GPIOX_19,   77, 96,
-2,  0,  2,  0,  6,  0,  7,  0,  8,  0),
+   /* name  first  last  irq  pullen  pull  dir  out  in  ds */
+   BANK_DS("Z",GPIOZ_0,GPIOZ_15, 12, 27,
+   4,  0,  4,  0,  12,  0,  13, 0,  14, 0, 5, 0),
+   BANK_DS("H",GPIOH_0,GPIOH_8, 28, 36,
+   3,  0,  3,  0,  9,  0,  10,  0,  11,  0, 4, 0),
+   BANK_DS("BOOT", BOOT_0, BOOT_15,  37, 52,
+   0,  0,  0,  0,  0, 0,  1, 0,  2, 0, 0, 0),
+   BANK_DS("C",GPIOC_0,GPIOC_7,  53, 60,
+   1,  0,  1,  0,  3, 0,  4, 0,  5, 0, 1, 0),
+   BANK_DS("A",GPIOA_0,GPIOA_15,  61, 76,
+   5,  0,  5,  0,  16,  0,  17,  0,  18,  0, 6, 0),
+   BANK_DS("X",GPIOX_0,GPIOX_19,   77, 96,
+   2,  0,  2,  0,  6,  0,  7,  0,  8,  0, 2, 0),
 };
 
 static struct meson_bank meson_g12a_aobus_banks[] = {
-   /* name  first  last  irq  pullen  pull  dir  out  in  */
-   BANK("AO",   GPIOAO_0,  GPIOAO_11,  0, 11,
-3,  0,  2, 0,  0,  0,  4, 0,  1,  0),
+   /* name  first  last  irq  pullen  pull  dir  out  in  ds */
+   BANK_DS("AO", GPIOAO_0, GPIOAO_11, 0, 11, 3, 0, 2, 0, 0, 0, 4, 0, 1, 0,
+   0, 0),
/* GPIOE actually located in the AO bank */
-   BANK("E",   GPIOE_0,  GPIOE_2,   97, 99,
-3,  16,  2, 16,  0,  16,  4, 16,  1,  16),
+   BANK_DS("E", GPIOE_0, GPIOE_2, 97, 99, 3, 16, 2, 16, 0, 16, 4, 16, 1,
+   16, 1, 0),
 };
 
 static struct meson_pmx_bank meson_g12a_periphs_pmx_banks[] = {
-- 
2.17.1



[PATCH v6 5/6] pinctrl: meson: add support of drive-strength-microamp

2019-05-14 Thread Guillaume La Roque
drive-strength-microamp is a new feature needed for the G12A SoC.
The default DS setting after boot is usually 500uA, which is not enough for
many functions. We need to be able to set the drive strength to reliably
enable things like MMC, I2C, etc.

Signed-off-by: Guillaume La Roque 
Reviewed-by: Martin Blumenstingl 
Tested-by: Martin Blumenstingl 
---
 drivers/pinctrl/meson/pinctrl-meson.c | 99 +++
 drivers/pinctrl/meson/pinctrl-meson.h | 18 -
 2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/meson/pinctrl-meson.c 
b/drivers/pinctrl/meson/pinctrl-meson.c
index 8ea5c1527064..33b4b141baac 100644
--- a/drivers/pinctrl/meson/pinctrl-meson.c
+++ b/drivers/pinctrl/meson/pinctrl-meson.c
@@ -220,11 +220,54 @@ static int meson_pinconf_enable_bias(struct meson_pinctrl 
*pc, unsigned int pin,
return 0;
 }
 
+static int meson_pinconf_set_drive_strength(struct meson_pinctrl *pc,
+   unsigned int pin,
+   u16 drive_strength_ua)
+{
+   struct meson_bank *bank;
+   unsigned int reg, bit, ds_val;
+   int ret;
+
+   if (!pc->reg_ds) {
+   dev_err(pc->dev, "drive-strength not supported\n");
+   return -ENOTSUPP;
+   }
+
+   ret = meson_get_bank(pc, pin, &bank);
+   if (ret)
+   return ret;
+
+   meson_calc_reg_and_bit(bank, pin, REG_DS, ®, &bit);
+   bit = bit << 1;
+
+   if (drive_strength_ua <= 500) {
+   ds_val = MESON_PINCONF_DRV_500UA;
+   } else if (drive_strength_ua <= 2500) {
+   ds_val = MESON_PINCONF_DRV_2500UA;
+   } else if (drive_strength_ua <= 3000) {
+   ds_val = MESON_PINCONF_DRV_3000UA;
+   } else if (drive_strength_ua <= 4000) {
+   ds_val = MESON_PINCONF_DRV_4000UA;
+   } else {
+   dev_warn_once(pc->dev,
+ "pin %u: invalid drive-strength : %d , default to 
4mA\n",
+ pin, drive_strength_ua);
+   ds_val = MESON_PINCONF_DRV_4000UA;
+   }
+
+   ret = regmap_update_bits(pc->reg_ds, reg, 0x3 << bit, ds_val << bit);
+   if (ret)
+   return ret;
+
+   return 0;
+}
+
 static int meson_pinconf_set(struct pinctrl_dev *pcdev, unsigned int pin,
 unsigned long *configs, unsigned num_configs)
 {
struct meson_pinctrl *pc = pinctrl_dev_get_drvdata(pcdev);
enum pin_config_param param;
+   unsigned int drive_strength_ua;
int i, ret;
 
for (i = 0; i < num_configs; i++) {
@@ -246,6 +289,14 @@ static int meson_pinconf_set(struct pinctrl_dev *pcdev, 
unsigned int pin,
if (ret)
return ret;
break;
+   case PIN_CONFIG_DRIVE_STRENGTH_UA:
+   drive_strength_ua =
+   pinconf_to_config_argument(configs[i]);
+   ret = meson_pinconf_set_drive_strength
+   (pc, pin, drive_strength_ua);
+   if (ret)
+   return ret;
+   break;
default:
return -ENOTSUPP;
}
@@ -288,12 +339,55 @@ static int meson_pinconf_get_pull(struct meson_pinctrl 
*pc, unsigned int pin)
return conf;
 }
 
+static int meson_pinconf_get_drive_strength(struct meson_pinctrl *pc,
+   unsigned int pin,
+   u16 *drive_strength_ua)
+{
+   struct meson_bank *bank;
+   unsigned int reg, bit;
+   unsigned int val;
+   int ret;
+
+   if (!pc->reg_ds)
+   return -ENOTSUPP;
+
+   ret = meson_get_bank(pc, pin, &bank);
+   if (ret)
+   return ret;
+
+   meson_calc_reg_and_bit(bank, pin, REG_DS, ®, &bit);
+
+   ret = regmap_read(pc->reg_ds, reg, &val);
+   if (ret)
+   return ret;
+
+   switch ((val >> bit) & 0x3) {
+   case MESON_PINCONF_DRV_500UA:
+   *drive_strength_ua = 500;
+   break;
+   case MESON_PINCONF_DRV_2500UA:
+   *drive_strength_ua = 2500;
+   break;
+   case MESON_PINCONF_DRV_3000UA:
+   *drive_strength_ua = 3000;
+   break;
+   case MESON_PINCONF_DRV_4000UA:
+   *drive_strength_ua = 4000;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int meson_pinconf_get(struct pinctrl_dev *pcdev, unsigned int pin,
 unsigned long *config)
 {
struct meson_pinctrl *pc = pinctrl_dev_get_drvdata(pcdev);
enum pin_config_param param = pinconf_to_config_param(*config);
u16 arg;
+   int ret;
 
switch (param) {
case PIN_CONFIG_BIAS_DISABLE:

[PATCH v6 1/6] dt-bindings: pinctrl: add a 'drive-strength-microamp' property

2019-05-14 Thread Guillaume La Roque
This property allow drive-strength parameter in uA instead of mA.

Signed-off-by: Guillaume La Roque 
Acked-by: Martin Blumenstingl 
Reviewed-by: Martin Blumenstingl 
Reviewed-by: Rob Herring 
---
 Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt 
b/Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt
index cef2b5855d60..fcd37e93ed4d 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt
@@ -258,6 +258,7 @@ drive-push-pull - drive actively high and low
 drive-open-drain   - drive with open drain
 drive-open-source  - drive with open source
 drive-strength - sink or source at most X mA
+drive-strength-microamp- sink or source at most X uA
 input-enable   - enable input on pin (no effect on output, such as
  enabling an input buffer)
 input-disable  - disable input on pin (no effect on output, such as
@@ -326,6 +327,8 @@ arguments are described below.
 
 - drive-strength takes as argument the target strength in mA.
 
+- drive-strength-microamp takes as argument the target strength in uA.
+
 - input-debounce takes the debounce time in usec as argument
   or 0 to disable debouncing
 
-- 
2.17.1



[PATCH v6 4/6] pinctrl: meson: Rework enable/disable bias part

2019-05-14 Thread Guillaume La Roque
Rework the bias enable/disable part to prepare for drive-strength integration.
No functional changes.

Signed-off-by: Guillaume La Roque 
Reviewed-by: Martin Blumenstingl 
Tested-by: Martin Blumenstingl 
---
 drivers/pinctrl/meson/pinctrl-meson.c | 85 +++
 1 file changed, 49 insertions(+), 36 deletions(-)

diff --git a/drivers/pinctrl/meson/pinctrl-meson.c 
b/drivers/pinctrl/meson/pinctrl-meson.c
index 96a4a72708e4..8ea5c1527064 100644
--- a/drivers/pinctrl/meson/pinctrl-meson.c
+++ b/drivers/pinctrl/meson/pinctrl-meson.c
@@ -174,62 +174,75 @@ int meson_pmx_get_groups(struct pinctrl_dev *pcdev, 
unsigned selector,
return 0;
 }
 
-static int meson_pinconf_set(struct pinctrl_dev *pcdev, unsigned int pin,
-unsigned long *configs, unsigned num_configs)
+static int meson_pinconf_disable_bias(struct meson_pinctrl *pc,
+ unsigned int pin)
 {
-   struct meson_pinctrl *pc = pinctrl_dev_get_drvdata(pcdev);
struct meson_bank *bank;
-   enum pin_config_param param;
-   unsigned int reg, bit;
-   int i, ret;
+   unsigned int reg, bit = 0;
+   int ret;
 
ret = meson_get_bank(pc, pin, &bank);
if (ret)
return ret;
 
+   meson_calc_reg_and_bit(bank, pin, REG_PULLEN, ®, &bit);
+   ret = regmap_update_bits(pc->reg_pullen, reg, BIT(bit), 0);
+   if (ret)
+   return ret;
+
+   return 0;
+}
+
+static int meson_pinconf_enable_bias(struct meson_pinctrl *pc, unsigned int 
pin,
+bool pull_up)
+{
+   struct meson_bank *bank;
+   unsigned int reg, bit, val = 0;
+   int ret;
+
+   ret = meson_get_bank(pc, pin, &bank);
+   if (ret)
+   return ret;
+
+   meson_calc_reg_and_bit(bank, pin, REG_PULL, ®, &bit);
+   if (pull_up)
+   val = BIT(bit);
+
+   ret = regmap_update_bits(pc->reg_pull, reg, BIT(bit), val);
+   if (ret)
+   return ret;
+
+   meson_calc_reg_and_bit(bank, pin, REG_PULLEN, ®, &bit);
+   ret = regmap_update_bits(pc->reg_pullen, reg, BIT(bit), BIT(bit));
+   if (ret)
+   return ret;
+
+   return 0;
+}
+
+static int meson_pinconf_set(struct pinctrl_dev *pcdev, unsigned int pin,
+unsigned long *configs, unsigned num_configs)
+{
+   struct meson_pinctrl *pc = pinctrl_dev_get_drvdata(pcdev);
+   enum pin_config_param param;
+   int i, ret;
+
for (i = 0; i < num_configs; i++) {
param = pinconf_to_config_param(configs[i]);
 
switch (param) {
case PIN_CONFIG_BIAS_DISABLE:
-   dev_dbg(pc->dev, "pin %u: disable bias\n", pin);
-
-   meson_calc_reg_and_bit(bank, pin, REG_PULLEN, ®,
-  &bit);
-   ret = regmap_update_bits(pc->reg_pullen, reg,
-BIT(bit), 0);
+   ret = meson_pinconf_disable_bias(pc, pin);
if (ret)
return ret;
break;
case PIN_CONFIG_BIAS_PULL_UP:
-   dev_dbg(pc->dev, "pin %u: enable pull-up\n", pin);
-
-   meson_calc_reg_and_bit(bank, pin, REG_PULLEN,
-  ®, &bit);
-   ret = regmap_update_bits(pc->reg_pullen, reg,
-BIT(bit), BIT(bit));
-   if (ret)
-   return ret;
-
-   meson_calc_reg_and_bit(bank, pin, REG_PULL, ®, &bit);
-   ret = regmap_update_bits(pc->reg_pull, reg,
-BIT(bit), BIT(bit));
+   ret = meson_pinconf_enable_bias(pc, pin, true);
if (ret)
return ret;
break;
case PIN_CONFIG_BIAS_PULL_DOWN:
-   dev_dbg(pc->dev, "pin %u: enable pull-down\n", pin);
-
-   meson_calc_reg_and_bit(bank, pin, REG_PULLEN,
-  ®, &bit);
-   ret = regmap_update_bits(pc->reg_pullen, reg,
-BIT(bit), BIT(bit));
-   if (ret)
-   return ret;
-
-   meson_calc_reg_and_bit(bank, pin, REG_PULL, ®, &bit);
-   ret = regmap_update_bits(pc->reg_pull, reg,
-BIT(bit), 0);
+   ret = meson_pinconf_enable_bias(pc, pin, false);
if (ret)
return ret;
break;
-- 
2.17.1



Re: [RFC KVM 18/27] kvm/isolation: function to copy page table entries for percpu buffer

2019-05-14 Thread Alexandre Chartre



On 5/14/19 9:09 AM, Peter Zijlstra wrote:

On Mon, May 13, 2019 at 11:18:41AM -0700, Andy Lutomirski wrote:

On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
 wrote:


pcpu_base_addr is already mapped to the KVM address space, but this
represents the first percpu chunk. To access a per-cpu buffer not
allocated in the first chunk, add a function which maps all cpu
buffers corresponding to that per-cpu buffer.

Also add function to clear page table entries for a percpu buffer.



This needs some kind of clarification so that readers can tell whether
you're trying to map all percpu memory or just map a specific
variable.  In either case, you're making a dubious assumption that
percpu memory contains no secrets.


I'm thinking the per-cpu random pool is a secrit. IOW, it demonstrably
does contain secrits, invalidating that premise.



The current code unconditionally maps the entire first percpu chunk
(pcpu_base_addr). So it assumes it doesn't contain any secrets. That is
mainly a simplification for the POC, because a lot of core information
that we need, for example just to switch mm, is stored there (like
cpu_tlbstate, current_task...).

If the entire first percpu chunk effectively contains secrets then we will
need to individually map only the buffers we need. The kvm_copy_percpu_mapping()
function is added to copy the mapping for a specified percpu buffer, so
it is used to map percpu buffers which are not in the first percpu chunk.

Also note that mapping is constrained by PTE granularity (4K), so mapped buffers
(percpu or not) which do not fill a whole set of pages can leak adjacent
data stored on the same pages.
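
To make that a bit more concrete, mapping one percpu buffer for every CPU
amounts to roughly the following (kvm_map_range() is a made-up stand-in for the
page-table copy helper, not a function from the series):

	#include <linux/percpu.h>
	#include <linux/cpumask.h>

	int kvm_map_range(void *addr, size_t size);	/* hypothetical helper */

	static int map_percpu_buffer(void __percpu *buf, size_t size)
	{
		int cpu, err;

		for_each_possible_cpu(cpu) {
			void *addr = per_cpu_ptr(buf, cpu);

			/* Copy the PTEs covering [addr, addr + size) into the KVM page table. */
			err = kvm_map_range(addr, size);
			if (err)
				return err;
		}

		return 0;
	}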

alex.


[PATCH v6 2/6] pinctrl: generic: add new 'drive-strength-microamp' property support

2019-05-14 Thread Guillaume La Roque
Add drive-strength-microamp property support to allow drive strength in uA

Signed-off-by: Guillaume La Roque 
---
 drivers/pinctrl/pinconf-generic.c   | 2 ++
 include/linux/pinctrl/pinconf-generic.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/pinctrl/pinconf-generic.c 
b/drivers/pinctrl/pinconf-generic.c
index b4f7f8a458ea..d0cbdb1ad76a 100644
--- a/drivers/pinctrl/pinconf-generic.c
+++ b/drivers/pinctrl/pinconf-generic.c
@@ -39,6 +39,7 @@ static const struct pin_config_item conf_items[] = {
PCONFDUMP(PIN_CONFIG_DRIVE_OPEN_SOURCE, "output drive open source", 
NULL, false),
PCONFDUMP(PIN_CONFIG_DRIVE_PUSH_PULL, "output drive push pull", NULL, 
false),
PCONFDUMP(PIN_CONFIG_DRIVE_STRENGTH, "output drive strength", "mA", 
true),
+   PCONFDUMP(PIN_CONFIG_DRIVE_STRENGTH_UA, "output drive strength", "uA", 
true),
PCONFDUMP(PIN_CONFIG_INPUT_DEBOUNCE, "input debounce", "usec", true),
PCONFDUMP(PIN_CONFIG_INPUT_ENABLE, "input enabled", NULL, false),
PCONFDUMP(PIN_CONFIG_INPUT_SCHMITT, "input schmitt trigger", NULL, 
false),
@@ -167,6 +168,7 @@ static const struct pinconf_generic_params dt_params[] = {
{ "drive-open-source", PIN_CONFIG_DRIVE_OPEN_SOURCE, 0 },
{ "drive-push-pull", PIN_CONFIG_DRIVE_PUSH_PULL, 0 },
{ "drive-strength", PIN_CONFIG_DRIVE_STRENGTH, 0 },
+   { "drive-strength-microamp", PIN_CONFIG_DRIVE_STRENGTH_UA, 0 },
{ "input-debounce", PIN_CONFIG_INPUT_DEBOUNCE, 0 },
{ "input-disable", PIN_CONFIG_INPUT_ENABLE, 0 },
{ "input-enable", PIN_CONFIG_INPUT_ENABLE, 1 },
diff --git a/include/linux/pinctrl/pinconf-generic.h 
b/include/linux/pinctrl/pinconf-generic.h
index 6c0680641108..72d06d6a3099 100644
--- a/include/linux/pinctrl/pinconf-generic.h
+++ b/include/linux/pinctrl/pinconf-generic.h
@@ -55,6 +55,8 @@
  * push-pull mode, the argument is ignored.
  * @PIN_CONFIG_DRIVE_STRENGTH: the pin will sink or source at most the current
  * passed as argument. The argument is in mA.
+ * @PIN_CONFIG_DRIVE_STRENGTH_UA: the pin will sink or source at most the 
current
+ * passed as argument. The argument is in uA.
  * @PIN_CONFIG_INPUT_DEBOUNCE: this will configure the pin to debounce mode,
  * which means it will wait for signals to settle when reading inputs. The
  * argument gives the debounce time in usecs. Setting the
@@ -112,6 +114,7 @@ enum pin_config_param {
PIN_CONFIG_DRIVE_OPEN_SOURCE,
PIN_CONFIG_DRIVE_PUSH_PULL,
PIN_CONFIG_DRIVE_STRENGTH,
+   PIN_CONFIG_DRIVE_STRENGTH_UA,
PIN_CONFIG_INPUT_DEBOUNCE,
PIN_CONFIG_INPUT_ENABLE,
PIN_CONFIG_INPUT_SCHMITT,
-- 
2.17.1



[PATCH v4 3/8] remoteproc: stm32: add an ST stm32_rproc driver

2019-05-14 Thread Fabien Dessenne
This patch introduces a new remoteproc driver to control the Cortex-M4
co-processor of the STM32 family.
It provides the following features:
- start and stop
- dedicated co-processor memory regions registration
- coredump and recovery

Signed-off-by: Fabien Dessenne 
Signed-off-by: Ludovic Barre 
Signed-off-by: Loic Pallardy 
Signed-off-by: Arnaud Pouliquen 
---
 drivers/remoteproc/Kconfig   |  15 +
 drivers/remoteproc/Makefile  |   1 +
 drivers/remoteproc/stm32_rproc.c | 628 +++
 3 files changed, 644 insertions(+)
 create mode 100644 drivers/remoteproc/stm32_rproc.c

diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index f0abd26..0fba05a 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -197,6 +197,21 @@ config ST_REMOTEPROC
 config ST_SLIM_REMOTEPROC
tristate
 
+config STM32_RPROC
+   tristate "STM32 remoteproc support"
+   depends on ARCH_STM32
+   depends on REMOTEPROC
+   select MAILBOX
+   help
+ Say y here to support STM32 MCU processors via the
+ remote processor framework.
+
+ You want to say y here in order to enable AMP
+ use-cases to run on your platform (dedicated firmware could be
+ offloaded to remote MCU processors using this framework).
+
+ This can be either built-in or a loadable module.
+
 endif # REMOTEPROC
 
 endmenu
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index ce5d061..00f09e6 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -26,3 +26,4 @@ qcom_wcnss_pil-y  += qcom_wcnss.o
 qcom_wcnss_pil-y   += qcom_wcnss_iris.o
 obj-$(CONFIG_ST_REMOTEPROC)+= st_remoteproc.o
 obj-$(CONFIG_ST_SLIM_REMOTEPROC)   += st_slim_rproc.o
+obj-$(CONFIG_STM32_RPROC)  += stm32_rproc.o
diff --git a/drivers/remoteproc/stm32_rproc.c b/drivers/remoteproc/stm32_rproc.c
new file mode 100644
index 000..eb24b92c
--- /dev/null
+++ b/drivers/remoteproc/stm32_rproc.c
@@ -0,0 +1,628 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) STMicroelectronics 2018 - All Rights Reserved
+ * Authors: Ludovic Barre  for STMicroelectronics.
+ *  Fabien Dessenne  for STMicroelectronics.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "remoteproc_internal.h"
+
+#define HOLD_BOOT  0
+#define RELEASE_BOOT   1
+
+#define MBOX_NB_VQ 2
+#define MBOX_NB_MBX3
+
+#define STM32_SMC_RCC  0x82001000
+#define STM32_SMC_REG_WRITE0x1
+
+#define STM32_MBX_VQ0  "vq0"
+#define STM32_MBX_VQ1  "vq1"
+#define STM32_MBX_SHUTDOWN "shutdown"
+
+struct stm32_syscon {
+   struct regmap *map;
+   u32 reg;
+   u32 mask;
+};
+
+struct stm32_rproc_mem {
+   char name[20];
+   void __iomem *cpu_addr;
+   phys_addr_t bus_addr;
+   u32 dev_addr;
+   size_t size;
+};
+
+struct stm32_rproc_mem_ranges {
+   u32 dev_addr;
+   u32 bus_addr;
+   u32 size;
+};
+
+struct stm32_mbox {
+   const unsigned char name[10];
+   struct mbox_chan *chan;
+   struct mbox_client client;
+   int vq_id;
+};
+
+struct stm32_rproc {
+   struct reset_control *rst;
+   struct stm32_syscon hold_boot;
+   struct stm32_syscon pdds;
+   u32 nb_rmems;
+   struct stm32_rproc_mem *rmems;
+   struct stm32_mbox mb[MBOX_NB_MBX];
+   bool secured_soc;
+};
+
+static int stm32_rproc_pa_to_da(struct rproc *rproc, phys_addr_t pa, u64 *da)
+{
+   unsigned int i;
+   struct stm32_rproc *ddata = rproc->priv;
+   struct stm32_rproc_mem *p_mem;
+
+   for (i = 0; i < ddata->nb_rmems; i++) {
+   p_mem = &ddata->rmems[i];
+
+   if (pa < p_mem->bus_addr ||
+   pa >= p_mem->bus_addr + p_mem->size)
+   continue;
+   *da = pa - p_mem->bus_addr + p_mem->dev_addr;
+   dev_dbg(rproc->dev.parent, "pa %#x to da %llx\n", pa, *da);
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+static int stm32_rproc_mem_alloc(struct rproc *rproc,
+struct rproc_mem_entry *mem)
+{
+   struct device *dev = rproc->dev.parent;
+   void *va;
+
+   dev_dbg(dev, "map memory: %pa+%zx\n", &mem->dma, mem->len);
+   va = ioremap_wc(mem->dma, mem->len);
+   if (IS_ERR_OR_NULL(va)) {
+   dev_err(dev, "Unable to map memory region: %pa+%zx\n",
+   &mem->dma, mem->len);
+   return -ENOMEM;
+   }
+
+   /* Update memory entry va */
+   mem->va = va;
+
+   return 0;
+}
+
+static int stm32_rproc_mem_release(struct rproc *rproc,
+  struct rproc_mem_entry *mem)
+{
+   dev_dbg(rproc->dev.parent, "unmap memory: %pa\n"

[PATCH v4 2/8] dt-bindings: remoteproc: add bindings for stm32 remote processor driver

2019-05-14 Thread Fabien Dessenne
Add the device tree bindings document for the stm32 remoteproc devices.

Signed-off-by: Fabien Dessenne 
---
 .../devicetree/bindings/remoteproc/stm32-rproc.txt | 63 ++
 1 file changed, 63 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt

diff --git a/Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt 
b/Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt
new file mode 100644
index 000..5fa915a
--- /dev/null
+++ b/Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt
@@ -0,0 +1,63 @@
+STMicroelectronics STM32 Remoteproc
+---
+This document defines the binding for the remoteproc component that loads and
+boots firmwares on the ST32MP family chipset.
+
+Required properties:
+- compatible:  Must be "st,stm32mp1-m4"
+- reg: Address ranges of the RETRAM and MCU SRAM memories used by the
+   remote processor.
+- resets:  Reference to a reset controller asserting the remote processor.
+- st,syscfg-holdboot: Reference to the system configuration which holds the
+   remote processor reset hold boot
+   1st cell: phandle of syscon block
+   2nd cell: register offset containing the hold boot setting
+   3rd cell: register bitmask for the hold boot field
+- st,syscfg-tz: Reference to the system configuration which holds the RCC trust
+   zone mode
+   1st cell: phandle to syscon block
+   2nd cell: register offset containing the RCC trust zone mode setting
+   3rd cell: register bitmask for the RCC trust zone mode bit
+
+Optional properties:
+- interrupts:  Should contain the watchdog interrupt
+- mboxes:  This property is required only if the rpmsg/virtio functionality
+   is used. List of phandle and mailbox channel specifiers:
+   - a channel (a) used to communicate through virtqueues with the
+ remote proc.
+ Bi-directional channel:
+ - from local to remote = send message
+ - from remote to local = send message ack
+   - a channel (b) working the opposite direction of channel (a)
+   - a channel (c) used by the local proc to notify the remote proc
+ that it is about to be shut down.
+ Unidirectional channel:
+ - from local to remote, where ACK from the remote means
+   that it is ready for shutdown
+- mbox-names:  This property is required if the mboxes property is used.
+   - must be "vq0" for channel (a)
+   - must be "vq1" for channel (b)
+   - must be "shutdown" for channel (c)
+- memory-region: List of phandles to the reserved memory regions associated 
with
+   the remoteproc device. This is variable and describes the
+   memories shared with the remote processor (eg: remoteproc
+   firmware and carveouts, rpmsg vrings, ...).
+   (see ../reserved-memory/reserved-memory.txt)
+- st,syscfg-pdds: Reference to the system configuration which holds the remote
+   processor deep sleep setting
+   1st cell: phandle to syscon block
+   2nd cell: register offset containing the deep sleep setting
+   3rd cell: register bitmask for the deep sleep bit
+- st,auto-boot:If defined, when remoteproc is probed, it loads the 
default
+   firmware and starts the remote processor.
+
+Example:
+   m4_rproc: m4@1000 {
+   compatible = "st,stm32mp1-m4";
+   reg = <0x1000 0x4>,
+ <0x3000 0x4>,
+ <0x3800 0x1>;
+   resets = <&rcc MCU_R>;
+   st,syscfg-holdboot = <&rcc 0x10C 0x1>;
+   st,syscfg-tz = <&rcc 0x000 0x1>;
+   };
-- 
2.7.4



[PATCH v6 0/6] Add drive-strength in Meson pinctrl driver

2019-05-14 Thread Guillaume La Roque
The purpose of this patchset is to add drive-strength support in meson pinconf
driver. This is a new feature that was added on the g12a. It is critical for us
to support this since many functions are failing with default pad 
drive-strength.

The values achievable by the SoC are 0.5mA, 2.5mA, 3mA and 4mA, and the DT
property 'drive-strength' is expressed in mA.
So this patch adds another generic property, "drive-strength-microamp". The
change to do so would be minimal and could benefit other platforms later on.

Cheers
Guillaume

Changes since v5:
- restore Tested-by/Reviewed-by/Ack-by tags

Changes since v4:
- fix dt-binding documentation
- rename drive-strength-uA to drive-strength-microamp in coverletter

Changes since v3:
- remove dev_err in meson_get_drive_strength
- cleanup code

Changes since v2:
- rename driver-strength-uA property to drive-strength-microamp
- rework patch series for better understanding
- rework set_bias function

Changes since v1:
- fix missing break
- implement new pinctrl generic property "drive-strength-uA"

[1] https://lkml.kernel.org/r/20190314163725.7918-1-jbru...@baylibre.com
Tested-by: Jerome Brunet 

Guillaume La Roque (6):
  dt-bindings: pinctrl: add a 'drive-strength-microamp' property
  pinctrl: generic: add new 'drive-strength-microamp' property support
  dt-bindings: pinctrl: meson: Add drive-strength-microamp property
  pinctrl: meson: Rework enable/disable bias part
  pinctrl: meson: add support of drive-strength-microamp
  pinctrl: meson: g12a: add DS bank value

 .../bindings/pinctrl/meson,pinctrl.txt|   4 +
 .../bindings/pinctrl/pinctrl-bindings.txt |   3 +
 drivers/pinctrl/meson/pinctrl-meson-g12a.c|  36 ++--
 drivers/pinctrl/meson/pinctrl-meson.c | 180 ++
 drivers/pinctrl/meson/pinctrl-meson.h |  18 +-
 drivers/pinctrl/pinconf-generic.c |   2 +
 include/linux/pinctrl/pinconf-generic.h   |   3 +
 7 files changed, 193 insertions(+), 53 deletions(-)

-- 
2.17.1



Re: [PATCH] arm64: dts: imx8mq: Remove unnecessary blank lines

2019-05-14 Thread Daniel Baluta
On Tue, May 14, 2019 at 9:09 AM Anson Huang  wrote:
>
> Unnecessary blank lines do NOT help readability, so remove them.
>
> Signed-off-by: Anson Huang 

Reviewed-by: Daniel Baluta 


[PATCH v4 6/8] ARM: dts: stm32: enable m4 coprocessor support on STM32MP157c-ed1

2019-05-14 Thread Fabien Dessenne
Enable m4 coprocessor for STM32MP157c-ed1 board.

Signed-off-by: Fabien Dessenne 
---
 arch/arm/boot/dts/stm32mp157c-ed1.dts | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/boot/dts/stm32mp157c-ed1.dts 
b/arch/arm/boot/dts/stm32mp157c-ed1.dts
index acfc5cd..e5a6f40 100644
--- a/arch/arm/boot/dts/stm32mp157c-ed1.dts
+++ b/arch/arm/boot/dts/stm32mp157c-ed1.dts
@@ -134,6 +134,16 @@
status = "okay";
 };
 
+&m4_rproc {
+   memory-region = <&retram>, <&mcuram>, <&mcuram2>, <&vdev0vring0>,
+   <&vdev0vring1>, <&vdev0buffer>;
+   mboxes = <&ipcc 0>, <&ipcc 1>, <&ipcc 2>;
+   mbox-names = "vq0", "vq1", "shutdown";
+   interrupt-parent = <&exti>;
+   interrupts = <68 1>;
+   status = "okay";
+};
+
 &rng1 {
status = "okay";
 };
-- 
2.7.4



[PATCH v4 0/8] stm32 m4 remoteproc on STM32MP157c

2019-05-14 Thread Fabien Dessenne
STMicroelectronics STM32MP157 MPUs are based on a dual Arm Cortex-A7 core and a
Cortex-M4.
This patchset adds support for the stm32_rproc driver, allowing control of the
M4 remote processor.

Changes since v3:
-Replaced "st,auto_boot" with "st,auto-boot"
-Update m4 reg values and align with unit-address

Changes since v2:
- Clarified "reg" description
- Change m4 unit address to 3800
- Renamed "auto_boot" to "st,auto-boot"

Changes since v1:
- Gave details about the memory mapping (in bindings).
- Used 'dma-ranges' instead of 'ranges'.
- Updated the 'compatible' property.
- Remove the 'recovery', 'reset-names' and 'interrupt-names' properties.
- Clarified why / when mailboxes are optional.

Fabien Dessenne (8):
  dt-bindings: stm32: add bindings for ML-AHB interconnect
  dt-bindings: remoteproc: add bindings for stm32 remote processor
driver
  remoteproc: stm32: add an ST stm32_rproc driver
  ARM: dts: stm32: add m4 remoteproc support on STM32MP157c
  ARM: dts: stm32: declare copro reserved memories on STM32MP157c-ed1
  ARM: dts: stm32: enable m4 coprocessor support on STM32MP157c-ed1
  ARM: dts: stm32: declare copro reserved memories on STM32MP157a-dk1
  ARM: dts: stm32: enable m4 coprocessor support on STM32MP157a-dk1

 .../devicetree/bindings/arm/stm32/mlahb.txt|  37 ++
 .../devicetree/bindings/remoteproc/stm32-rproc.txt |  63 +++
 arch/arm/boot/dts/stm32mp157a-dk1.dts  |  52 ++
 arch/arm/boot/dts/stm32mp157c-ed1.dts  |  52 ++
 arch/arm/boot/dts/stm32mp157c.dtsi |  20 +
 drivers/remoteproc/Kconfig |  15 +
 drivers/remoteproc/Makefile|   1 +
 drivers/remoteproc/stm32_rproc.c   | 628 +
 8 files changed, 868 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/stm32/mlahb.txt
 create mode 100644 Documentation/devicetree/bindings/remoteproc/stm32-rproc.txt
 create mode 100644 drivers/remoteproc/stm32_rproc.c

-- 
2.7.4



[PATCH v4 7/8] ARM: dts: stm32: declare copro reserved memories on STM32MP157a-dk1

2019-05-14 Thread Fabien Dessenne
Declare reserved memories shared by the processors for STM32MP157a-dk1

Signed-off-by: Fabien Dessenne 
---
 arch/arm/boot/dts/stm32mp157a-dk1.dts | 42 +++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm/boot/dts/stm32mp157a-dk1.dts 
b/arch/arm/boot/dts/stm32mp157a-dk1.dts
index 85a761a..26ce8de 100644
--- a/arch/arm/boot/dts/stm32mp157a-dk1.dts
+++ b/arch/arm/boot/dts/stm32mp157a-dk1.dts
@@ -27,6 +27,48 @@
reg = <0xc000 0x2000>;
};
 
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   mcuram2: mcuram2@1000 {
+   compatible = "shared-dma-pool";
+   reg = <0x1000 0x4>;
+   no-map;
+   };
+
+   vdev0vring0: vdev0vring0@1004 {
+   compatible = "shared-dma-pool";
+   reg = <0x1004 0x1000>;
+   no-map;
+   };
+
+   vdev0vring1: vdev0vring1@10041000 {
+   compatible = "shared-dma-pool";
+   reg = <0x10041000 0x1000>;
+   no-map;
+   };
+
+   vdev0buffer: vdev0buffer@10042000 {
+   compatible = "shared-dma-pool";
+   reg = <0x10042000 0x4000>;
+   no-map;
+   };
+
+   mcuram: mcuram@3000 {
+   compatible = "shared-dma-pool";
+   reg = <0x3000 0x4>;
+   no-map;
+   };
+
+   retram: retram@3800 {
+   compatible = "shared-dma-pool";
+   reg = <0x3800 0x1>;
+   no-map;
+   };
+   };
+
led {
compatible = "gpio-leds";
blue {
-- 
2.7.4



[PATCH v4 5/8] ARM: dts: stm32: declare copro reserved memories on STM32MP157c-ed1

2019-05-14 Thread Fabien Dessenne
Declare reserved memories shared by the processors for STM32MP157c-ed1
board.

Signed-off-by: Fabien Dessenne 
---
 arch/arm/boot/dts/stm32mp157c-ed1.dts | 42 +++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm/boot/dts/stm32mp157c-ed1.dts 
b/arch/arm/boot/dts/stm32mp157c-ed1.dts
index 626ceb3..acfc5cd 100644
--- a/arch/arm/boot/dts/stm32mp157c-ed1.dts
+++ b/arch/arm/boot/dts/stm32mp157c-ed1.dts
@@ -21,6 +21,48 @@
reg = <0xC000 0x4000>;
};
 
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   mcuram2: mcuram2@1000 {
+   compatible = "shared-dma-pool";
+   reg = <0x1000 0x4>;
+   no-map;
+   };
+
+   vdev0vring0: vdev0vring0@1004 {
+   compatible = "shared-dma-pool";
+   reg = <0x1004 0x1000>;
+   no-map;
+   };
+
+   vdev0vring1: vdev0vring1@10041000 {
+   compatible = "shared-dma-pool";
+   reg = <0x10041000 0x1000>;
+   no-map;
+   };
+
+   vdev0buffer: vdev0buffer@10042000 {
+   compatible = "shared-dma-pool";
+   reg = <0x10042000 0x4000>;
+   no-map;
+   };
+
+   mcuram: mcuram@3000 {
+   compatible = "shared-dma-pool";
+   reg = <0x3000 0x4>;
+   no-map;
+   };
+
+   retram: retram@3800 {
+   compatible = "shared-dma-pool";
+   reg = <0x3800 0x1>;
+   no-map;
+   };
+   };
+
aliases {
serial0 = &uart4;
};
-- 
2.7.4



[PATCH v4 8/8] ARM: dts: stm32: enable m4 coprocessor support on STM32MP157a-dk1

2019-05-14 Thread Fabien Dessenne
Enable m4 coprocessor for STM32MP157a-dk1 board.

Signed-off-by: Fabien Dessenne 
---
 arch/arm/boot/dts/stm32mp157a-dk1.dts | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/boot/dts/stm32mp157a-dk1.dts 
b/arch/arm/boot/dts/stm32mp157a-dk1.dts
index 26ce8de..da64ee2 100644
--- a/arch/arm/boot/dts/stm32mp157a-dk1.dts
+++ b/arch/arm/boot/dts/stm32mp157a-dk1.dts
@@ -116,6 +116,16 @@
status = "okay";
 };
 
+&m4_rproc {
+   memory-region = <&retram>, <&mcuram>, <&mcuram2>, <&vdev0vring0>,
+   <&vdev0vring1>, <&vdev0buffer>;
+   mboxes = <&ipcc 0>, <&ipcc 1>, <&ipcc 2>;
+   mbox-names = "vq0", "vq1", "shutdown";
+   interrupt-parent = <&exti>;
+   interrupts = <68 1>;
+   status = "okay";
+};
+
 &rng1 {
status = "okay";
 };
-- 
2.7.4



[PATCH v4 4/8] ARM: dts: stm32: add m4 remoteproc support on STM32MP157c

2019-05-14 Thread Fabien Dessenne
Declare the M4 remote processor in a sub-node of the mlahb simple bus.

Signed-off-by: Fabien Dessenne 
---
 arch/arm/boot/dts/stm32mp157c.dtsi | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm/boot/dts/stm32mp157c.dtsi 
b/arch/arm/boot/dts/stm32mp157c.dtsi
index c664b55..39bbcda 100644
--- a/arch/arm/boot/dts/stm32mp157c.dtsi
+++ b/arch/arm/boot/dts/stm32mp157c.dtsi
@@ -1242,4 +1242,24 @@
status = "disabled";
};
};
+
+   mlahb {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   dma-ranges = <0x 0x3800 0x1>,
+<0x1000 0x1000 0x6>,
+<0x3000 0x3000 0x6>;
+
+   m4_rproc: m4@1000 {
+   compatible = "st,stm32mp1-m4";
+   reg = <0x1000 0x4>,
+ <0x3000 0x4>,
+ <0x3800 0x1>;
+   resets = <&rcc MCU_R>;
+   st,syscfg-holdboot = <&rcc 0x10C 0x1>;
+   st,syscfg-tz = <&rcc 0x000 0x1>;
+   status = "disabled";
+   };
+   };
 };
-- 
2.7.4



[PATCH v4 1/8] dt-bindings: stm32: add bindings for ML-AHB interconnect

2019-05-14 Thread Fabien Dessenne
Document the ML-AHB interconnect for stm32 SoCs.

Signed-off-by: Fabien Dessenne 
---
 .../devicetree/bindings/arm/stm32/mlahb.txt| 37 ++
 1 file changed, 37 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/stm32/mlahb.txt

diff --git a/Documentation/devicetree/bindings/arm/stm32/mlahb.txt 
b/Documentation/devicetree/bindings/arm/stm32/mlahb.txt
new file mode 100644
index 000..25307aa
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/stm32/mlahb.txt
@@ -0,0 +1,37 @@
+ML-AHB interconnect bindings
+
+These bindings describe the STM32 SoCs ML-AHB interconnect bus which connects
+a Cortex-M subsystem with dedicated memories.
+The MCU SRAM and RETRAM memory parts can be accessed through different 
addresses
+(see "RAM aliases" in [1]) using different buses (see [2]) : balancing the
+Cortex-M firmware accesses among those ports allows to tune the system
+performance.
+
+[1]: https://www.st.com/resource/en/reference_manual/dm00327659.pdf
+[2]: https://wiki.st.com/stm32mpu/wiki/STM32MP15_RAM_mapping
+
+Required properties:
+- compatible: should be "simple-bus"
+- dma-ranges: describes memory addresses translation between the local CPU and
+  the remote Cortex-M processor. Each memory region, is declared with
+  3 parameters:
+- param 1: device base address (Cortex-M processor address)
+- param 2: physical base address (local CPU address)
+- param 3: size of the memory region.
+
+The Cortex-M remote processor accessed via the mlahb interconnect is described
+by a child node.
+
+Example:
+mlahb {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   dma-ranges = <0x 0x3800 0x1>,
+<0x1000 0x1000 0x6>,
+<0x3000 0x3000 0x6>;
+
+   m4_rproc: m4@1000 {
+   ...
+   };
+};
-- 
2.7.4



RE: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-14 Thread David Laight
> And I like Steven's "(fault)" idea.
> How about this:
> 
>   if ptr < PAGE_SIZE  -> "(null)"
>   if IS_ERR_VALUE(ptr)-> "(fault)"
> 
>   -ss

Or:
	if (ptr < PAGE_SIZE)
		return ptr ? "(null+)" : "(null)";
	if (IS_ERR_VALUE(ptr))
		return "(errno)";

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: [PATCH v2] serial: sh-sci: disable DMA for uart_console

2019-05-14 Thread Geert Uytterhoeven
Hi George,

On Mon, May 13, 2019 at 5:48 PM George G. Davis  wrote:
> As noted in commit 84b40e3b57ee ("serial: 8250: omap: Disable DMA for
> console UART"), UART console lines use low-level PIO only access functions
> which will conflict with use of the line when DMA is enabled, e.g. when
> the console line is also used for systemd messages. So disable DMA
> support for UART console lines.
>
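(For readers without the patch at hand, the approach boils down to something
like the sketch below; the function shown is only assumed to be the driver's
DMA setup path, it is not quoted from the patch.)

	#include <linux/serial_core.h>

	static void sci_request_dma(struct uart_port *port)
	{
		if (uart_console(port))
			return;		/* the console line keeps using the PIO helpers */

		/* ... normal DMA channel request continues here ... */
	}
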
> Fixes: https://patchwork.kernel.org/patch/10929511/

I don't think this is an appropriate reference, as it points to a patch that
was never applied.

As the problem has basically existed forever, IMHO no Fixes tag
is needed.

> Reported-by: Michael Rodin 
> Tested-by: Eugeniu Rosca 
> Reviewed-by: Simon Horman 
> Reviewed-by: Wolfram Sang 
> Cc: sta...@vger.kernel.org
> Signed-off-by: George G. Davis 
> ---
> v2: Clarify comment regarding DMA support on kernel console,
> add {Tested,Reviewed}-by:, and Cc: linux-stable lines.

Thanks for the update!

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 2/2] powerpc/8xx: Add microcode patch to move SMC parameter RAM.

2019-05-14 Thread Christophe Leroy




Le 14/05/2019 à 08:56, Michael Ellerman a écrit :

Christophe Leroy  writes:


Some SCC functions like the QMC requires an extended parameter RAM.
On modern 8xx (ie 866 and 885), SPI area can already be relocated,
allowing the use of those functions on SCC2. But SCC3 and SCC4
parameter RAM collide with SMC1 and SMC2 parameter RAMs.

This patch adds microcode to allow the relocation of both SMC1 and
SMC2, and relocate them at offsets 0x1ec0 and 0x1fc0.
Those offsets are by default for the CPM1 DSP1 and DSP2, but there
is no kernel driver using them at the moment so this area can be
reused.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/platforms/8xx/Kconfig  |   7 ++
  arch/powerpc/platforms/8xx/micropatch.c | 109 +++-
  2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/8xx/micropatch.c 
b/arch/powerpc/platforms/8xx/micropatch.c
index 33a9042fca80..dc4423daf7d4 100644
--- a/arch/powerpc/platforms/8xx/micropatch.c
+++ b/arch/powerpc/platforms/8xx/micropatch.c
@@ -622,6 +622,86 @@ static uint patch_2f00[] __initdata = {
  };
  #endif
  
+/*

+ * SMC relocation patch arrays.
+ */
+
+#ifdef CONFIG_SMC_UCODE_PATCH
+
+static uint patch_2000[] __initdata = {
+   0x3fff, 0x3ffd, 0x3ffb, 0x3ff9,
+   0x5fefeff8, 0x5f91eff8, 0x3ff3, 0x3ff1,
+   0x3a11e710, 0xedf0ccb9, 0xf318ed66, 0x7f0e5fe2,


Do we have any doc on what these values are?


No we don't




I get that it's microcode but do we have any more detail than that?
What's the source etc?



There is an Engineering Bulletin (EB662) dated 2006 from Freescale which 
briefly describes things, and there are associated S-Record files 
containing those values.


And an old related message in the mailing list 
https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg46038.html


Christophe



Re: [PATCH] EDAC, mpc85xx: Prevent building as a module

2019-05-14 Thread Borislav Petkov
On Tue, May 14, 2019 at 04:50:49PM +1000, Michael Ellerman wrote:
> Looks good. I even booted it :)

Cool, thanks!

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] mac80211: fix possible deadlock in TX path

2019-05-14 Thread Johannes Berg
On Sat, 2019-04-27 at 22:41 +0200, Erik Stromdahl wrote:
> This patch fixes a possible deadlock when updating the TX statistics
> (when calling into ieee80211_tx_stats()) from ieee80211_tx_dequeue().
> 
> ieee80211_tx_dequeue() might be called from process context.

I think this really is the problem.

> [] (ieee80211_xmit_fast_finish) from [] 
> (ieee80211_tx_dequeue+0x30c/0xb9c)
>  r10:d2f1a900 r9:d2d607a4 r8:d2cf20dc r7:d330b29c r6:d2cf2000 r5:d2c342ba
>  r4:d2899d3c
> [] (ieee80211_tx_dequeue) from [] 
> (ath10k_mac_tx_push_txq+0x78/0x2a4 [ath10k_core])
>  r10:d2d607cc r9:d2fe06a0 r8: r7:d2fe1e30 r6:d2fe1d38 r5:d2fe1540
>  r4:d2cf20dc
> [] (ath10k_mac_tx_push_txq [ath10k_core]) from [] 
> (ath10k_mac_tx_push_pending+0x1d4/0x2e0 [ath10k_core])
>  r10:d2cf20dc r9:bf0582b4 r8:bf0b1dba r7:0002 r6:c1429994 r5:
>  r4:d2fe06a0
> [] (ath10k_mac_tx_push_pending [ath10k_core]) from [] 
> (ath10k_sdio_irq_handler+0x30c/0x4d8 [ath10k_sdio])
>  r10:5b5a r9:d2fcc040 r8:00180201 r7:d2fe6540 r6:d2fe6a7c r5:
>  r4:d2fe1540

It seems to be entirely ath10k's fault, and quite possibly our
documentation, but we probably should have local_bh_disable() there
rather than try to do u64_stats_update_begin_irqsave() in some path that
really doesn't need it.

This is going to be whack-a-mole otherwise - the TX path in mac80211
really expects to not be interrupted by softirqs.
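
Concretely, I mean something along these lines (the wrapper name is made up and
this assumes the ath10k driver context):

	#include <linux/bottom_half.h>

	struct ath10k;
	void ath10k_mac_tx_push_pending(struct ath10k *ar);	/* from ath10k's mac.h */

	/* Run the TX push with BHs disabled, as mac80211's TX path expects. */
	static void ath10k_push_pending_bh(struct ath10k *ar)
	{
		local_bh_disable();
		ath10k_mac_tx_push_pending(ar);
		local_bh_enable();
	}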

johannes



Re: [RFC KVM 18/27] kvm/isolation: function to copy page table entries for percpu buffer

2019-05-14 Thread Andy Lutomirski



> On May 14, 2019, at 1:25 AM, Alexandre Chartre  
> wrote:
> 
> 
>> On 5/14/19 9:09 AM, Peter Zijlstra wrote:
>>> On Mon, May 13, 2019 at 11:18:41AM -0700, Andy Lutomirski wrote:
>>> On Mon, May 13, 2019 at 7:39 AM Alexandre Chartre
>>>  wrote:
 
 pcpu_base_addr is already mapped to the KVM address space, but this
 represents the first percpu chunk. To access a per-cpu buffer not
 allocated in the first chunk, add a function which maps all cpu
 buffers corresponding to that per-cpu buffer.
 
 Also add function to clear page table entries for a percpu buffer.
 
>>> 
>>> This needs some kind of clarification so that readers can tell whether
>>> you're trying to map all percpu memory or just map a specific
>>> variable.  In either case, you're making a dubious assumption that
>>> percpu memory contains no secrets.
>> I'm thinking the per-cpu random pool is a secrit. IOW, it demonstrably
>> does contain secrits, invalidating that premise.
> 
> The current code unconditionally maps the entire first percpu chunk
> (pcpu_base_addr). So it assumes it doesn't contain any secrets. That is
> mainly a simplification for the POC because a lot of core information
> that we need, for example just to switch mm, is stored there (like
> cpu_tlbstate, current_task...).

I don’t think you should need any of this.

> 
> If the entire first percpu chunk effectively has secrets then we will
> need to individually map only the buffers we need. The kvm_copy_percpu_mapping()
> function is added to copy the mapping for a specified percpu buffer, so
> this is used to map percpu buffers which are not in the first percpu chunk.
> 
> Also note that mapping is constrained by PTE (4K), so mapped buffers
> (percpu or not) which do not fill a whole set of pages can leak adjacent
> data stored on the same pages.
> 
> 

I would take a different approach: figure out what you need and put it in its 
own dedicated area, kind of like cpu_entry_area.

One nasty issue you’ll have is vmalloc: the kernel stack is in the vmap range, 
and, if you allow access to vmap memory at all, you’ll need some way to ensure 
that *unmap* gets propagated. I suspect the right choice is to see if you can 
avoid using the kernel stack at all in isolated mode.  Maybe you could run on 
the IRQ stack instead.

Re: [RFC KVM 00/27] KVM Address Space Isolation

2019-05-14 Thread Alexandre Chartre



On 5/13/19 11:08 PM, Liran Alon wrote:




On 13 May 2019, at 21:17, Andy Lutomirski  wrote:


I expect that the KVM address space can eventually be expanded to include
the ioctl syscall entries. By doing so, and also adding the KVM page table
to the process userland page table (which should be safe to do because the
KVM address space doesn't have any secret), we could potentially handle the
KVM ioctl without having to switch to the kernel pagetable (thus effectively
eliminating KPTI for KVM). Then the only overhead would be if a VM-Exit has
to be handled using the full kernel address space.



In the hopefully common case where a VM exits and then gets re-entered
without needing to load full page tables, what code actually runs?
I'm trying to understand when the optimization of not switching is
actually useful.

Allowing ioctl() without switching to kernel tables sounds...
extremely complicated.  It also makes the dubious assumption that user
memory contains no secrets.


Let me attempt to clarify what we were thinking when creating this patch series:

1) It is never safe to execute one hyperthread inside the guest while its sibling 
hyperthread runs in a virtual address space which contains secrets of host or 
other guests.
This is because we assume that using some speculative gadget (such as 
half-Spectrev2 gadget), it will be possible to populate *some* CPU core 
resource which could then be *somehow* leaked by the hyperthread running inside 
guest. In case of L1TF, this would be data populated to the L1D cache.

2) Because of (1), every time a hyperthread runs inside the host kernel, we must 
make sure its sibling is not running inside the guest, i.e. we must kick the 
sibling hyperthread out of the guest using an IPI.

3) From (2), we should have theoretically deduced that for every #VMExit, there 
is a need to also kick the sibling hyperthread out of the guest until the 
#VMExit is completed. Such a patch series was implemented at some point, but it 
had an (obviously) significant performance hit.

4) The main goal of this patch series is to preserve (2), but to avoid the 
overhead specified in (3).

The way this patch series achieves (4) is by observing that during the run of a 
VM, most #VMExits can be handled rather quickly and locally inside KVM and 
don't need to reference any data that is not relevant to this VM or KVM code. 
Therefore, if we run these #VMExits in an isolated virtual address space 
(i.e. the KVM isolated address space), there is no need to kick the sibling 
hyperthread out of the guest while these #VMExit handlers run.
The hope is that the vast majority of #VMExit handlers will be able to 
run completely without requiring a switch to the full address space, therefore 
avoiding the performance hit of (2).
However, for the very few #VMExits that do require the full kernel address 
space, we must first kick the sibling hyperthread out of the guest and only 
then switch to the full kernel address space, and only once all hyperthreads 
have returned to the KVM address space do we allow them to enter the guest again.
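
To make the intended flow concrete, a rough pseudo-code sketch of the scheme
described above (all function names here are made up for illustration and are
not taken from the patch series):

static void handle_vmexit(struct kvm_vcpu *vcpu)
{
	/* Still running on the KVM isolated page tables here. */
	if (vmexit_can_be_handled_isolated(vcpu)) {
		/* Fast path: the sibling hyperthread may stay in the guest. */
		handle_vmexit_isolated(vcpu);
		return;
	}

	/* Slow path: never expose the full kernel while a sibling is in guest. */
	kick_sibling_out_of_guest(vcpu);
	switch_to_full_kernel_address_space();
	handle_vmexit_full(vcpu);
	switch_to_kvm_isolated_address_space();
	/* The guest is re-entered only once all siblings are back here. */
}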

For this reason, I think the above paragraph (that was added to my original 
cover letter) is incorrect.


Yes, I am wrong. The KVM page table can't be added to the process userland page
table because this can leak secrets from userland. I was only thinking about
performance, to reduce the number of context switches. So just forget that
paragraph :-)

alex.



I believe that we should by design treat all exits to the userspace VMM (e.g. QEMU) 
as slow paths that should not be optimised, and therefore it is ok to switch address 
space there (and also kick the sibling hyperthread). Similarly, all ioctl 
handlers are slow paths, and therefore it should be ok for them to also not 
run in the KVM isolated address space.

-Liran



Re: [PATCH 2/2] powerpc/8xx: Add microcode patch to move SMC parameter RAM.

2019-05-14 Thread Christophe Leroy



On 14/05/2019 at 10:31, Christophe Leroy wrote:



On 14/05/2019 at 08:56, Michael Ellerman wrote:

Christophe Leroy  writes:


Some SCC functions like the QMC require an extended parameter RAM.
On modern 8xx (ie 866 and 885), SPI area can already be relocated,
allowing the use of those functions on SCC2. But SCC3 and SCC4
parameter RAM collide with SMC1 and SMC2 parameter RAMs.

This patch adds microcode to allow the relocation of both SMC1 and
SMC2, and relocate them at offsets 0x1ec0 and 0x1fc0.
Those offsets are by default for the CPM1 DSP1 and DSP2, but there
is no kernel driver using them at the moment so this area can be
reused.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/platforms/8xx/Kconfig  |   7 ++
  arch/powerpc/platforms/8xx/micropatch.c | 109 
+++-

  2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/8xx/micropatch.c 
b/arch/powerpc/platforms/8xx/micropatch.c

index 33a9042fca80..dc4423daf7d4 100644
--- a/arch/powerpc/platforms/8xx/micropatch.c
+++ b/arch/powerpc/platforms/8xx/micropatch.c
@@ -622,6 +622,86 @@ static uint patch_2f00[] __initdata = {
  };
  #endif
+/*
+ * SMC relocation patch arrays.
+ */
+
+#ifdef CONFIG_SMC_UCODE_PATCH
+
+static uint patch_2000[] __initdata = {
+    0x3fff, 0x3ffd, 0x3ffb, 0x3ff9,
+    0x5fefeff8, 0x5f91eff8, 0x3ff3, 0x3ff1,
+    0x3a11e710, 0xedf0ccb9, 0xf318ed66, 0x7f0e5fe2,


Do we have any doc on what these values are?


No we don't




I get that it's microcode but do we have any more detail than that?
What's the source etc?



There is an Engineering Bulletin (EB662) dated 2006 from Freescale which 
briefly describes things, and there are associated S-Record files 
containing those values.


Find attached the said doc and files. Not sure it will go through the 
list. Can't find it on the NXP website though; it must be too old.


Christophe



And an old related message in the mailing list 
https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg46038.html


Christophe


Re: [PATCH RESEND 1/2] soc: imx: Add SCU SoC info driver support

2019-05-14 Thread Daniel Baluta
On Tue, May 14, 2019 at 2:34 AM Anson Huang  wrote:
>
> Hi, Daniel
>
> > -Original Message-
> > From: Daniel Baluta [mailto:daniel.bal...@gmail.com]
> > Sent: Monday, May 13, 2019 10:30 PM
> > To: Anson Huang 
> > Cc: catalin.mari...@arm.com; will.dea...@arm.com;
> > shawn...@kernel.org; s.ha...@pengutronix.de; ker...@pengutronix.de;
> > feste...@gmail.com; maxime.rip...@bootlin.com; agr...@kernel.org;
> > o...@lixom.net; horms+rene...@verge.net.au;
> > ja...@amarulasolutions.com; bjorn.anders...@linaro.org; Leonard Crestez
> > ; marc.w.gonza...@free.fr;
> > dingu...@kernel.org; enric.balle...@collabora.com; Aisheng Dong
> > ; r...@kernel.org; Abel Vesa
> > ; l.st...@pengutronix.de; linux-arm-
> > ker...@lists.infradead.org; linux-kernel@vger.kernel.org; dl-linux-imx
> > ; Daniel Baluta 
> > Subject: Re: [PATCH RESEND 1/2] soc: imx: Add SCU SoC info driver support
> >
> > 
> >
> > > +
> > > +static u32 imx8qxp_soc_revision(void) {
> > > +   struct imx_sc_msg_misc_get_soc_id msg;
> > > +   struct imx_sc_rpc_msg *hdr = &msg.hdr;
> > > +   u32 rev = 0;
> > > +   int ret;
> > > +
> > > +   hdr->ver = IMX_SC_RPC_VERSION;
> > > +   hdr->svc = IMX_SC_RPC_SVC_MISC;
> > > +   hdr->func = IMX_SC_MISC_FUNC_GET_CONTROL;
> > > +   hdr->size = 3;
> > > +
> > > +   msg.data.send.control = IMX_SC_C_ID;
> > > +   msg.data.send.resource = IMX_SC_R_SYSTEM;
> > > +
> > > +   ret = imx_scu_call_rpc(soc_ipc_handle, &msg, true);
> > > +   if (ret) {
> > > +   dev_err(&imx_scu_soc_pdev->dev,
> > > +   "get soc info failed, ret %d\n", ret);
> > > +   return rev;
> >
> > So you return 0 (rev  = 0) here in case of error? This doesn't seem to be 
> > right.
> > Maybe return ret?
>
> This is intentional, similar to the current i.MX8MQ soc info driver: when
> getting the revision fails, just return 0 as the revision info and it will
> show "unknown" in sysfs.

Ok, I understand. Let's make this clear from the source code.

   ret = imx_scu_call_rpc(soc_ipc_handle, &msg, true);
+   if (ret) {
+   dev_err(&imx_scu_soc_pdev->dev,
+   "get soc info failed, ret %d\n", ret);
+   /* returning 0 means getting revision failed */
+   return 0;
+   }


Re: [PATCH] i2c: at91: handle TXRDY interrupt spam

2019-05-14 Thread Ludovic Desroches
On Sat, May 04, 2019 at 05:28:51AM +0530, Raag Jadav wrote:
> On Thu, May 02, 2019 at 04:01:16PM +0200, Ludovic Desroches wrote:
> > On Tue, Apr 30, 2019 at 04:03:32AM +0530, Raag Jadav wrote:
> > > External E-Mail
> > > 
> > > 
> > > On Mon, Apr 29, 2019 at 11:00:05AM +0200, Ludovic Desroches wrote:
> > > > Hello Raag,
> > > > 
> > > > On Tue, Apr 23, 2019 at 01:06:48PM +0530, Raag Jadav wrote:
> > > > > External E-Mail
> > > > > 
> > > > > 
> > > > > Performing i2c write operation while SDA or SCL line is held
> > > > > or grounded by slave device, we go into infinite 
> > > > > at91_twi_write_next_byte
> > > > > loop with TXRDY interrupt spam.
> > > > 
> > > > Sorry but I am not sure to have the full picture, the controller is in
> > > > slave or master mode?
> > > > 
> > > > SVREAD is only used in slave mode. When SVREAD is set, it means that a 
> > > > read
> > > > access is performed and your issue concerns the write operation.
> > > > 
> > > > Regards
> > > > 
> > > > Ludovic
> > > 
> > > Yes, even though the datasheet suggests that SVREAD is irrelevant in 
> > > master mode,
> > > TXRDY and SVREAD are the only ones being set in status register upon 
> > > reproducing the issue.
> > > Couldn't think of a better way to handle such strange behaviour.
> > > Any suggestions would be appreciated.
> > 
> > I have the confirmation that you can't rely on the SVREAD flag when in
> > master mode. This flag should always have the same value.
> > 
> > I am trying to understand what could lead to your situation. Can you
> > give me more details. What kind of device it is? What does lead to this
> > situation? Does it happen randomly or not?
> 
> One of the sama5d2 based boards I worked on was having trouble completing
> its boot because of a faulty i2c device, which was randomly holding down the
> SDA line on i2c write operations, not allowing the controller to complete its
> transmission, causing a massive TXRDY interrupt spam and ultimately hanging
> the processor.
> 
> Another strange observation was that SVREAD was being set in the status 
> register
> along with TXRDY, every time I reproduced the issue.
> You can reproduce it by simply grounding the SDA line and performing i2c write
> on the bus.

Thanks for the details, I'll discuss it with the hw guys but expect some
delay as I'll be off for the next 2 weeks.

Regards

Ludovic

> 
> Note that NACK, LOCK or TXCOMP are never set as the transmission never 
> completes.
> I'm not sure why slave bits are being set in master mode,
> but it's been working reliably for me.
> 
> This patch doesn't recover the SDA line. It just prevents the processor from
> getting hanged in case of i2c bus lockup.
> 
> Cheers,
> Raag
> 
> > 
> > Regards
> > 
> > Ludovic
> > 
> > > 
> > > Cheers,
> > > Raag
> > > 
> > > > 
> > > > > 
> > > > > Signed-off-by: Raag Jadav 
> > > > > ---
> > > > >  drivers/i2c/busses/i2c-at91.c | 6 +-
> > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/i2c/busses/i2c-at91.c 
> > > > > b/drivers/i2c/busses/i2c-at91.c
> > > > > index 3f3e8b3..b2f5fdb 100644
> > > > > --- a/drivers/i2c/busses/i2c-at91.c
> > > > > +++ b/drivers/i2c/busses/i2c-at91.c
> > > > > @@ -72,6 +72,7 @@
> > > > >  #define  AT91_TWI_TXCOMP BIT(0)  /* Transmission 
> > > > > Complete */
> > > > >  #define  AT91_TWI_RXRDY  BIT(1)  /* Receive Holding 
> > > > > Register Ready */
> > > > >  #define  AT91_TWI_TXRDY  BIT(2)  /* Transmit Holding 
> > > > > Register Ready */
> > > > > +#define  AT91_TWI_SVREAD BIT(3)  /* Slave Read */
> > > > >  #define  AT91_TWI_OVRE   BIT(6)  /* Overrun Error */
> > > > >  #define  AT91_TWI_UNRE   BIT(7)  /* Underrun Error */
> > > > >  #define  AT91_TWI_NACK   BIT(8)  /* Not Acknowledged */
> > > > > @@ -571,7 +572,10 @@ static irqreturn_t atmel_twi_interrupt(int irq, 
> > > > > void *dev_id)
> > > > >   at91_disable_twi_interrupts(dev);
> > > > >   complete(&dev->cmd_complete);
> > > > >   } else if (irqstatus & AT91_TWI_TXRDY) {
> > > > > - at91_twi_write_next_byte(dev);
> > > > > + if ((status & AT91_TWI_SVREAD) && (dev->buf_len == 0))
> > > > > + at91_twi_write(dev, AT91_TWI_IDR, 
> > > > > AT91_TWI_TXRDY);
> > > > > + else
> > > > > + at91_twi_write_next_byte(dev);
> > > > >   }
> > > > >  
> > > > >   /* catch error flags */
> > > > > -- 
> > > > > 2.7.4
> > > > > 
> > > > > 
> > > 
> > > ___
> > > linux-arm-kernel mailing list
> > > linux-arm-ker...@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > > 


Re: [PATCH RT v2] Fix a lockup in wait_for_completion() and friends

2019-05-14 Thread Peter Zijlstra
On Thu, May 09, 2019 at 06:19:25PM +0200, Sebastian Andrzej Siewior wrote:
> On 2019-05-08 15:57:28 [-0500], miny...@acm.org wrote:

> >  kernel/sched/completion.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> > index 755a58084978..4f9b4cc0c95a 100644
> > --- a/kernel/sched/completion.c
> > +++ b/kernel/sched/completion.c
> > @@ -70,20 +70,20 @@ do_wait_for_common(struct completion *x,
> >long (*action)(long), long timeout, int state)
> >  {
> > if (!x->done) {
> > -   DECLARE_SWAITQUEUE(wait);
> > -
> > -   __prepare_to_swait(&x->wait, &wait);
> 
> you can keep DECLARE_SWAITQUEUE remove just __prepare_to_swait()
> 
> > do {
> > +   DECLARE_SWAITQUEUE(wait);
> > +
> > if (signal_pending_state(state, current)) {
> > timeout = -ERESTARTSYS;
> > break;
> > }
> > +   __prepare_to_swait(&x->wait, &wait);
> 
> add this, yes and you are done.
> 
> > __set_current_state(state);
> > raw_spin_unlock_irq(&x->wait.lock);
> > timeout = action(timeout);
> > raw_spin_lock_irq(&x->wait.lock);
> > +   __finish_swait(&x->wait, &wait);
> > } while (!x->done && timeout);
> > -   __finish_swait(&x->wait, &wait);
> > if (!x->done)
> > return timeout;
> > }

Now.. that will fix it, but I think it is also wrong.

The problem being that it violates FIFO, something that might be more
important on -RT than elsewhere.

The regular wait API seems confused/inconsistent when it uses
autoremove_wake_function and default_wake_function, which doesn't help,
but we can easily support this with swait -- the problematic thing is
the custom wake functions, we musn't do that.

(also, mingo went and renamed a whole bunch of wait_* crap and didn't do
the same to swait_ so now its named all different :/)

Something like the below perhaps.

---
diff --git a/include/linux/swait.h b/include/linux/swait.h
index 73e06e9986d4..f194437ae7d2 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -61,11 +61,13 @@ struct swait_queue_head {
 struct swait_queue {
struct task_struct  *task;
struct list_headtask_list;
+   unsigned intremove;
 };
 
 #define __SWAITQUEUE_INITIALIZER(name) {   \
.task   = current,  \
.task_list  = LIST_HEAD_INIT((name).task_list), \
+   .remove = 1,\
 }
 
 #define DECLARE_SWAITQUEUE(name)   \
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index e83a3f8449f6..86974ecbabfc 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -28,7 +28,8 @@ void swake_up_locked(struct swait_queue_head *q)
 
curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
wake_up_process(curr->task);
-   list_del_init(&curr->task_list);
+   if (curr->remove)
+   list_del_init(&curr->task_list);
 }
 EXPORT_SYMBOL(swake_up_locked);
 
@@ -57,7 +58,8 @@ void swake_up_all(struct swait_queue_head *q)
curr = list_first_entry(&tmp, typeof(*curr), task_list);
 
wake_up_state(curr->task, TASK_NORMAL);
-   list_del_init(&curr->task_list);
+   if (curr->remove)
+   list_del_init(&curr->task_list);
 
if (list_empty(&tmp))
break;


RE: [PATCH RESEND 1/2] soc: imx: Add SCU SoC info driver support

2019-05-14 Thread Anson Huang


> -Original Message-
> From: Daniel Baluta [mailto:daniel.bal...@gmail.com]
> Sent: Tuesday, May 14, 2019 4:39 PM
> To: Anson Huang 
> Cc: catalin.mari...@arm.com; will.dea...@arm.com;
> shawn...@kernel.org; s.ha...@pengutronix.de; ker...@pengutronix.de;
> feste...@gmail.com; maxime.rip...@bootlin.com; agr...@kernel.org;
> o...@lixom.net; horms+rene...@verge.net.au;
> ja...@amarulasolutions.com; bjorn.anders...@linaro.org; Leonard Crestez
> ; marc.w.gonza...@free.fr;
> dingu...@kernel.org; enric.balle...@collabora.com; Aisheng Dong
> ; r...@kernel.org; Abel Vesa
> ; l.st...@pengutronix.de; linux-arm-
> ker...@lists.infradead.org; linux-kernel@vger.kernel.org; dl-linux-imx
> ; Daniel Baluta 
> Subject: Re: [PATCH RESEND 1/2] soc: imx: Add SCU SoC info driver support
> 
> On Tue, May 14, 2019 at 2:34 AM Anson Huang 
> wrote:
> >
> > Hi, Daniel
> >
> > > -Original Message-
> > > From: Daniel Baluta [mailto:daniel.bal...@gmail.com]
> > > Sent: Monday, May 13, 2019 10:30 PM
> > > To: Anson Huang 
> > > Cc: catalin.mari...@arm.com; will.dea...@arm.com;
> > > shawn...@kernel.org; s.ha...@pengutronix.de;
> ker...@pengutronix.de;
> > > feste...@gmail.com; maxime.rip...@bootlin.com; agr...@kernel.org;
> > > o...@lixom.net; horms+rene...@verge.net.au;
> > > ja...@amarulasolutions.com; bjorn.anders...@linaro.org; Leonard
> > > Crestez ; marc.w.gonza...@free.fr;
> > > dingu...@kernel.org; enric.balle...@collabora.com; Aisheng Dong
> > > ; r...@kernel.org; Abel Vesa
> > > ; l.st...@pengutronix.de; linux-arm-
> > > ker...@lists.infradead.org; linux-kernel@vger.kernel.org;
> > > dl-linux-imx ; Daniel Baluta
> > > 
> > > Subject: Re: [PATCH RESEND 1/2] soc: imx: Add SCU SoC info driver
> > > support
> > >
> > > 
> > >
> > > > +
> > > > +static u32 imx8qxp_soc_revision(void) {
> > > > +   struct imx_sc_msg_misc_get_soc_id msg;
> > > > +   struct imx_sc_rpc_msg *hdr = &msg.hdr;
> > > > +   u32 rev = 0;
> > > > +   int ret;
> > > > +
> > > > +   hdr->ver = IMX_SC_RPC_VERSION;
> > > > +   hdr->svc = IMX_SC_RPC_SVC_MISC;
> > > > +   hdr->func = IMX_SC_MISC_FUNC_GET_CONTROL;
> > > > +   hdr->size = 3;
> > > > +
> > > > +   msg.data.send.control = IMX_SC_C_ID;
> > > > +   msg.data.send.resource = IMX_SC_R_SYSTEM;
> > > > +
> > > > +   ret = imx_scu_call_rpc(soc_ipc_handle, &msg, true);
> > > > +   if (ret) {
> > > > +   dev_err(&imx_scu_soc_pdev->dev,
> > > > +   "get soc info failed, ret %d\n", ret);
> > > > +   return rev;
> > >
> > > So you return 0 (rev  = 0) here in case of error? This doesn't seem to be
> right.
> > > Maybe return ret?
> >
> > This is intentional, similar to the current i.MX8MQ soc info driver:
> > when getting the revision fails, just return 0 as the revision info and it
> > will show "unknown" in sysfs.
> 
> Ok, I understand. Let's make this clear from the source code.
> 
>ret = imx_scu_call_rpc(soc_ipc_handle, &msg, true);
> +   if (ret) {
> +   dev_err(&imx_scu_soc_pdev->dev,
> +   "get soc info failed, ret %d\n", ret);
> +   /* returning 0 means getting revision failed */
> +   return 0;
> +   }

OK, will add a comment in V2.

Anson.



[PATCH] dt-bindings: property-units: Sanitize unit naming

2019-05-14 Thread Geert Uytterhoeven
Make the naming of units consistent with common practices:
  - Do not capitalize the first character of units ("Celsius" is
special, as it is not the unit name, but a reference to its
proposer),
  - Do not use plural for units,
  - Do not abbreviate "ampere",
  - Concatenate prefixes and units (no spaces or hyphens),
  - Separate units by spaces not hyphens,
  - "milli" applies to "degree", not to "Celsius".

Signed-off-by: Geert Uytterhoeven 
---
 .../devicetree/bindings/property-units.txt| 34 +--
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/Documentation/devicetree/bindings/property-units.txt 
b/Documentation/devicetree/bindings/property-units.txt
index bfd33734facaba73..e9b8360b32880f12 100644
--- a/Documentation/devicetree/bindings/property-units.txt
+++ b/Documentation/devicetree/bindings/property-units.txt
@@ -12,32 +12,32 @@ unit prefixes.
 Time/Frequency
 
 -mhz   : megahertz
--hz: Hertz (preferred)
--sec   : seconds
--ms: milliseconds
--us: microseconds
--ns: nanoseconds
+-hz: hertz (preferred)
+-sec   : second
+-ms: millisecond
+-us: microsecond
+-ns: nanosecond
 
 Distance
 
--mm: millimeters
+-mm: millimeter
 
 Electricity
 
--microamp  : micro amps
--microamp-hours : micro amp-hours
--ohms  : Ohms
--micro-ohms: micro Ohms
--microwatt-hours: micro Watt-hours
--microvolt : micro volts
--picofarads: picofarads
--femtofarads   : femtofarads
+-microamp  : microampere
+-microamp-hours : microampere hour
+-ohms  : ohm
+-micro-ohms: microohm
+-microwatt-hours: microwatt hour
+-microvolt : microvolt
+-picofarads: picofarad
+-femtofarads   : femtofarad
 
 Temperature
 
--celsius   : Degrees Celsius
--millicelsius  : Degreee milli-Celsius
+-celsius   : degree Celsius
+-millicelsius  : millidegree Celsius
 
 Pressure
 
--kpascal   : kiloPascal
+-kpascal   : kilopascal
-- 
2.17.1



Re: [PATCH] i2c: at91: handle TXRDY interrupt spam

2019-05-14 Thread Ludovic Desroches
On Mon, May 06, 2019 at 10:19:01AM +0200, Eugen Hristev - M18282 wrote:
> 
> 
> On 04.05.2019 02:58, Raag Jadav wrote:
> 
> > On Thu, May 02, 2019 at 04:01:16PM +0200, Ludovic Desroches wrote:
> >> On Tue, Apr 30, 2019 at 04:03:32AM +0530, Raag Jadav wrote:
> >>> External E-Mail
> >>>
> >>>
> >>> On Mon, Apr 29, 2019 at 11:00:05AM +0200, Ludovic Desroches wrote:
>  Hello Raag,
> 
>  On Tue, Apr 23, 2019 at 01:06:48PM +0530, Raag Jadav wrote:
> > External E-Mail
> >
> >
> > Performing i2c write operation while SDA or SCL line is held
> > or grounded by slave device, we go into infinite 
> > at91_twi_write_next_byte
> > loop with TXRDY interrupt spam.
> 
>  Sorry but I am not sure to have the full picture, the controller is in
>  slave or master mode?
> 
>  SVREAD is only used in slave mode. When SVREAD is set, it means that a 
>  read
>  access is performed and your issue concerns the write operation.
> 
>  Regards
> 
>  Ludovic
> >>>
> >>> Yes, even though the datasheet suggests that SVREAD is irrelevant in 
> >>> master mode,
> >>> TXRDY and SVREAD are the only ones being set in status register upon 
> >>> reproducing the issue.
> >>> Couldn't think of a better way to handle such strange behaviour.
> >>> Any suggestions would be appreciated.
> >>
> >> I have the confirmation that you can't rely on the SVREAD flag when in
> >> master mode. This flag should always have the same value.
> >>
> >> I am trying to understand what could lead to your situation. Can you
> >> give me more details. What kind of device it is? What does lead to this
> >> situation? Does it happen randomly or not?
> > 
> > One of the sama5d2 based boards I worked on was having trouble completing
> > its boot because of a faulty i2c device, which was randomly holding down
> > the SDA line on i2c write operations, not allowing the controller to
> > complete its transmission, causing a massive TXRDY interrupt spam and
> > ultimately hanging the processor.
> > 
> > Another strange observation was that SVREAD was being set in the status 
> > register
> > along with TXRDY, every time I reproduced the issue.
> > You can reproduce it by simply grounding the SDA line and performing i2c 
> > write
> > on the bus.
> > 
> > Note that NACK, LOCK or TXCOMP are never set as the transmission never 
> > completes.
> > I'm not sure why slave bits are being set in master mode,
> > but it's been working reliably for me.
> > 
> > This patch doesn't recover the SDA line. It just prevents the processor from
> > getting hanged in case of i2c bus lockup.
> 
> Hello,
> 
> I have noticed the same hanging at some points... In my case it is 
> because of this patch:
> 
> commit e8f39e9fc0e0b7bce24922da925af820bacb8ef8
> Author: David Engraf 
> Date:   Thu Apr 26 11:53:14 2018 +0200
> 

Good to know.

> 
> diff --git a/drivers/i2c/busses/i2c-at91.c b/drivers/i2c/busses/i2c-at91.c
> index bfd1fdf..3f3e8b3 100644
> --- a/drivers/i2c/busses/i2c-at91.c
> +++ b/drivers/i2c/busses/i2c-at91.c
> @@ -518,8 +518,16 @@ static irqreturn_t atmel_twi_interrupt(int irq, 
> void *dev_id)
>   * the RXRDY interrupt first in order to not keep garbage data 
> in the
>   * Receive Holding Register for the next transfer.
>   */
> -   if (irqstatus & AT91_TWI_RXRDY)
> -   at91_twi_read_next_byte(dev);
> +   if (irqstatus & AT91_TWI_RXRDY) {
> +   /*
> +* Read all available bytes at once by polling RXRDY 
> usable w/
> +* and w/o FIFO. With FIFO enabled we could also read 
> RXFL and
> +* avoid polling RXRDY.
> +*/
> +   do {
> +   at91_twi_read_next_byte(dev);
> +   } while (at91_twi_read(dev, AT91_TWI_SR) & AT91_TWI_RXRDY);
> +   }
> 
> 
> In my opinion having a do/while with an exit condition relying solely on 
> a bit read from hardware is unacceptable in IRQ context - kernel can 
> hang here.
> A timeout would be a solution...

You're right, with faulty hardware it can lead to disaster. As you
mentioned issues with this patch, the end-of-loop condition is not good
as it can stay true indefinitely.

For sure a timeout is a solution but its value can be controversial.
Maybe there is a better combination of flags to check in the status
register. I'll look into this point too.
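
As a rough illustration of the timeout idea (not a tested fix; the retry
limit is an arbitrary value), the RXRDY drain loop could be bounded:

unsigned int retries = 1024;	/* arbitrary bound, sketch only */

do {
	at91_twi_read_next_byte(dev);
} while ((at91_twi_read(dev, AT91_TWI_SR) & AT91_TWI_RXRDY) && --retries);

if (!retries)
	dev_warn(dev->dev, "RXRDY stuck, giving up on this transfer\n");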

Regards

Ludovic

> 
> For me, reverting this patch solves hanging issues.
> 
> Hope this helps,
> 
> Eugen
> 
> > 
> > Cheers,
> > Raag
> > 
> >>
> >> Regards
> >>
> >> Ludovic
> >>
> >>>
> >>> Cheers,
> >>> Raag
> >>>
> 
> >
> > Signed-off-by: Raag Jadav 
> > ---
> >   drivers/i2c/busses/i2c-at91.c | 6 +-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/i2c/busses/i2c-at91.c 
> > b/drivers/i2c/busses/i2c-at91.c
> > index 3f3e8b3..b2f5fdb 100644
> > --- a/drivers/i2c/busses/i2c-at91.c
> 

Re: [PATCH v2] media/doc: Allow sizeimage to be set by v4l clients

2019-05-14 Thread Hans Verkuil
Hi Stanimir,

On 4/12/19 5:59 PM, Stanimir Varbanov wrote:
> This changes v4l2_pix_format and v4l2_plane_pix_format sizeimage
> field description to allow v4l clients to set bigger image size
> in case of variable length compressed data.

I've been reconsidering this change. The sizeimage value in the format
is the minimum size a buffer should have in order to store the data of
an image of the width and height as described in the format.

But there is nothing that prevents userspace from calling VIDIOC_CREATEBUFS
instead of VIDIOC_REQBUFS to allocate larger buffers.
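
For reference, a minimal userspace sketch of doing exactly that with
VIDIOC_CREATEBUFS (assuming the usual <linux/videodev2.h> and ioctl plumbing;
the 4 MiB value and the capture buffer type are only illustrative):

struct v4l2_create_buffers create = {0};

create.count = 4;
create.memory = V4L2_MEMORY_MMAP;
create.format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
ioctl(fd, VIDIOC_G_FMT, &create.format);
/* Ask for more room than the driver's worst-case sizeimage estimate. */
create.format.fmt.pix.sizeimage = 4 * 1024 * 1024;
if (ioctl(fd, VIDIOC_CREATEBUFS, &create) < 0)
	perror("VIDIOC_CREATEBUFS");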

So do we really need this change?

The more I think about this, the more uncomfortable I become with this change.

Regards,

Hans

> 
> Signed-off-by: Stanimir Varbanov 
> ---
>  Documentation/media/uapi/v4l/pixfmt-v4l2-mplane.rst | 13 -
>  Documentation/media/uapi/v4l/pixfmt-v4l2.rst| 11 ++-
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/media/uapi/v4l/pixfmt-v4l2-mplane.rst 
> b/Documentation/media/uapi/v4l/pixfmt-v4l2-mplane.rst
> index 5688c816e334..005428a8121e 100644
> --- a/Documentation/media/uapi/v4l/pixfmt-v4l2-mplane.rst
> +++ b/Documentation/media/uapi/v4l/pixfmt-v4l2-mplane.rst
> @@ -31,7 +31,18 @@ describing all planes of that format.
>  
>  * - __u32
>- ``sizeimage``
> -  - Maximum size in bytes required for image data in this plane.
> +  - Maximum size in bytes required for image data in this plane,
> + set by the driver. When the image consists of variable length
> + compressed data this is the number of bytes required by the
> + codec to support the worst-case compression scenario.
> +
> + For uncompressed images the driver will set the value. For
> + variable length compressed data clients are allowed to set
> + the sizeimage field, but the driver may ignore it and set the
> + value itself, or it may modify the provided value based on
> + alignment requirements or minimum/maximum size requirements.
> + If the client wants to leave this to the driver, then it should
> + set sizeimage to 0.
>  * - __u32
>- ``bytesperline``
>- Distance in bytes between the leftmost pixels in two adjacent
> diff --git a/Documentation/media/uapi/v4l/pixfmt-v4l2.rst 
> b/Documentation/media/uapi/v4l/pixfmt-v4l2.rst
> index 71eebfc6d853..0f7771151db9 100644
> --- a/Documentation/media/uapi/v4l/pixfmt-v4l2.rst
> +++ b/Documentation/media/uapi/v4l/pixfmt-v4l2.rst
> @@ -89,7 +89,16 @@ Single-planar format structure
>- Size in bytes of the buffer to hold a complete image, set by the
>   driver. Usually this is ``bytesperline`` times ``height``. When
>   the image consists of variable length compressed data this is the
> - maximum number of bytes required to hold an image.
> + number of bytes required by the codec to support the worst-case
> + compression scenario.
> +
> + For uncompressed images the driver will set the value. For
> + variable length compressed data clients are allowed to set
> + the sizeimage field, but the driver may ignore it and set the
> + value itself, or it may modify the provided value based on
> + alignment requirements or minimum/maximum size requirements.
> + If the client wants to leave this to the driver, then it should
> + set sizeimage to 0.
>  * - __u32
>- ``colorspace``
>- Image colorspace, from enum :c:type:`v4l2_colorspace`.
> 



RE: [PATCH] tools pci: Do not delete pcitest.sh in 'make clean'

2019-05-14 Thread Gustavo Pimentel
On Mon, May 13, 2019 at 19:37:28, Arnaldo Carvalho de Melo 
 wrote:


Hi Arnaldo,

I think Kishon has already dispatched a patch fixing this issue.
Kishon, can you confirm it?

Regards,
Gustavo

> Hi,
> 
>   I have this in my local perf/core branch, lined up for 5.2,
> please let me know if you're ok with it.
> 
> - Arnaldo
> 
> commit 4dfe8f59156382b7695fe5c10bddd5c97c84289a
> Author: Arnaldo Carvalho de Melo 
> Date:   Mon May 13 13:53:20 2019 -0400
> 
> tools pci: Do not delete pcitest.sh in 'make clean'
> 
> When running 'make -C tools clean' I noticed that a revision controlled
> file was being deleted:
> 
>   $ git diff
>   diff --git a/tools/pci/pcitest.sh b/tools/pci/pcitest.sh
>   deleted file mode 100644
>   index 75ed48ff2990..
>   --- a/tools/pci/pcitest.sh
>   +++ /dev/null
>   @@ -1,72 +0,0 @@
>   -#!/bin/sh
>   -# SPDX-License-Identifier: GPL-2.0
>   -
>   -echo "BAR tests"
>   -echo
>   
> 
> So I changed the make variables to fix that, testing it should produce
> the same intended result while not deleting revision controlled files.
> 
>   $ make O=/tmp/build/pci -C tools/pci install
>   make: Entering directory '/home/acme/git/perf/tools/pci'
>   make -f /home/acme/git/perf/tools/build/Makefile.build dir=. obj=pcitest
>   install -d -m 755 /usr/bin;   \
>   for program in /tmp/build/pci/pcitest pcitest.sh; do  \
> install $program /usr/bin;  \
>   done
>   install: cannot change permissions of ‘/usr/bin’: Operation not 
> permitted
>   install: cannot create regular file '/usr/bin/pcitest': Permission 
> denied
>   install: cannot create regular file '/usr/bin/pcitest.sh': Permission 
> denied
>   make: *** [Makefile:46: install] Error 1
>   make: Leaving directory '/home/acme/git/perf/tools/pci'
>   $ ls -la /tmp/build/pci/pcitest
>   -rwxrwxr-x. 1 acme acme 27152 May 13 13:52 /tmp/build/pci/pcitest
>   $ /tmp/build/pci/pcitest
>   can't open PCI Endpoint Test device: No such file or directory
>   $
> 
> Cc: Adrian Hunter 
> Cc: Gustavo Pimentel 
> Cc: Jiri Olsa 
> Cc: Kishon Vijay Abraham I 
> Cc: Lorenzo Pieralisi 
> Cc: Namhyung Kim 
> Fixes: 1ce78ce09430 ("tools: PCI: Change pcitest compiling process")
> Link: 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lkml.kernel.org_n_tip-2D9re6bd7eh9epi3koslkv3ocn-40git.kernel.org&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=bkWxpLoW-f-E3EdiDCCa0_h0PicsViasSlvIpzZvPxs&m=aZjR5R91AJR402FRDDnYzXZezTHphZCiNh-9G8BO6vE&s=VzkT0XYvB37sVZ8-lOxWN7LsG6IvDTCRjCTUfertsRs&e=
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> diff --git a/tools/pci/Makefile b/tools/pci/Makefile
> index 46e4c2f318c9..f64da817bc03 100644
> --- a/tools/pci/Makefile
> +++ b/tools/pci/Makefile
> @@ -14,7 +14,7 @@ MAKEFLAGS += -r
>  
>  CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
>  
> -ALL_TARGETS := pcitest pcitest.sh
> +ALL_TARGETS := pcitest
>  ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
>  
>  all: $(ALL_PROGRAMS)
> @@ -44,7 +44,7 @@ clean:
>  
>  install: $(ALL_PROGRAMS)
>   install -d -m 755 $(DESTDIR)$(bindir);  \
> - for program in $(ALL_PROGRAMS); do  \
> + for program in $(ALL_PROGRAMS) pcitest.sh; do   \
>   install $$program $(DESTDIR)$(bindir);  \
>   done
>  




Re: [PATCH] mm: mmu_gather: remove __tlb_reset_range() for force flush

2019-05-14 Thread Mel Gorman
On Mon, May 13, 2019 at 05:06:03PM +, Nadav Amit wrote:
> > On May 13, 2019, at 9:37 AM, Will Deacon  wrote:
> > 
> > On Mon, May 13, 2019 at 09:11:38AM +, Nadav Amit wrote:
> >>> On May 13, 2019, at 1:36 AM, Peter Zijlstra  wrote:
> >>> 
> >>> On Thu, May 09, 2019 at 09:21:35PM +, Nadav Amit wrote:
> >>> 
> >>> And we can fix that by having tlb_finish_mmu() sync up. Never let a
> >>> concurrent tlb_finish_mmu() complete until all concurrent mmu_gathers
> >>> have completed.
> >>> 
> >>> This should not be too hard to make happen.
> >> 
> >> This synchronization sounds much more expensive than what I proposed. 
> >> But I
> >> agree that cache-lines that move from one CPU to another might become 
> >> an
> >> issue. But I think that the scheme I suggested would minimize this 
> >> overhead.
> > 
> > Well, it would have a lot more unconditional atomic ops. My scheme only
> > waits when there is actual concurrency.
>  
>  Well, something has to give. I didn't think that if the same core does 
>  the
>  atomic op it would be too expensive.
> >>> 
> >>> They're still at least 20 cycles a pop, uncontended.
> >>> 
> > I _think_ something like the below ought to work, but its not even been
> > near a compiler. The only problem is the unconditional wakeup; we can
> > play games to avoid that if we want to continue with this.
> > 
> > Ideally we'd only do this when there's been actual overlap, but I've not
> > found a sensible way to detect that.
> > 
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 4ef4bbe78a1d..b70e35792d29 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -590,7 +590,12 @@ static inline void dec_tlb_flush_pending(struct 
> > mm_struct *mm)
> >  *
> >  * Therefore we must rely on tlb_flush_*() to guarantee order.
> >  */
> > -   atomic_dec(&mm->tlb_flush_pending);
> > +   if (atomic_dec_and_test(&mm->tlb_flush_pending)) {
> > +   wake_up_var(&mm->tlb_flush_pending);
> > +   } else {
> > +   wait_event_var(&mm->tlb_flush_pending,
> > +  
> > !atomic_read_acquire(&mm->tlb_flush_pending));
> > +   }
> > }
>  
>  It still seems very expensive to me, at least for certain workloads 
>  (e.g.,
>  Apache with multithreaded MPM).
> >>> 
> >>> Is that Apache-MPM workload triggering this lots? Having a known
> >>> benchmark for this stuff is good for when someone has time to play with
> >>> things.
> >> 
> >> Setting Apache2 with mpm_worker causes every request to go through
> >> mmap-writev-munmap flow on every thread. I didn't run this workload after
> >> the patches that downgrade the mmap_sem to read before the page-table
> >> zapping were introduced. I presume these patches would allow the page-table
> >> zapping to be done concurrently, and therefore would hit this flow.
> > 
> > Hmm, I don't think so: munmap() still has to take the semaphore for write
> > initially, so it will be serialised against other munmap() threads even
> > after they've downgraded afaict.
> > 
> > The initial bug report was about concurrent madvise() vs munmap().
> 
> I guess you are right (and I'm wrong).
> 
> Short search suggests that ebizzy might be affected (a thread by Mel
> Gorman): https://lkml.org/lkml/2015/2/2/493
> 

Glibc has since been fixed to be less munmap/mmap intensive and the
system CPU usage of ebizzy is generally negligible unless it is configured
specifically to use mmap/munmap instead of malloc/free, which is unrealistic
for well-behaved applications.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 00/18] ARM/ARM64: Support hierarchical CPU arrangement for PSCI

2019-05-14 Thread Ulf Hansson
On Tue, 14 May 2019 at 10:08, Rafael J. Wysocki  wrote:
>
> On Mon, May 13, 2019 at 9:23 PM Ulf Hansson  wrote:
> >
> > This series enables support for hierarchical CPU arrangement, managed by 
> > PSCI
> > for ARM/ARM64. It's based on using the generic PM domain (genpd), which
> > recently was extended to manage devices belonging to CPUs.
>
> ACK for the patches touching cpuidle in this series (from the
> framework perspective), but I'm assuming it to be taken care of by
> ARM/ARM64 maintainers.

Thanks for the ack! Yes, this is for PSCI/ARM maintainers.

BTW, apologies for sending this in the merge window, but I wanted to
take the opportunity for people to have a look before OSPM Pisa next
week.

Kind regards
Uffe


[PATCH V3 1/4] mm/hotplug: Reorder arch_remove_memory() call in __remove_memory()

2019-05-14 Thread Anshuman Khandual
Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
entries between memory block and node. It first checks pfn validity with
pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
(arm64 has this enabled) pfn_valid_within() calls pfn_valid().

pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
which scans all mapped memblock regions with memblock_is_map_memory(). This
creates a problem in the memory hot remove path, which has already removed
the given memory range from the memory block with memblock_[remove|free]
before arriving at unregister_mem_sect_under_nodes(). Hence get_nid_for_pfn()
returns -1, skipping the subsequent sysfs_remove_link() calls and leaving the
node <-> memory block sysfs entries as is. A subsequent memory add operation
hits BUG_ON() because of the existing sysfs entries.

[   62.007176] NUMA: Unknown node for memory at 0x68000, assuming node 0
[   62.052517] [ cut here ]
[   62.053211] kernel BUG at mm/memory_hotplug.c:1143!
[   62.053868] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[   62.054589] Modules linked in:
[   62.054999] CPU: 19 PID: 3275 Comm: bash Not tainted 
5.1.0-rc2-4-g28cea40b2683 #41
[   62.056274] Hardware name: linux,dummy-virt (DT)
[   62.057166] pstate: 4045 (nZcv daif +PAN -UAO)
[   62.058083] pc : add_memory_resource+0x1cc/0x1d8
[   62.058961] lr : add_memory_resource+0x10c/0x1d8
[   62.059842] sp : 168b3ce0
[   62.060477] x29: 168b3ce0 x28: 8005db546c00
[   62.061501] x27:  x26: 
[   62.062509] x25: 111ef000 x24: 111ef5d0
[   62.063520] x23:  x22: 0006bfff
[   62.064540] x21: ffef x20: 006c
[   62.065558] x19: 0068 x18: 0024
[   62.066566] x17:  x16: 
[   62.067579] x15:  x14: 8005e412e890
[   62.068588] x13: 8005d6b105d8 x12: 
[   62.069610] x11: 8005d6b10490 x10: 0040
[   62.070615] x9 : 8005e412e898 x8 : 8005e412e890
[   62.071631] x7 : 8005d6b105d8 x6 : 8005db546c00
[   62.072640] x5 : 0001 x4 : 0002
[   62.073654] x3 : 8005d7049480 x2 : 0002
[   62.074666] x1 : 0003 x0 : ffef
[   62.075685] Process bash (pid: 3275, stack limit = 0xd754280f)
[   62.076930] Call trace:
[   62.077411]  add_memory_resource+0x1cc/0x1d8
[   62.078227]  __add_memory+0x70/0xa8
[   62.078901]  probe_store+0xa4/0xc8
[   62.079561]  dev_attr_store+0x18/0x28
[   62.080270]  sysfs_kf_write+0x40/0x58
[   62.080992]  kernfs_fop_write+0xcc/0x1d8
[   62.081744]  __vfs_write+0x18/0x40
[   62.082400]  vfs_write+0xa4/0x1b0
[   62.083037]  ksys_write+0x5c/0xc0
[   62.083681]  __arm64_sys_write+0x18/0x20
[   62.084432]  el0_svc_handler+0x88/0x100
[   62.085177]  el0_svc+0x8/0xc

Re-ordering arch_remove_memory() with memblock_[free|remove] solves the
problem on arm64 as pfn_valid() behaves correctly and returns true since
the memblock for the address range still exists. arch_remove_memory()
removes the applicable memory sections from the zone with __remove_pages()
and tears down the kernel linear mapping. Removing the memblock regions
afterwards is safe because there is no other memblock (bootmem) allocator
user that late. So nobody is going to allocate from the removed range just
to blow up later. Also nobody should be using the bootmem allocated range,
else we wouldn't allow removing it. So the reordering is indeed safe.

Acked-by: Michal Hocko 
Reviewed-by: David Hildenbrand 
Reviewed-by: Oscar Salvador 
Signed-off-by: Anshuman Khandual 
---
 mm/memory_hotplug.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0082d69..71d0d79 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1872,11 +1872,10 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
 
/* remove memmap entry */
firmware_map_remove(start, start + size, "System RAM");
+   arch_remove_memory(nid, start, size, NULL);
memblock_free(start, size);
memblock_remove(start, size);
 
-   arch_remove_memory(nid, start, size, NULL);
-
try_offline_node(nid);
 
mem_hotplug_done();
-- 
2.7.4



[PATCH V3 3/4] arm64/mm: Inhibit huge-vmap with ptdump

2019-05-14 Thread Anshuman Khandual
From: Mark Rutland 

The arm64 ptdump code can race with concurrent modification of the
kernel page tables. At the time this was added, this was sound as:

* Modifications to leaf entries could result in stale information being
  logged, but would not result in a functional problem.

* Boot time modifications to non-leaf entries (e.g. freeing of initmem)
  were performed when the ptdump code cannot be invoked.

* At runtime, modifications to non-leaf entries only occurred in the
  vmalloc region, and these were strictly additive, as intermediate
  entries were never freed.

However, since commit:

  commit 324420bf91f6 ("arm64: add support for ioremap() block mappings")

... it has been possible to create huge mappings in the vmalloc area at
runtime, and as part of this existing intermediate levels of table may be
removed and freed.

It's possible for the ptdump code to race with this, and continue to
walk tables which have been freed (and potentially poisoned or
reallocated). As a result of this, the ptdump code may dereference bogus
addresses, which could be fatal.

Since huge-vmap is a TLB and memory optimization, we can disable it when
the runtime ptdump code is in use to avoid this problem.

Fixes: 324420bf91f60582 ("arm64: add support for ioremap() block mappings")
Signed-off-by: Mark Rutland 
Signed-off-by: Anshuman Khandual 
Cc: Ard Biesheuvel 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/mm/mmu.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ef82312..37a902c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -955,13 +955,18 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
 
 int __init arch_ioremap_pud_supported(void)
 {
-   /* only 4k granule supports level 1 block mappings */
-   return IS_ENABLED(CONFIG_ARM64_4K_PAGES);
+   /*
+* Only 4k granule supports level 1 block mappings.
+* SW table walks can't handle removal of intermediate entries.
+*/
+   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
+  !IS_ENABLED(CONFIG_ARM64_PTDUMP_DEBUGFS);
 }
 
 int __init arch_ioremap_pmd_supported(void)
 {
-   return 1;
+   /* See arch_ioremap_pud_supported() */
+   return !IS_ENABLED(CONFIG_ARM64_PTDUMP_DEBUGFS);
 }
 
 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
-- 
2.7.4



[PATCH V3 2/4] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

2019-05-14 Thread Anshuman Khandual
The arm64 pagetable dump code can race with concurrent modification of the
kernel page tables. When leaf entries are modified concurrently, the dump
code may log stale or inconsistent information for a VA range, but this is
otherwise not harmful.

When intermediate levels of table are freed, the dump code will continue to
use memory which has been freed and potentially reallocated for another
purpose. In such cases, the dump code may dereference bogus addresses,
leading to a number of potential problems.

Intermediate levels of table may be freed during memory hot-remove, or when
installing a huge mapping in the vmalloc region. To avoid racing with these
cases, take the memory hotplug lock when walking the kernel page table.

Signed-off-by: Anshuman Khandual 
---
 arch/arm64/mm/ptdump_debugfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 064163f..80171d1 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -7,7 +7,10 @@
 static int ptdump_show(struct seq_file *m, void *v)
 {
struct ptdump_info *info = m->private;
+
+   get_online_mems();
ptdump_walk_pgd(m, info);
+   put_online_mems();
return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
-- 
2.7.4



[PATCH V3 0/4] arm64/mm: Enable memory hot remove

2019-05-14 Thread Anshuman Khandual
This series enables memory hot remove on arm64 after fixing a memblock
removal ordering problem in generic __remove_memory() and kernel page
table race conditions on arm64. This is based on the following arm64
working tree.

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core

David had pointed out that the following patch is already in next/master
(58b11e136dcc14358) and will conflict with the last patch here. Will fix
the conflict once this series gets reviewed and agreed upon.

Author: David Hildenbrand 
Date:   Wed Apr 10 11:02:27 2019 +1000

mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never fail

All callers of arch_remove_memory() ignore errors.  And we should really
try to remove any errors from the memory removal path.  No more errors are
reported from __remove_pages().  BUG() in s390x code in case
arch_remove_memory() is triggered.  We may implement that properly later.
WARN in case powerpc code failed to remove the section mapping, which is
better than ignoring the error completely right now.

Testing:

Tested memory hot remove on arm64 for 4K, 16K, 64K page config options with
all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS combinations. But
build tested on non arm64 platforms.

Changes in V3:
 
- Implemented most of the suggestions from Mark Rutland for remove_pagetable()
- Fixed applicable PGTABLE_LEVEL wrappers around pgtable page freeing functions
- Replaced 'direct' with 'sparse_vmap' in remove_pagetable() with inverted 
polarity
- Changed pointer names ('p' at end) and removed tmp from iterations
- Perform intermediate TLB invalidation while clearing pgtable entries
- Dropped flush_tlb_kernel_range() in remove_pagetable()
- Added flush_tlb_kernel_range() in remove_pte_table() instead
- Renamed page freeing functions for pgtable page and mapped pages
- Used page range size instead of order while freeing mapped or pgtable pages
- Removed all PageReserved() handling while freeing mapped or pgtable pages
- Replaced XXX_index() with XXX_offset() while walking the kernel page table
- Used READ_ONCE() while fetching individual pgtable entries
- Taken overall init_mm.page_table_lock instead of just while changing an entry
- Dropped previously added [pmd|pud]_index() which are not required anymore

- Added a new patch to protect against the kernel page table race condition for ptdump
- Added a new patch from Mark Rutland to prevent huge-vmap with ptdump

Changes in V2: (https://lkml.org/lkml/2019/4/14/5)

- Added all received review and ack tags
- Split the series from ZONE_DEVICE enablement for better review
- Moved memblock re-order patch to the front as per Robin Murphy
- Updated commit message on memblock re-order patch per Michal Hocko
- Dropped [pmd|pud]_large() definitions
- Used existing [pmd|pud]_sect() instead of earlier [pmd|pud]_large()
- Removed __meminit and __ref tags as per Oscar Salvador
- Dropped unnecessary 'ret' init in arch_add_memory() per Robin Murphy
- Skipped calling into pgtable_page_dtor() for linear mapping page table
  pages and updated all relevant functions

Changes in V1: (https://lkml.org/lkml/2019/4/3/28)

Anshuman Khandual (3):
  mm/hotplug: Reorder arch_remove_memory() call in __remove_memory()
  arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
  arm64/mm: Enable memory hot remove

Mark Rutland (1):
  arm64/mm: Inhibit huge-vmap with ptdump

 arch/arm64/Kconfig |   3 +
 arch/arm64/mm/mmu.c| 215 -
 arch/arm64/mm/ptdump_debugfs.c |   3 +
 mm/memory_hotplug.c|   3 +-
 4 files changed, 217 insertions(+), 7 deletions(-)

-- 
2.7.4



[PATCH V3 4/4] arm64/mm: Enable memory hot remove

2019-05-14 Thread Anshuman Khandual
Memory removal from an arch perspective involves tearing down two different
kernel based mappings, i.e. vmemmap and linear, while releasing the related
page table and any mapped pages allocated for the given physical memory range
to be removed.

Define a common kernel page table tear down helper remove_pagetable() which
can be used to unmap a given kernel virtual address range. In effect it can
tear down both vmemmap and kernel linear mappings. This new helper is called
from both vmemmap_free() and ___remove_pgd_mapping() during memory removal.

For linear mapping there are no actual allocated pages which are mapped to
create the translation. Any pfn on a given entry is derived from physical
address (__va(PA) --> PA) whose linear translation is to be created. They
need not be freed as they were never allocated in the first place. But for
vmemmap which is a real virtual mapping (like vmalloc) physical pages are
allocated either from buddy or memblock which get mapped in the kernel page
table. These allocated and mapped pages need to be freed during translation
tear down. But page table pages need to be freed in both these cases.

These mappings need to be differentiated while deciding if a mapped page at
any level i.e [pte|pmd|pud]_page() should be freed or not. Callers for the
mapping tear down process should pass on 'sparse_vmap' variable identifying
kernel vmemmap mappings.

While here, update arch_add_memory() to handle __add_pages() failures by
just unmapping recently added kernel linear mapping. Now enable memory hot
remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.

This implementation is overall inspired from kernel page table tear down
procedure on X86 architecture.

Signed-off-by: Anshuman Khandual 
---
 arch/arm64/Kconfig  |   3 +
 arch/arm64/mm/mmu.c | 204 +++-
 2 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1c0cb51..bb4e571 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -268,6 +268,9 @@ config HAVE_GENERIC_GUP
 config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y
 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+   def_bool y
+
 config SMP
def_bool y
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 37a902c..bd2d003 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -733,6 +733,177 @@ int kern_addr_valid(unsigned long addr)
 
return pfn_valid(pte_pfn(pte));
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void free_hotplug_page_range(struct page *page, ssize_t size)
+{
+   WARN_ON(PageReserved(page));
+   free_pages((unsigned long)page_address(page), get_order(size));
+}
+
+static void free_hotplug_pgtable_page(struct page *page)
+{
+   free_hotplug_page_range(page, PAGE_SIZE);
+}
+
+static void free_pte_table(pte_t *ptep, pmd_t *pmdp, unsigned long addr)
+{
+   struct page *page;
+   int i;
+
+   for (i = 0; i < PTRS_PER_PTE; i++) {
+   if (!pte_none(ptep[i]))
+   return;
+   }
+
+   page = pmd_page(*pmdp);
+   pmd_clear(pmdp);
+   __flush_tlb_kernel_pgtable(addr);
+   free_hotplug_pgtable_page(page);
+}
+
+#if (CONFIG_PGTABLE_LEVELS > 2)
+static void free_pmd_table(pmd_t *pmdp, pud_t *pudp, unsigned long addr)
+{
+   struct page *page;
+   int i;
+
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   if (!pmd_none(pmdp[i]))
+   return;
+   }
+
+   page = pud_page(*pudp);
+   pud_clear(pudp);
+   __flush_tlb_kernel_pgtable(addr);
+   free_hotplug_pgtable_page(page);
+}
+#else
+static void free_pmd_table(pmd_t *pmdp, pud_t *pudp, unsigned long addr) { }
+#endif
+
+#if (CONFIG_PGTABLE_LEVELS > 3)
+static void free_pud_table(pud_t *pudp, pgd_t *pgdp, unsigned long addr)
+{
+   struct page *page;
+   int i;
+
+   for (i = 0; i < PTRS_PER_PUD; i++) {
+   if (!pud_none(pudp[i]))
+   return;
+   }
+
+   page = pgd_page(*pgdp);
+   pgd_clear(pgdp);
+   __flush_tlb_kernel_pgtable(addr);
+   free_hotplug_pgtable_page(page);
+}
+#else
+static void free_pud_table(pud_t *pudp, pgd_t *pgdp, unsigned long addr) { }
+#endif
+
+static void
+remove_pte_table(pmd_t *pmdp, unsigned long addr,
+   unsigned long end, bool sparse_vmap)
+{
+   struct page *page;
+   pte_t *ptep;
+   unsigned long start = addr;
+
+   for (; addr < end; addr += PAGE_SIZE) {
+   ptep = pte_offset_kernel(pmdp, addr);
+   if (!pte_present(*ptep))
+   continue;
+
+   if (sparse_vmap) {
+   page = pte_page(READ_ONCE(*ptep));
+   free_hotplug_page_range(page, PAGE_SIZE);
+   }
+   pte_clear(&init_mm, addr, ptep);
+   }
+   flush_tlb_kernel_range(start, end);
+}
+
+static void
+remove_pmd_table(pud_t *pudp, unsigned long addr,
+

Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-14 Thread Geert Uytterhoeven
On Tue, May 14, 2019 at 10:29 AM David Laight  wrote:
> > And I like Steven's "(fault)" idea.
> > How about this:
> >
> >   if ptr < PAGE_SIZE  -> "(null)"
> >   if IS_ERR_VALUE(ptr)-> "(fault)"
> >
> >   -ss
>
> Or:
> if (ptr < PAGE_SIZE)
> return ptr ? "(null+)" : "(null)";
> if IS_ERR_VALUE(ptr)
> return "(errno)"

Do we care about the value? "(-E%u)"?
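
A combined sketch of the ideas in this sub-thread (the helper name is made
up; this is not the actual vsprintf code):

static const char *check_pointer_access(const char *ptr)
{
	if ((unsigned long)ptr < PAGE_SIZE)
		return ptr ? "(null+)" : "(null)";
	if (IS_ERR_VALUE((unsigned long)ptr))
		return "(fault)";	/* or encode the value, e.g. "(-E%u)" */
	return ptr;
}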

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH 4/4] powerpc/64: reuse PPC32 static inline flush_dcache_range()

2019-05-14 Thread Christophe Leroy
This patch drops the assembly PPC64 version of flush_dcache_range()
and re-uses the PPC32 static inline version.

With GCC 8.1, the following code is generated:

void flush_test(unsigned long start, unsigned long stop)
{
flush_dcache_range(start, stop);
}

0130 <.flush_test>:
 130:   3d 22 00 00 addis   r9,r2,0
132: R_PPC64_TOC16_HA   .data+0x8
 134:   81 09 00 00 lwz r8,0(r9)
136: R_PPC64_TOC16_LO   .data+0x8
 138:   3d 22 00 00 addis   r9,r2,0
13a: R_PPC64_TOC16_HA   .data+0xc
 13c:   80 e9 00 00 lwz r7,0(r9)
13e: R_PPC64_TOC16_LO   .data+0xc
 140:   7d 48 00 d0 neg r10,r8
 144:   7d 43 18 38 and r3,r10,r3
 148:   7c 00 04 ac hwsync
 14c:   4c 00 01 2c isync
 150:   39 28 ff ff addi    r9,r8,-1
 154:   7c 89 22 14 add     r4,r9,r4
 158:   7c 83 20 50 subf    r4,r3,r4
 15c:   7c 89 3c 37 srd.    r9,r4,r7
 160:   41 82 00 1c beq     17c <.flush_test+0x4c>
 164:   7d 29 03 a6 mtctr   r9
 168:   60 00 00 00 nop
 16c:   60 00 00 00 nop
 170:   7c 00 18 ac dcbf    0,r3
 174:   7c 63 42 14 add     r3,r3,r8
 178:   42 00 ff f8 bdnz    170 <.flush_test+0x40>
 17c:   7c 00 04 ac hwsync
 180:   4c 00 01 2c isync
 184:   4e 80 00 20 blr
 188:   60 00 00 00 nop
 18c:   60 00 00 00 nop

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cache.h  | 10 ++
 arch/powerpc/include/asm/cacheflush.h | 14 --
 arch/powerpc/kernel/misc_64.S | 29 -
 3 files changed, 18 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index 0009a0a82e86..45e3137ccd71 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -54,6 +54,16 @@ struct ppc64_caches {
 };
 
 extern struct ppc64_caches ppc64_caches;
+
+static inline u32 l1_cache_shift(void)
+{
+   return ppc64_caches.l1d.log_block_size;
+}
+
+static inline u32 l1_cache_bytes(void)
+{
+   return ppc64_caches.l1d.block_size;
+}
 #else
 static inline u32 l1_cache_shift(void)
 {
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index d405f18441cd..3cd7ce3dec8b 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -57,7 +57,6 @@ static inline void __flush_dcache_icache_phys(unsigned long 
physaddr)
 }
 #endif
 
-#ifdef CONFIG_PPC32
 /*
  * Write any modified data cache blocks out to memory and invalidate them.
  * Does not invalidate the corresponding instruction cache blocks.
@@ -70,9 +69,17 @@ static inline void flush_dcache_range(unsigned long start, 
unsigned long stop)
unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
 
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   mb();   /* sync */
+   isync();
+   }
+
for (i = 0; i < size >> shift; i++, addr += bytes)
dcbf(addr);
mb();   /* sync */
+
+   if (IS_ENABLED(CONFIG_PPC64))
+   isync();
 }
 
 /*
@@ -112,11 +119,6 @@ static inline void invalidate_dcache_range(unsigned long 
start,
mb();   /* sync */
 }
 
-#endif /* CONFIG_PPC32 */
-#ifdef CONFIG_PPC64
-extern void flush_dcache_range(unsigned long start, unsigned long stop);
-#endif
-
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
memcpy(dst, src, len); \
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index a4fd536efb44..1b0a42c50ef1 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -115,35 +115,6 @@ _ASM_NOKPROBE_SYMBOL(flush_icache_range)
 EXPORT_SYMBOL(flush_icache_range)
 
 /*
- * Like above, but only do the D-cache.
- *
- * flush_dcache_range(unsigned long start, unsigned long stop)
- *
- *flush all bytes from start to stop-1 inclusive
- */
-
-_GLOBAL_TOC(flush_dcache_range)
-   ld  r10,PPC64_CACHES@toc(r2)
-   lwz r7,DCACHEL1BLOCKSIZE(r10)   /* Get dcache block size */
-   addi    r5,r7,-1
-   andc    r6,r3,r5    /* round low to line bdy */
-   subf    r8,r6,r4    /* compute length */
-   add     r8,r8,r5    /* ensure we get enough */
-   lwz     r9,DCACHEL1LOGBLOCKSIZE(r10)    /* Get log-2 of dcache block size */
-   srw.    r8,r8,r9    /* compute line count */
-   beqlr   /* nothing to do? */
-   sync
-   isync
-   mtctr   r8
-0: dcbf    0,r6
-   add     r6,r6,r7
-   bdnz    0b
-   sync
-   isync
-   blr
-EXPORT_SYMBOL(flush_dcache_range)
-
-/*
  * Flush a particular page from the data cache to RAM.
  * Note: this is necessary because the instruction cache does *not*
  * snoop from the data cache.
-- 
2.13.3



[PATCH 1/4] powerpc/64: flush_inval_dcache_range() becomes flush_dcache_range()

2019-05-14 Thread Christophe Leroy
On most arches that have flush_dcache_range(), including PPC32,
this function does a writeback and invalidation of the cache blocks.

On PPC64, flush_dcache_range() only does a writeback, while
flush_inval_dcache_range() does the invalidation in addition.

It also looks like there are no PPC64 platforms within arch/powerpc/
using flush_dcache_range().

This patch drops the existing 64 bits version of flush_dcache_range()
and renames flush_inval_dcache_range() into flush_dcache_range().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cacheflush.h |  1 -
 arch/powerpc/kernel/misc_64.S | 27 ++-
 arch/powerpc/lib/pmem.c   |  8 
 arch/powerpc/mm/mem.c |  4 ++--
 arch/powerpc/sysdev/dart_iommu.c  |  2 +-
 drivers/macintosh/smu.c   |  4 ++--
 6 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index d5a8d7bf0759..e9a40b110f1d 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -109,7 +109,6 @@ static inline void invalidate_dcache_range(unsigned long 
start,
 #endif /* CONFIG_PPC32 */
 #ifdef CONFIG_PPC64
 extern void flush_dcache_range(unsigned long start, unsigned long stop);
-extern void flush_inval_dcache_range(unsigned long start, unsigned long stop);
 #endif
 
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 262ba9481781..a4fd536efb44 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -121,31 +121,8 @@ EXPORT_SYMBOL(flush_icache_range)
  *
  *flush all bytes from start to stop-1 inclusive
  */
-_GLOBAL_TOC(flush_dcache_range)
 
-/*
- * Flush the data cache to memory 
- * 
- * Different systems have different cache line sizes
- */
-   ld  r10,PPC64_CACHES@toc(r2)
-   lwz r7,DCACHEL1BLOCKSIZE(r10)   /* Get dcache block size */
-   addi    r5,r7,-1
-   andc    r6,r3,r5    /* round low to line bdy */
-   subf    r8,r6,r4    /* compute length */
-   add     r8,r8,r5    /* ensure we get enough */
-   lwz     r9,DCACHEL1LOGBLOCKSIZE(r10)    /* Get log-2 of dcache block size */
-   srw.    r8,r8,r9    /* compute line count */
-   beqlr   /* nothing to do? */
-   mtctr   r8
-0: dcbst   0,r6
-   add     r6,r6,r7
-   bdnz    0b
-   sync
-   blr
-EXPORT_SYMBOL(flush_dcache_range)
-
-_GLOBAL(flush_inval_dcache_range)
+_GLOBAL_TOC(flush_dcache_range)
ld  r10,PPC64_CACHES@toc(r2)
lwz r7,DCACHEL1BLOCKSIZE(r10)   /* Get dcache block size */
addi    r5,r7,-1
@@ -164,7 +141,7 @@ _GLOBAL(flush_inval_dcache_range)
sync
isync
blr
-
+EXPORT_SYMBOL(flush_dcache_range)
 
 /*
  * Flush a particular page from the data cache to RAM.
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 53c018762e1c..36e08bf850e0 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -23,14 +23,14 @@
 void arch_wb_cache_pmem(void *addr, size_t size)
 {
unsigned long start = (unsigned long) addr;
-   flush_inval_dcache_range(start, start + size);
+   flush_dcache_range(start, start + size);
 }
 EXPORT_SYMBOL(arch_wb_cache_pmem);
 
 void arch_invalidate_pmem(void *addr, size_t size)
 {
unsigned long start = (unsigned long) addr;
-   flush_inval_dcache_range(start, start + size);
+   flush_dcache_range(start, start + size);
 }
 EXPORT_SYMBOL(arch_invalidate_pmem);
 
@@ -43,7 +43,7 @@ long __copy_from_user_flushcache(void *dest, const void 
__user *src,
unsigned long copied, start = (unsigned long) dest;
 
copied = __copy_from_user(dest, src, size);
-   flush_inval_dcache_range(start, start + size);
+   flush_dcache_range(start, start + size);
 
return copied;
 }
@@ -53,7 +53,7 @@ void *memcpy_flushcache(void *dest, const void *src, size_t 
size)
unsigned long start = (unsigned long) dest;
 
memcpy(dest, src, size);
-   flush_inval_dcache_range(start, start + size);
+   flush_dcache_range(start, start + size);
 
return dest;
 }
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index cd525d709072..39e66f033995 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -125,7 +125,7 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, 
struct vmem_altmap *altm
start, start + size, rc);
return -EFAULT;
}
-   flush_inval_dcache_range(start, start + size);
+   flush_dcache_range(start, start + size);
 
return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
 }
@@ -153,7 +153,7 @@ int __ref arch_remove_memory(int nid, u64 start, u64 size,
 
/* Remove htab bolted m

[PATCH 2/4] powerpc/32: activate ARCH_HAS_PMEM_API and ARCH_HAS_UACCESS_FLUSHCACHE

2019-05-14 Thread Christophe Leroy
PPC32 also has flush_dcache_range(), so it can also support
ARCH_HAS_PMEM_API and ARCH_HAS_UACCESS_FLUSHCACHE without changes.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d7996cfaceca..cf6e30f637be 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -127,13 +127,13 @@ config PPC
select ARCH_HAS_KCOV
select ARCH_HAS_MMIOWB  if PPC64
select ARCH_HAS_PHYS_TO_DMA
-   select ARCH_HAS_PMEM_APIif PPC64
+   select ARCH_HAS_PMEM_API
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE && PPC64
select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION)
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
-   select ARCH_HAS_UACCESS_FLUSHCACHE  if PPC64
+   select ARCH_HAS_UACCESS_FLUSHCACHE
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAS_ZONE_DEVICE if PPC_BOOK3S_64
select ARCH_HAVE_NMI_SAFE_CMPXCHG
-- 
2.13.3



[PATCH 3/4] powerpc/32: define helpers to get L1 cache sizes.

2019-05-14 Thread Christophe Leroy
This patch defines C helpers to retrieve the size of
cache blocks and uses them in the cacheflush functions.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cache.h  | 16 ++--
 arch/powerpc/include/asm/cacheflush.h | 24 +++-
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index 40ea5b3781c6..0009a0a82e86 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -33,7 +33,8 @@
 
 #define IFETCH_ALIGN_BYTES (1 << IFETCH_ALIGN_SHIFT)
 
-#if defined(__powerpc64__) && !defined(__ASSEMBLY__)
+#if !defined(__ASSEMBLY__)
+#ifdef CONFIG_PPC64
 
 struct ppc_cache_info {
u32 size;
@@ -53,7 +54,18 @@ struct ppc64_caches {
 };
 
 extern struct ppc64_caches ppc64_caches;
-#endif /* __powerpc64__ && ! __ASSEMBLY__ */
+#else
+static inline u32 l1_cache_shift(void)
+{
+   return L1_CACHE_SHIFT;
+}
+
+static inline u32 l1_cache_bytes(void)
+{
+   return L1_CACHE_BYTES;
+}
+#endif
+#endif /* ! __ASSEMBLY__ */
 
 #if defined(__ASSEMBLY__)
 /*
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index e9a40b110f1d..d405f18441cd 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -64,11 +64,13 @@ static inline void __flush_dcache_icache_phys(unsigned long 
physaddr)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
-   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
-   unsigned long size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
+   unsigned long shift = l1_cache_shift();
+   unsigned long bytes = l1_cache_bytes();
+   void *addr = (void *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
 
-   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
+   for (i = 0; i < size >> shift; i++, addr += bytes)
dcbf(addr);
mb();   /* sync */
 }
@@ -80,11 +82,13 @@ static inline void flush_dcache_range(unsigned long start, 
unsigned long stop)
  */
 static inline void clean_dcache_range(unsigned long start, unsigned long stop)
 {
-   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
-   unsigned long size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
+   unsigned long shift = l1_cache_shift();
+   unsigned long bytes = l1_cache_bytes();
+   void *addr = (void *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
 
-   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
+   for (i = 0; i < size >> shift; i++, addr += bytes)
dcbst(addr);
mb();   /* sync */
 }
@@ -97,11 +101,13 @@ static inline void clean_dcache_range(unsigned long start, 
unsigned long stop)
 static inline void invalidate_dcache_range(unsigned long start,
   unsigned long stop)
 {
-   void *addr = (void *)(start & ~(L1_CACHE_BYTES - 1));
-   unsigned long size = stop - (unsigned long)addr + (L1_CACHE_BYTES - 1);
+   unsigned long shift = l1_cache_shift();
+   unsigned long bytes = l1_cache_bytes();
+   void *addr = (void *)(start & ~(bytes - 1));
+   unsigned long size = stop - (unsigned long)addr + (bytes - 1);
unsigned long i;
 
-   for (i = 0; i < size >> L1_CACHE_SHIFT; i++, addr += L1_CACHE_BYTES)
+   for (i = 0; i < size >> shift; i++, addr += bytes)
dcbi(addr);
mb();   /* sync */
 }
-- 
2.13.3



Re: [PATCH V3 1/4] mm/hotplug: Reorder arch_remove_memory() call in __remove_memory()

2019-05-14 Thread David Hildenbrand
On 14.05.19 11:00, Anshuman Khandual wrote:
> Memory hot remove uses get_nid_for_pfn() while tearing down linked sysfs
> entries between memory block and node. It first checks pfn validity with
> pfn_valid_within() before fetching nid. With CONFIG_HOLES_IN_ZONE config
> (arm64 has this enabled) pfn_valid_within() calls pfn_valid().
> 
> pfn_valid() is an arch implementation on arm64 (CONFIG_HAVE_ARCH_PFN_VALID)
> which scans all mapped memblock regions with memblock_is_map_memory(). This
> creates a problem in memory hot remove path which has already removed given
> memory range from memory block with memblock_[remove|free] before arriving
> at unregister_mem_sect_under_nodes(). Hence get_nid_for_pfn() returns -1
> skipping subsequent sysfs_remove_link() calls leaving node <-> memory block
> sysfs entries as is. Subsequent memory add operation hits BUG_ON() because
> of existing sysfs entries.
> 
> [   62.007176] NUMA: Unknown node for memory at 0x68000, assuming node 0
> [   62.052517] [ cut here ]
> [   62.053211] kernel BUG at mm/memory_hotplug.c:1143!
> [   62.053868] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [   62.054589] Modules linked in:
> [   62.054999] CPU: 19 PID: 3275 Comm: bash Not tainted 
> 5.1.0-rc2-4-g28cea40b2683 #41
> [   62.056274] Hardware name: linux,dummy-virt (DT)
> [   62.057166] pstate: 4045 (nZcv daif +PAN -UAO)
> [   62.058083] pc : add_memory_resource+0x1cc/0x1d8
> [   62.058961] lr : add_memory_resource+0x10c/0x1d8
> [   62.059842] sp : 168b3ce0
> [   62.060477] x29: 168b3ce0 x28: 8005db546c00
> [   62.061501] x27:  x26: 
> [   62.062509] x25: 111ef000 x24: 111ef5d0
> [   62.063520] x23:  x22: 0006bfff
> [   62.064540] x21: ffef x20: 006c
> [   62.065558] x19: 0068 x18: 0024
> [   62.066566] x17:  x16: 
> [   62.067579] x15:  x14: 8005e412e890
> [   62.068588] x13: 8005d6b105d8 x12: 
> [   62.069610] x11: 8005d6b10490 x10: 0040
> [   62.070615] x9 : 8005e412e898 x8 : 8005e412e890
> [   62.071631] x7 : 8005d6b105d8 x6 : 8005db546c00
> [   62.072640] x5 : 0001 x4 : 0002
> [   62.073654] x3 : 8005d7049480 x2 : 0002
> [   62.074666] x1 : 0003 x0 : ffef
> [   62.075685] Process bash (pid: 3275, stack limit = 0xd754280f)
> [   62.076930] Call trace:
> [   62.077411]  add_memory_resource+0x1cc/0x1d8
> [   62.078227]  __add_memory+0x70/0xa8
> [   62.078901]  probe_store+0xa4/0xc8
> [   62.079561]  dev_attr_store+0x18/0x28
> [   62.080270]  sysfs_kf_write+0x40/0x58
> [   62.080992]  kernfs_fop_write+0xcc/0x1d8
> [   62.081744]  __vfs_write+0x18/0x40
> [   62.082400]  vfs_write+0xa4/0x1b0
> [   62.083037]  ksys_write+0x5c/0xc0
> [   62.083681]  __arm64_sys_write+0x18/0x20
> [   62.084432]  el0_svc_handler+0x88/0x100
> [   62.085177]  el0_svc+0x8/0xc
> 
> Re-ordering arch_remove_memory() with memblock_[free|remove] solves the
> problem on arm64 as pfn_valid() behaves correctly and returns positive
> as memblock for the address range still exists. arch_remove_memory()
> removes applicable memory sections from zone with __remove_pages() and
> tears down kernel linear mapping. Removing memblock regions afterwards
> is safe because there is no other memblock (bootmem) allocator user that
> late. So nobody is going to allocate from the removed range just to blow
> up later. Also nobody should be using the bootmem allocated range else
> we wouldn't allow to remove it. So reordering is indeed safe.
> 
> Acked-by: Michal Hocko 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Oscar Salvador 
> Signed-off-by: Anshuman Khandual 
> ---
>  mm/memory_hotplug.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0082d69..71d0d79 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1872,11 +1872,10 @@ void __ref __remove_memory(int nid, u64 start, u64 
> size)
>  
>   /* remove memmap entry */
>   firmware_map_remove(start, start + size, "System RAM");
> + arch_remove_memory(nid, start, size, NULL);
>   memblock_free(start, size);
>   memblock_remove(start, size);
>  
> - arch_remove_memory(nid, start, size, NULL);
> -
>   try_offline_node(nid);
>  
>   mem_hotplug_done();
> 

I think you have to rebase this patch to -next (and soon to linus master).

-- 

Thanks,

David / dhildenb


Re: [PATCH V3 4/4] arm64/mm: Enable memory hot remove

2019-05-14 Thread David Hildenbrand
On 14.05.19 11:00, Anshuman Khandual wrote:
> Memory removal from an arch perspective involves tearing down two different
> kernel based mappings, i.e. vmemmap and linear, while releasing the related page
> table and any mapped pages allocated for the given physical memory range to be
> removed.
> 
> Define a common kernel page table tear down helper remove_pagetable() which
> can be used to unmap a given kernel virtual address range. In effect it can
> tear down both vmemmap and kernel linear mappings. This new helper is called
> from both vmemmap_free() and ___remove_pgd_mapping() during memory removal.
> 
> For linear mapping there are no actual allocated pages which are mapped to
> create the translation. Any pfn on a given entry is derived from physical
> address (__va(PA) --> PA) whose linear translation is to be created. They
> need not be freed as they were never allocated in the first place. But for
> vmemmap which is a real virtual mapping (like vmalloc) physical pages are
> allocated either from buddy or memblock which get mapped in the kernel page
> table. These allocated and mapped pages need to be freed during translation
> tear down. But page table pages need to be freed in both these cases.
> 
> These mappings need to be differentiated while deciding if a mapped page at
> any level, i.e. [pte|pmd|pud]_page(), should be freed or not. Callers of the
> mapping tear down process should pass a 'sparse_vmap' variable identifying
> kernel vmemmap mappings.
> 
> While here, update arch_add_memory() to handle __add_pages() failures by
> just unmapping the recently added kernel linear mapping. Now enable memory hot
> remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.
> 
> This implementation is overall inspired from kernel page table tear down
> procedure on X86 architecture.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/Kconfig  |   3 +
>  arch/arm64/mm/mmu.c | 204 
> +++-
>  2 files changed, 205 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1c0cb51..bb4e571 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -268,6 +268,9 @@ config HAVE_GENERIC_GUP
>  config ARCH_ENABLE_MEMORY_HOTPLUG
>   def_bool y
>  
> +config ARCH_ENABLE_MEMORY_HOTREMOVE
> + def_bool y
> +
>  config SMP
>   def_bool y
>  
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 37a902c..bd2d003 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -733,6 +733,177 @@ int kern_addr_valid(unsigned long addr)
>  
>   return pfn_valid(pte_pfn(pte));
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +static void free_hotplug_page_range(struct page *page, ssize_t size)
> +{
> + WARN_ON(PageReserved(page));
> + free_pages((unsigned long)page_address(page), get_order(size));
> +}
> +
> +static void free_hotplug_pgtable_page(struct page *page)
> +{
> + free_hotplug_page_range(page, PAGE_SIZE);
> +}
> +
> +static void free_pte_table(pte_t *ptep, pmd_t *pmdp, unsigned long addr)
> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PTE; i++) {
> + if (!pte_none(ptep[i]))
> + return;
> + }
> +
> + page = pmd_page(*pmdp);
> + pmd_clear(pmdp);
> + __flush_tlb_kernel_pgtable(addr);
> + free_hotplug_pgtable_page(page);
> +}
> +
> +#if (CONFIG_PGTABLE_LEVELS > 2)
> +static void free_pmd_table(pmd_t *pmdp, pud_t *pudp, unsigned long addr)
> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PMD; i++) {
> + if (!pmd_none(pmdp[i]))
> + return;
> + }
> +
> + page = pud_page(*pudp);
> + pud_clear(pudp);
> + __flush_tlb_kernel_pgtable(addr);
> + free_hotplug_pgtable_page(page);
> +}
> +#else
> +static void free_pmd_table(pmd_t *pmdp, pud_t *pudp, unsigned long addr) { }
> +#endif
> +
> +#if (CONFIG_PGTABLE_LEVELS > 3)
> +static void free_pud_table(pud_t *pudp, pgd_t *pgdp, unsigned long addr)
> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PUD; i++) {
> + if (!pud_none(pudp[i]))
> + return;
> + }
> +
> + page = pgd_page(*pgdp);
> + pgd_clear(pgdp);
> + __flush_tlb_kernel_pgtable(addr);
> + free_hotplug_pgtable_page(page);
> +}
> +#else
> +static void free_pud_table(pud_t *pudp, pgd_t *pgdp, unsigned long addr) { }
> +#endif
> +
> +static void
> +remove_pte_table(pmd_t *pmdp, unsigned long addr,
> + unsigned long end, bool sparse_vmap)
> +{
> + struct page *page;
> + pte_t *ptep;
> + unsigned long start = addr;
> +
> + for (; addr < end; addr += PAGE_SIZE) {
> + ptep = pte_offset_kernel(pmdp, addr);
> + if (!pte_present(*ptep))
> + continue;
> +
> + if (sparse_vmap) {
> + page = pte_page(READ_ONCE(*ptep));
> +   

Re: [PATCH V3 0/4] arm64/mm: Enable memory hot remove

2019-05-14 Thread David Hildenbrand
On 14.05.19 11:00, Anshuman Khandual wrote:
> This series enables memory hot remove on arm64 after fixing a memblock
> removal ordering problem in generic __remove_memory() and kernel page
> table race conditions on arm64. This is based on the following arm64
> working tree.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
> 
> David had pointed out that the following patch is already in next/master
> (58b11e136dcc14358) and will conflict with the last patch here. Will fix
> the conflict once this series gets reviewed and agreed upon.

I should read the cover letter first, so ignore my comments :)

> 
> Author: David Hildenbrand 
> Date:   Wed Apr 10 11:02:27 2019 +1000
> 
> mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never 
> fail
> 
> All callers of arch_remove_memory() ignore errors.  And we should really
> try to remove any errors from the memory removal path.  No more errors are
> reported from __remove_pages().  BUG() in s390x code in case
> arch_remove_memory() is triggered.  We may implement that properly later.
> WARN in case powerpc code failed to remove the section mapping, which is
> better than ignoring the error completely right now.
> 
> Testing:
> 
> Tested memory hot remove on arm64 for 4K, 16K, 64K page config options with
> all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS combinations. Only
> build tested on non-arm64 platforms.
> 
> Changes in V3:
>  
> - Implemented most of the suggestions from Mark Rutland for remove_pagetable()
> - Fixed applicable PGTABLE_LEVEL wrappers around pgtable page freeing 
> functions
> - Replaced 'direct' with 'sparse_vmap' in remove_pagetable() with inverted 
> polarity
> - Changed pointer names ('p' at end) and removed tmp from iterations
> - Perform intermediate TLB invalidation while clearing pgtable entries
> - Dropped flush_tlb_kernel_range() in remove_pagetable()
> - Added flush_tlb_kernel_range() in remove_pte_table() instead
> - Renamed page freeing functions for pgtable page and mapped pages
> - Used page range size instead of order while freeing mapped or pgtable pages
> - Removed all PageReserved() handling while freeing mapped or pgtable pages
> - Replaced XXX_index() with XXX_offset() while walking the kernel page table
> - Used READ_ONCE() while fetching individual pgtable entries
> - Taken overall init_mm.page_table_lock instead of just while changing an 
> entry
> - Dropped previously added [pmd|pud]_index() which are not required anymore
> 
> - Added a new patch to protect against a kernel page table race condition for ptdump
> - Added a new patch from Mark Rutland to prevent huge-vmap with ptdump
> 
> Changes in V2: (https://lkml.org/lkml/2019/4/14/5)
> 
> - Added all received review and ack tags
> - Split the series from ZONE_DEVICE enablement for better review
> - Moved memblock re-order patch to the front as per Robin Murphy
> - Updated commit message on memblock re-order patch per Michal Hocko
> - Dropped [pmd|pud]_large() definitions
> - Used existing [pmd|pud]_sect() instead of earlier [pmd|pud]_large()
> - Removed __meminit and __ref tags as per Oscar Salvador
> - Dropped unnecessary 'ret' init in arch_add_memory() per Robin Murphy
> - Skipped calling into pgtable_page_dtor() for linear mapping page table
>   pages and updated all relevant functions
> 
> Changes in V1: (https://lkml.org/lkml/2019/4/3/28)
> 
> Anshuman Khandual (3):
>   mm/hotplug: Reorder arch_remove_memory() call in __remove_memory()
>   arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
>   arm64/mm: Enable memory hot remove
> 
> Mark Rutland (1):
>   arm64/mm: Inhibit huge-vmap with ptdump
> 
>  arch/arm64/Kconfig |   3 +
>  arch/arm64/mm/mmu.c| 215 
> -
>  arch/arm64/mm/ptdump_debugfs.c |   3 +
>  mm/memory_hotplug.c|   3 +-
>  4 files changed, 217 insertions(+), 7 deletions(-)
> 


-- 

Thanks,

David / dhildenb


Re: regulator: BD71837: possible regression

2019-05-14 Thread Mark Brown
On Tue, May 14, 2019 at 06:14:41AM +, Vaittinen, Matti wrote:

> I am not sure, but perhaps the regulator core has changed so that this
> parent/child relation must be modelled using -supply properties in
> device-tree. Are you able to bisect the change which breaks this? There
> may be other regulator drivers doing the same as bd718x7 does (which
> means trusting that setting the supply_name in the desc is enough - and
> without deeper understanding I'd say it should be enough).

The framework will look for the parent regulator and warn if it can't
find it but it should still instantiate it if the mapping is a hard
failure (as opposed to a probe deferral).

> If this change is intentional and buck6-supply and buck7-supply are now
> also required in DT, then we should also reflect this fact in the bindings
> doc for BD71837 and BD71847.

It is always and has always been best practice to wire up the regulators
as completely as possible; this is less error prone and gives you more
ability to take advantage of framework improvements.


signature.asc
Description: PGP signature


Re: [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs

2019-05-14 Thread Kirill Tkhai
On 14.05.2019 09:30, Oleksandr Natalenko wrote:
> Hi.
> 
> On Mon, May 13, 2019 at 03:37:56PM +0300, Kirill Tkhai wrote:
>>> Yes, I get your point. But the intention is to avoid another hacky trick
>>> (LD_PRELOAD), thus *something* should *preferably* be done on the
>>> kernel level instead.
>>
>> I don't think so. Does the userspace hack introduce some overhead? It does
>> not look like it. Why should we think about mergeable VMAs in the page fault
>> handler?! That is the last thing we want to think about in a page fault handler.
>>
>> Also, there is difficult synchronization in page fault handlers, and it's
>> easy to make a mistake. So, there is a mistake in [3/4], and you call
>> ksm_enter() with mmap_sem read locked, while normal way is to call it
>> with write lock (see madvise_need_mmap_write()).
>>
>> So, let's not touch this path. A small optimization for an unlikely case will
>> introduce problems for optimizing the likely case in the future.
> 
> Yup, you're right, I've missed the fact that write lock is needed there.
> Re-vamping locking there is not my intention, so lets find another
> solution.
> 
>>> Also, just for the sake of another piece of stats here:
>>>
>>> $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
>>> 526
>>
>> This all requires attentive analysis. The number looks pretty big to me.
>> What are the pages you get merged there? These may just be identical
>> zero pages.
>>
>> E.g., your browser wants to work fast. It introduces smart schemes,
>> and preallocates many pages in the background (mmap + write 1 byte to a page),
>> so that later it saves some time (no page fault + alloc) when a page is
>> really needed. But your change merges these pages and kills this
>> optimization. That does not sound good, does it?
>>
>> I think we should not assume we know and can predict better than application
>> writers what they want from the kernel. Let people decide for themselves
>> depending on their workload. The only exception is buggy
>> or old applications which are impossible to change, where a forced madvise
>> workaround may help. But only in case there really are such applications...
>>
>> I'd research what pages you have duplicated in these 526 MB. Maybe
>> you will find that no action is required, or that a report to the userspace
>> application to use madvise is needed.
> 
> OK, I agree, this is a good argument to move decision to userspace.
> 
>>> 2) what kinds of opt-out we should maintain? Like, what if force_madvise
>>> is called, but the task doesn't want some VMAs to be merged? This will
>>> required new flag anyway, it seems. And should there be another
>>> write-only file to unmerge everything forcibly for specific task?
>>
>> For example,
>>
>> Merge:
>> #echo $task > /sys/kernel/mm/ksm/force_madvise
> 
> Immediate question: what should be actually done on this? I see 2
> options:
> 
> 1) mark all VMAs as mergeable + set some flag for mmap() to mark all
> further allocations as mergeable as well;
> 2) just mark all the VMAs as mergeable; userspace can call this
> periodically to mark new VMAs.
> 
> My prediction is that 2) is less destructive, and the decision is
> preserved predominantly to userspace, thus it would be a desired option.

Let's see how we use KSM now. It's good for virtual machines: people
install the same distribution in several VMs, and they have the same
packages and the same files. When you read a file inside a VM, its pages
are file cache for the VM, but they are anonymous pages for the host kernel.

Hypervisor marks VM memory as mergeable, and host KSM merges the same
anonymous pages together. Much of the file cache inside a VM is constant
content, so we get good KSM compression on such file pages.
The result we have is explainable and expected.
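
For reference, the opt-in that hypervisors such as QEMU already perform on
the guest RAM they allocate looks roughly like this (a minimal userspace
sketch, not QEMU's actual code):

#include <stdio.h>
#include <sys/mman.h>

/* Allocate guest RAM and opt it in to KSM merging. */
static void *alloc_guest_ram(size_t size)
{
        void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (ram == MAP_FAILED)
                return NULL;

        /* Tell KSM this range may be scanned and merged. */
        if (madvise(ram, size, MADV_MERGEABLE))
                perror("madvise(MADV_MERGEABLE)");

        return ram;
}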

But we don't know anything about the pages you have merged on your laptop.
We can't make any assumptions before analysing the applications which
produce such pages. Let's check what happens before we try to implement
some specific design (if we really need to implement anything).

The rest is just technical details. We may implement everything we need
on top of this (even implement a polling of /proc/[pid]/maps and write
a task and address of vma to force_madvise or similar file).

Kirill


Re: [PATCH RT v2] Fix a lockup in wait_for_completion() and friends

2019-05-14 Thread Sebastian Andrzej Siewior
On 2019-05-14 10:43:56 [+0200], Peter Zijlstra wrote:
> Now.. that will fix it, but I think it is also wrong.
> 
> The problem being that it violates FIFO, something that might be more
> important on -RT than elsewhere.

Wouldn't -RT be more about waking the task with the highest priority
instead of the one that waited the longest?

> The regular wait API seems confused/inconsistent when it uses
> autoremove_wake_function and default_wake_function, which doesn't help,
> but we can easily support this with swait -- the problematic thing is
> the custom wake functions, we mustn't do that.
> 
> (also, mingo went and renamed a whole bunch of wait_* crap and didn't do
> the same to swait_ so now its named all different :/)
> 
> Something like the below perhaps.

This still violates FIFO because a task can do wait_for_completion(),
not enqueue itself on the list because it noticed a pending wake, and
leave. The list order is preserved, we have that.
But this is a completion list. We probably have multiple workers waiting for
something to do, so all of those should be of equal priority, maybe one
for each core or so. So it shouldn't matter which one we wake up.

Corey, would it make any change which waiter is going to be woken up?

Sebastian


Re: [baylibre-upstreaming] [PATCH 0/3] mmc: meson-gx: add ddr-access-quirk support

2019-05-14 Thread guillaume La Roque


On 5/13/19 11:15 AM, Neil Armstrong wrote:
> On the Amlogic G12A SoC family, (only) the SDIO controller fails to access
> the data from DDR, leading to a broken controller.
>
> Add the amlogic,ddr-access-quirk property to signal that this particular
> controller has this bug and needs a quirk to work properly.
>
> But each MMC controller has 1.5KiB of SRAM after the registers, which can
> be used as a bounce buffer to avoid direct DDR access from the integrated
> DMAs (this SRAM may be used by the boot ROM when DDR is not yet initialized).
>
> The quirk is to disable the chained descriptor mode for this controller, and
> use this SRAM memory zone as the buffer for the bounce buffer fallback mode.
>
> The performance hit hasn't been evaluated, but the fix has been tested
> using a WiFi AP6398S SDIO module, and the iperf3 bandwidth measurement gave
> 55.2 Mbits/sec over a 63-hour-long test, with the SDIO ios set as High-Speed
> at a 50MHz clock. It gave around 170 Mbits/sec at SDR104 with a 200MHz clock.
>
> Neil Armstrong (3):
>   dt-bindings: mmc: meson-gx: add ddr-access-quirk property
>   mmc: meson-gx: add ddr-access-quirk
>   arm64: dts: meson-g12a: add ddr-access-quirk property to SDIO
> controller
>
>  .../bindings/mmc/amlogic,meson-gx.txt |  4 ++
>  arch/arm64/boot/dts/amlogic/meson-g12a.dtsi   |  1 +
>  drivers/mmc/host/meson-gx-mmc.c   | 65 +++
>  3 files changed, 57 insertions(+), 13 deletions(-)
>
Tested with the SEI510 board, no problem or regression seen.

Tested-by: Guillaume La Roque 





[PATCH 2/3] arm64: dts: meson: u200: add sd and emmc

2019-05-14 Thread Jerome Brunet
Enable eMMC and SDCard on the g12a u200 board

Signed-off-by: Jerome Brunet 
---
 .../boot/dts/amlogic/meson-g12a-u200.dts  | 42 +++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts 
b/arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts
index 7cc3e2d6a4f1..972926121beb 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-g12a-u200.dts
@@ -31,6 +31,11 @@
};
};
 
+   emmc_pwrseq: emmc-pwrseq {
+   compatible = "mmc-pwrseq-emmc";
+   reset-gpios = <&gpio BOOT_12 GPIO_ACTIVE_LOW>;
+   };
+
hdmi-connector {
compatible = "hdmi-connector";
type = "a";
@@ -164,6 +169,43 @@
pinctrl-names = "default";
 };
 
+/* SD card */
+&sd_emmc_b {
+   status = "okay";
+   pinctrl-0 = <&sdcard_c_pins>;
+   pinctrl-1 = <&sdcard_clk_gate_c_pins>;
+   pinctrl-names = "default", "clk-gate";
+
+   bus-width = <4>;
+   cap-sd-highspeed;
+   max-frequency = <5000>;
+   disable-wp;
+
+   cd-gpios = <&gpio GPIOC_6 GPIO_ACTIVE_LOW>;
+   vmmc-supply = <&vddao_3v3>;
+   vqmmc-supply = <&vddao_3v3>;
+};
+
+/* eMMC */
+&sd_emmc_c {
+   status = "okay";
+   pinctrl-0 = <&emmc_pins>, <&emmc_ds_pins>;
+   pinctrl-1 = <&emmc_clk_gate_pins>;
+   pinctrl-names = "default", "clk-gate";
+
+   bus-width = <8>;
+   cap-mmc-highspeed;
+   mmc-ddr-1_8v;
+   mmc-hs200-1_8v;
+   max-frequency = <2>;
+   non-removable;
+   disable-wp;
+
+   mmc-pwrseq = <&emmc_pwrseq>;
+   vmmc-supply = <&vcc_3v3>;
+   vqmmc-supply = <&flash_1v8>;
+};
+
 &uart_AO {
status = "okay";
pinctrl-0 = <&uart_ao_a_pins>;
-- 
2.20.1



[PATCH 1/3] arm64: dts: meson: g12a: add mmc nodes

2019-05-14 Thread Jerome Brunet
Add port B (sdcard) and port C (eMMC) pinctrl and controllers nodes to
the g12a DT.

Signed-off-by: Jerome Brunet 
---
 arch/arm64/boot/dts/amlogic/meson-g12a.dtsi | 124 
 1 file changed, 124 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
index 2f4f4dd54cba..b2f08fc96568 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12a.dtsi
@@ -185,6 +185,48 @@
};
};
 
+   emmc_pins: emmc {
+   mux-0 {
+   groups = "emmc_nand_d0",
+"emmc_nand_d1",
+"emmc_nand_d2",
+"emmc_nand_d3",
+"emmc_nand_d4",
+"emmc_nand_d5",
+"emmc_nand_d6",
+"emmc_nand_d7",
+"emmc_cmd";
+   function = "emmc";
+   bias-pull-up;
+   drive-strength-microamp 
= <4000>;
+   };
+
+   mux-1 {
+   groups = "emmc_clk";
+   function = "emmc";
+   bias-disable;
+   drive-strength-microamp 
= <4000>;
+   };
+   };
+
+   emmc_ds_pins: emmc-ds {
+   mux {
+   groups = "emmc_nand_ds";
+   function = "emmc";
+   bias-pull-down;
+   drive-strength-microamp 
= <4000>;
+   };
+   };
+
+   emmc_clk_gate_pins: emmc_clk_gate {
+   mux {
+   groups = "BOOT_8";
+   function = 
"gpio_periphs";
+   bias-pull-down;
+   drive-strength-microamp 
= <4000>;
+   };
+   };
+
hdmitx_ddc_pins: hdmitx_ddc {
mux {
groups = "hdmitx_sda",
@@ -290,6 +332,64 @@
};
};
 
+   sdcard_c_pins: sdcard_c {
+   mux-0 {
+   groups = "sdcard_d0_c",
+"sdcard_d1_c",
+"sdcard_d2_c",
+"sdcard_d3_c",
+"sdcard_cmd_c";
+   function = "sdcard";
+   bias-pull-up;
+   drive-strength-microamp 
= <4000>;
+   };
+
+   mux-1 {
+   groups = "sdcard_clk_c";
+   function = "sdcard";
+   bias-disable;
+   drive-strength-microamp 
= <4000>;
+   };
+   };
+
+   sdcard_clk_gate_c_pins: 
sdcard_clk_gate_c {
+  

Re: [PATCH V3 2/4] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

2019-05-14 Thread David Hildenbrand
On 14.05.19 11:00, Anshuman Khandual wrote:
> The arm64 pagetable dump code can race with concurrent modification of the
> kernel page tables. When leaf entries are modified concurrently, the dump
> code may log stale or inconsistent information for a VA range, but this is
> otherwise not harmful.
> 
> When intermediate levels of table are freed, the dump code will continue to
> use memory which has been freed and potentially reallocated for another
> purpose. In such cases, the dump code may dereference bogus addresses,
> leading to a number of potential problems.
> 
> Intermediate levels of table may be freed during memory hot-remove, or when
> installing a huge mapping in the vmalloc region. To avoid racing with these
> cases, take the memory hotplug lock when walking the kernel page table.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/mm/ptdump_debugfs.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
> index 064163f..80171d1 100644
> --- a/arch/arm64/mm/ptdump_debugfs.c
> +++ b/arch/arm64/mm/ptdump_debugfs.c
> @@ -7,7 +7,10 @@
>  static int ptdump_show(struct seq_file *m, void *v)
>  {
>   struct ptdump_info *info = m->private;
> +
> + get_online_mems();

The name of that function is somewhat stale now, it used to refer to
online/offlining of pages only. The underlying lock is the
mem_hotplug_lock. Maybe we should rename that function some day ...

>   ptdump_walk_pgd(m, info);
> + put_online_mems();
>   return 0;

Acked-by: David Hildenbrand 

>  }
>  DEFINE_SHOW_ATTRIBUTE(ptdump);
> 


-- 

Thanks,

David / dhildenb


[PATCH 0/3] arm64: dts: meson: g12a: add mmc B and C

2019-05-14 Thread Jerome Brunet
This patchset adds the MMC controller B and C to the g12a SoC as well
as the u200 and sei510 boards.

MMC controller A has been left out on purpose. This controller is
special on this SoC family and will be added later on.

Notice the use of the pinconf DT property 'drive-strength-microamp'.
Support for this property is not yet merged in meson pinctrl driver but
the DT part as been acked by the DT maintainer [0] so it should be safe
to use.

[0]: https://lkml.kernel.org/r/20190513152451.GA25690@bogus

Jerome Brunet (3):
  arm64: dts: meson: g12a: add mmc nodes
  arm64: dts: meson: u200: add sd and emmc
  arm64: dts: meson: sei510: add sd and emmc

 .../boot/dts/amlogic/meson-g12a-sei510.dts|  42 ++
 .../boot/dts/amlogic/meson-g12a-u200.dts  |  42 ++
 arch/arm64/boot/dts/amlogic/meson-g12a.dtsi   | 124 ++
 3 files changed, 208 insertions(+)

-- 
2.20.1



[PATCH 3/3] arm64: dts: meson: sei510: add sd and emmc

2019-05-14 Thread Jerome Brunet
Enable eMMC and SDCard on the g12a sei510 board

Signed-off-by: Jerome Brunet 
---
 .../boot/dts/amlogic/meson-g12a-sei510.dts| 42 +++
 1 file changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-g12a-sei510.dts 
b/arch/arm64/boot/dts/amlogic/meson-g12a-sei510.dts
index 61fb30047d7f..bb45e3577ff5 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12a-sei510.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-g12a-sei510.dts
@@ -45,6 +45,11 @@
};
};
 
+   emmc_pwrseq: emmc-pwrseq {
+   compatible = "mmc-pwrseq-emmc";
+   reset-gpios = <&gpio BOOT_12 GPIO_ACTIVE_LOW>;
+   };
+
hdmi-connector {
compatible = "hdmi-connector";
type = "a";
@@ -161,6 +166,43 @@
vref-supply = <&vddio_ao1v8>;
 };
 
+/* SD card */
+&sd_emmc_b {
+   status = "okay";
+   pinctrl-0 = <&sdcard_c_pins>;
+   pinctrl-1 = <&sdcard_clk_gate_c_pins>;
+   pinctrl-names = "default", "clk-gate";
+
+   bus-width = <4>;
+   cap-sd-highspeed;
+   max-frequency = <5000>;
+   disable-wp;
+
+   cd-gpios = <&gpio GPIOC_6 GPIO_ACTIVE_LOW>;
+   vmmc-supply = <&vddao_3v3>;
+   vqmmc-supply = <&vddao_3v3>;
+};
+
+/* eMMC */
+&sd_emmc_c {
+   status = "okay";
+   pinctrl-0 = <&emmc_pins>, <&emmc_ds_pins>;
+   pinctrl-1 = <&emmc_clk_gate_pins>;
+   pinctrl-names = "default", "clk-gate";
+
+   bus-width = <8>;
+   cap-mmc-highspeed;
+   mmc-ddr-1_8v;
+   mmc-hs200-1_8v;
+   max-frequency = <2>;
+   non-removable;
+   disable-wp;
+
+   mmc-pwrseq = <&emmc_pwrseq>;
+   vmmc-supply = <&vddao_3v3>;
+   vqmmc-supply = <&emmc_1v8>;
+};
+
 &uart_A {
status = "okay";
pinctrl-0 = <&uart_a_pins>, <&uart_a_cts_rts_pins>;
-- 
2.20.1



Re: [PATCH] printk: Monitor change of console loglevel.

2019-05-14 Thread Sergey Senozhatsky
On (05/11/19 00:19), Tetsuo Handa wrote:
> We are seeing syzbot reports [1] where printk() messages prior to panic()
> are missing for unknown reason. To test whether it is due to some testcase
> changing console loglevel, let's panic() as soon as console loglevel has
> changed. This patch is intended for testing on linux-next.git only, and
> will be removed after we found what is wrong.

Clone linux-next, apply the patch, push to a github/gitlab repo,
configure syzbot to pull from github/gitlab? Adding temp patches
to linux-next is hard and apparently not exactly what linux-next
is used for these days.

-ss


Re: [stable/4.14.y PATCH 0/3] mmc: Fix a potential resource leak when shutting down request queue.

2019-05-14 Thread Greg Kroah-Hartman
On Mon, May 13, 2019 at 11:55:18AM -0600, Raul E Rangel wrote:
> I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
> https://lore.kernel.org/patchwork/patch/856512/ into 4.14. It fixes a
> potential resource leak when shutting down the request queue.

Potential meaning "it does happen", or "it can happen if we do this", or
just "maybe it might happen, we really do not know?"

> Once this patch is applied, there is a potential for a null pointer 
> dereference.
> That's what the second patch fixes.

What is the git id of that upstream fix?

> The third patch is just an optimization to stop processing earlier.

That's not how stable kernels work :(

> See https://patchwork.kernel.org/patch/10925469/ for the initial motivation.

I don't understand the motivation from that link at all :(

> This commit applies to v4.14.116. It is already included in 4.19. 4.19 doesn't
> suffer from the null pointer dereference because later commits migrate the mmc
> stack to blk-mq.

What are those later commits?

> I tested this patch set by randomly connecting/disconnecting the SD
> card. I got over 189650 iterations without a problem.

And if you do not have these patches, on 4.14.y, how many iterations
cause a problem?  If you just apply the first patch, does that work?

_EVERY_ time we take a patch that is not upstream, something usually is
broken and needs to be fixed.  We have a long long long history of this,
so if you want to have a patch that is not upstream applied to a stable
kernel release, you need a whole lot of justification and explanation
and begging.  And you need to be around to fix the fallout for when it
breaks :)

thanks,

greg k-h


Re: [PATCH V3 3/4] arm64/mm: Inhibit huge-vmap with ptdump

2019-05-14 Thread David Hildenbrand
On 14.05.19 11:00, Anshuman Khandual wrote:
> From: Mark Rutland 
> 
> The arm64 ptdump code can race with concurrent modification of the
> kernel page tables. At the time this was added, this was sound as:
> 
> * Modifications to leaf entries could result in stale information being
>   logged, but would not result in a functional problem.
> 
> * Boot time modifications to non-leaf entries (e.g. freeing of initmem)
>   were performed when the ptdump code cannot be invoked.
> 
> * At runtime, modifications to non-leaf entries only occurred in the
>   vmalloc region, and these were strictly additive, as intermediate
>   entries were never freed.
> 
> However, since commit:
> 
>   commit 324420bf91f6 ("arm64: add support for ioremap() block mappings")
> 
> ... it has been possible to create huge mappings in the vmalloc area at
> runtime, and as part of this, existing intermediate levels of table may be
> removed and freed.
> 
> It's possible for the ptdump code to race with this, and continue to
> walk tables which have been freed (and potentially poisoned or
> reallocated). As a result of this, the ptdump code may dereference bogus
> addresses, which could be fatal.
> 
> Since huge-vmap is a TLB and memory optimization, we can disable it when
> the runtime ptdump code is in use to avoid this problem.

I wonder if there is another way to protect from such a race happening.
(IOW, a lock). But as you say, it's a debug feature, so this is an easy fix.

Looks good to me (with limited arm64 code insight :) )

> 
> Fixes: 324420bf91f60582 ("arm64: add support for ioremap() block mappings")
> Signed-off-by: Mark Rutland 
> Signed-off-by: Anshuman Khandual 
> Cc: Ard Biesheuvel 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> ---
>  arch/arm64/mm/mmu.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ef82312..37a902c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -955,13 +955,18 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
>  
>  int __init arch_ioremap_pud_supported(void)
>  {
> - /* only 4k granule supports level 1 block mappings */
> - return IS_ENABLED(CONFIG_ARM64_4K_PAGES);
> + /*
> +  * Only 4k granule supports level 1 block mappings.
> +  * SW table walks can't handle removal of intermediate entries.
> +  */
> + return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
> +!IS_ENABLED(CONFIG_ARM64_PTDUMP_DEBUGFS);
>  }
>  
>  int __init arch_ioremap_pmd_supported(void)
>  {
> - return 1;
> + /* See arch_ioremap_pud_supported() */
> + return !IS_ENABLED(CONFIG_ARM64_PTDUMP_DEBUGFS);
>  }
>  
>  int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
> 


-- 

Thanks,

David / dhildenb


Re: [LTP] LTP: Syscalls: 274 failures: EROFS(30): Read-only file system

2019-05-14 Thread Heiko Carstens
On Mon, May 13, 2019 at 03:16:11AM -0400, Jan Stancek wrote:
> - Original Message -
> > We have noticed 274 syscall test failures on x86_64 and i386 due to
> > "Make the temporary directory in one shot using mkdtemp" failing:
> > tst_tmpdir.c:264: BROK: tst_tmpdir:
> > mkdtemp(/scratch/ltp-7D8vAcYeFG/OXuquJ) failed: EROFS
> 
> Looks like ext4 bug:
> 
> [ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: 
> comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - 
> magic f30a, entries 8, max 340(340), depth 0(0)
> [ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on 
> sda-8
> [ 1916.081071] Aborting journal on device sda-8.
> [ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: 
> Detected aborted journal
> [ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
> 
> So best place for report is likely linux-e...@vger.kernel.org

Actually adding the mailing list, since there has been at least one
other report about ext4 filesystem corruption.

FWIW, I've seen the above also at least once on s390 when using a
kernel built with git commit 47782361aca2.

> > 
> > Failed log:
> > 
> > pread01 1  TBROK  :  tst_tmpdir.c:264: tst_tmpdir:
> > mkdtemp(/scratch/ltp-7D8vAcYeFG/preAUvXAE) failed: errno=EROFS(30):
> > Read-only file system
> > pread01 2  TBROK  :  tst_tmpdir.c:264: Remaining cases broken
> > 
> > full test log,
> > --
> > https://lkft.validation.linaro.org/scheduler/job/711826#L7834
> > 
> > LTP Version: 20190115
> > 
> > Kernel bad commit:
> > 
> > git branch master
> > git commit dd5001e21a991b731d659857cd07acc7a13e6789
> > git describe v5.1-3486-gdd5001e21a99
> > git repo https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > 
> > Kernel good commit:
> > 
> > git branch master
> > git commit d3511f53bb2475f2a4e8460bee5a1ae6dea2a433
> > git describe v5.1-3385-gd3511f53bb24
> > git repo https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > 
> > Best regards
> > Naresh Kamboju
> > 
> 
> -- 
> Mailing list info: https://lists.linux.it/listinfo/ltp



Re: [PATCH] serial: 8250: Add support for using platform_device resources

2019-05-14 Thread Andy Shevchenko
On Tue, May 14, 2019 at 10:24 AM Esben Haabendal  wrote:
> Andy Shevchenko  writes:
> > On Tue, May 07, 2019 at 02:22:18PM +0200, Esben Haabendal wrote:

> We are on repeat here.  I don't agree with you here.  I have a simple
> generic 8250 (16550A) compatible device, and cannot use it in a mfd
> driver using the standard mfd-core framework.

> The lack of support for platform_get_resource() in the generic
> serial8250 driver is not a feature.  It should be supported, just as it
> is in several of the specialized 8250 drivers.

We are going in circles here.
What exactly prevents you from using it? The presence of request_mem_region()?

> It would still mean that I would have to revert to not using convenient and
> otherwise fully appropriate API calls like pci_request_regions() and
> mfd_add_devices().

Yes, here is the issue. 8250 requires the parent not to *request*
resources, because the child handles IO access itself.

> The mfd driver in question is for a PCI device.  Not being able to
> request the PCI regions seems silly.

Nope. Otherwise, the parent which *doesn't handle* IO on behalf of
child should not request its resources.

> Not being able to register all child devices with the call introduced
> for that sole purpose also seems silly.

> Please take a look at https://lkml.org/lkml/2019/4/9/576
> ("[PATCH v2 2/4] mfd: ioc3: Add driver for SGI IOC3 chip")

Thank you for this link.
Now, look at this comment:

+ /*
+ * Map all IOC3 registers.  These are shared between subdevices
+ * so the main IOC3 module manages them.
+ */

Is it your case? Can we see the code?

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v2 3/5] arm64: Fix incorrect irqflag restore for priority masking

2019-05-14 Thread Julien Thierry



On 07/05/2019 09:36, Marc Zyngier wrote:
> On 29/04/2019 17:00, Julien Thierry wrote:
>> When using IRQ priority masking to disable interrupts, in order to deal
>> with the PSR.I state, local_irq_save() would convert the I bit into a
>> PMR value (GIC_PRIO_IRQOFF). This resulted in local_irq_restore()
>> potentially modifying the value of PMR in undesired location due to the
>> state of PSR.I upon flag saving [1].
>>
>> In an attempt to solve this issue in a less hackish manner, introduce
>> a bit (GIC_PRIO_IGNORE_PMR) for the PMR values that can represent
>> whether PSR.I is being used to disable interrupts, in which case it
>> takes precedence of the status of interrupt masking via PMR.
>>
>> GIC_PRIO_IGNORE_PMR is chosen such that ( |
>> GIC_PRIO_IGNORE_PMR) does not mask more interrupts than  as
>> some sections (e.g. arch_cpu_idle(), interrupt acknowledge path)
>> requires PMR not to mask interrupts that could be signaled to the
>> CPU when using only PSR.I.
>>
>> [1] https://www.spinics.net/lists/arm-kernel/msg716956.html
>>
>> Fixes: commit 4a503217ce37 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt 
>> masking")
>> Signed-off-by: Julien Thierry 
>> Reported-by: Zenghui Yu 
>> Cc: Steven Rostedt 
>> Cc: Wei Li 
>> Cc: Catalin Marinas 
>> Cc: Will Deacon 
>> Cc: Christoffer Dall 
>> Cc: Marc Zyngier 
>> Cc: James Morse 
>> Cc: Suzuki K Pouloze 
>> Cc: Oleg Nesterov 
>> ---
>>  arch/arm64/include/asm/arch_gicv3.h |  4 ++-
>>  arch/arm64/include/asm/assembler.h  |  9 +
>>  arch/arm64/include/asm/daifflags.h  | 22 
>>  arch/arm64/include/asm/irqflags.h   | 69 
>> -
>>  arch/arm64/include/asm/kvm_host.h   |  4 ++-
>>  arch/arm64/include/asm/ptrace.h | 10 --
>>  arch/arm64/kernel/entry.S   | 30 +---
>>  arch/arm64/kernel/process.c |  2 +-
>>  arch/arm64/kernel/smp.c |  8 +++--
>>  arch/arm64/kvm/hyp/switch.c |  2 +-
>>  10 files changed, 100 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/arch_gicv3.h 
>> b/arch/arm64/include/asm/arch_gicv3.h
>> index 14b41dd..3102c9a 100644
>> --- a/arch/arm64/include/asm/arch_gicv3.h
>> +++ b/arch/arm64/include/asm/arch_gicv3.h
>> @@ -163,7 +163,9 @@ static inline bool gic_prio_masking_enabled(void)
>>
>>  static inline void gic_pmr_mask_irqs(void)
>>  {
>> -BUILD_BUG_ON(GICD_INT_DEF_PRI <= GIC_PRIO_IRQOFF);
>> +BUILD_BUG_ON(GICD_INT_DEF_PRI < (GIC_PRIO_IRQOFF |
>> + GIC_PRIO_IGNORE_PMR));
>> +BUILD_BUG_ON(GICD_INT_DEF_PRI >= GIC_PRIO_IRQON);
>>  gic_write_pmr(GIC_PRIO_IRQOFF);
>>  }
>>
>> diff --git a/arch/arm64/include/asm/assembler.h 
>> b/arch/arm64/include/asm/assembler.h
>> index c5308d0..601154d 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -62,6 +62,15 @@
>>  msr daifclr, #(8 | 4 | 1)
>>  .endm
>>
>> +.macro  suspend_irq_prio_masking, tmp:req
>> +#ifdef CONFIG_ARM64_PSEUDO_NMI
>> +alternative_if ARM64_HAS_IRQ_PRIO_MASKING
>> +mov \tmp, #(GIC_PRIO_IRQON | GIC_PRIO_IGNORE_PMR)
>> +msr_s   SYS_ICC_PMR_EL1, \tmp
>> +alternative_else_nop_endif
>> +#endif
>> +.endm
>> +
>>  /*
>>   * Save/restore interrupts.
>>   */
>> diff --git a/arch/arm64/include/asm/daifflags.h 
>> b/arch/arm64/include/asm/daifflags.h
>> index db452aa..a32ece9 100644
>> --- a/arch/arm64/include/asm/daifflags.h
>> +++ b/arch/arm64/include/asm/daifflags.h
>> @@ -18,6 +18,7 @@
>>
>>  #include 
>>
>> +#include 
>>  #include 
>>
>>  #define DAIF_PROCCTX0
>> @@ -32,6 +33,11 @@ static inline void local_daif_mask(void)
>>  :
>>  :
>>  : "memory");
>> +
>> +/* Don't really care for a dsb here, we don't intend to enable IRQs */
>> +if (system_uses_irq_prio_masking())
>> +gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_IGNORE_PMR);
>> +
>>  trace_hardirqs_off();
>>  }
>>
>> @@ -43,7 +49,7 @@ static inline unsigned long local_daif_save(void)
>>
>>  if (system_uses_irq_prio_masking()) {
>>  /* If IRQs are masked with PMR, reflect it in the flags */
>> -if (read_sysreg_s(SYS_ICC_PMR_EL1) <= GIC_PRIO_IRQOFF)
>> +if (read_sysreg_s(SYS_ICC_PMR_EL1) != GIC_PRIO_IRQON)
>>  flags |= PSR_I_BIT;
>>  }
>>
>> @@ -59,14 +65,16 @@ static inline void local_daif_restore(unsigned long 
>> flags)
>>  if (!irq_disabled) {
>>  trace_hardirqs_on();
>>
>> -if (system_uses_irq_prio_masking())
>> -arch_local_irq_enable();
>> -} else if (!(flags & PSR_A_BIT)) {
>> +if (system_uses_irq_prio_masking()) {
>> +gic_write_pmr(GIC_PRIO_IRQON);
>> +dsb(sy);
>> +}
>> +} else if (system_uses_irq_prio_masking()) {
>>  /*
>>   * If interrupts are disabled but we can take
>>   * asyn

Re: [PATCH 1/4] spi: For controllers that need realtime always use the pump thread

2019-05-14 Thread Mark Brown
On Mon, May 13, 2019 at 01:24:57PM -0700, Doug Anderson wrote:
> On Sun, May 12, 2019 at 10:05 AM Mark Brown  wrote:

> > If performance is important you probably also want to avoid the context
> > thrashing - executing in the calling context is generally a substantial
> > performance boost.  I can see this causing problems further down the
> > line when someone else turns up with a different requirement, perhaps in
> > an application where the caller does actually have a raised priority
> > themselves and just wanted to make sure that the thread wasn't lower
> > than they are.  I guess it'd be nice if we could check what priority the
> > calling thread has and make a decision based on that but there don't
> > seem to be any facilities for doing that which I can see right now.

> In my case performance is 2nd place to a transfer not getting
> interrupted once started (so we don't break the 8ms rule of the EC).

That's great but other users do care very much about performance and are
also interested in both priority control and avoiding context thrashing.

> My solution in v2 of my series is to take out the forcing in the case
> where the controller only wanted "rt" priority, and to add "force" to
> the parameter name.  If someone wants rt priority for the thread but
> doesn't want to force all transfers to the thread, we can later add a
> different parameter for that?

I think that's going to be the common case for this.  Forcing context
thrashing is really not something anyone else is asking for.
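
To make the trade-off concrete, below is a small stand-alone C sketch
of the policy being debated. The controller flags and helpers
(force_pump, queue_to_pump, ...) are hypothetical names, not actual SPI
core fields; it only illustrates the choice between queueing to the
pump thread and running in the calling context:

---
#include <stdbool.h>
#include <stdio.h>

struct sketch_controller {
        bool rt;                /* pump thread runs at realtime priority */
        bool force_pump;        /* defer even synchronous transfers to it */
};

static void queue_to_pump(const char *msg)
{
        /* Stand-in for waking the message-pump kthread. */
        printf("queued to pump thread: %s\n", msg);
}

static void transfer_in_caller(const char *msg)
{
        /* Stand-in for executing the transfer in the calling context. */
        printf("run in caller context: %s\n", msg);
}

static void sync_transfer(const struct sketch_controller *ctlr, const char *msg)
{
        if (ctlr->force_pump)
                queue_to_pump(msg);      /* predictable priority, extra context switches */
        else
                transfer_in_caller(msg); /* no context thrash, inherits caller's priority */
}

int main(void)
{
        struct sketch_controller ec   = { .rt = true, .force_pump = true  };
        struct sketch_controller fast = { .rt = true, .force_pump = false };

        sync_transfer(&ec, "EC command with an 8ms deadline");
        sync_transfer(&fast, "bulk transfer from a latency-tolerant caller");
        return 0;
}
---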




Re: [PATCH V5 1/4] spi: tegra114: add support for gpio based CS

2019-05-14 Thread Jon Hunter


On 14/05/2019 06:03, Sowjanya Komatineni wrote:
> This patch adds support for GPIO based CS control through SPI core
> function spi_set_cs.
> 
> Signed-off-by: Sowjanya Komatineni 

Can you elaborate on the use-case where this is needed? I am curious
what platforms are using this and why they would not use the dedicated
CS signals.

Cheers
Jon

-- 
nvpublic


[PATCH] objtool: doc: Fix one-file exception Makefile directive

2019-05-14 Thread Raphael Gault
The Makefile directive that the documentation gives for exempting a
single file from objtool validation was inverted: the variable must be
set to 'y' to skip validation, not 'n'.

Signed-off-by: Raphael Gault 
---
 tools/objtool/Documentation/stack-validation.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 3995735a878f..cd17ee022072 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -306,7 +306,7 @@ ignore it:
 
 - To skip validation of a file, add
 
-OBJECT_FILES_NON_STANDARD_filename.o := n
+OBJECT_FILES_NON_STANDARD_filename.o := y
 
   to the Makefile.
 
-- 
2.17.1



Re: [PATCH] serial: 8250: Add support for using platform_device resources

2019-05-14 Thread Andy Shevchenko
On Tue, May 14, 2019 at 12:23 PM Andy Shevchenko
 wrote:
> On Tue, May 14, 2019 at 10:24 AM Esben Haabendal  wrote:

> > Please take a look at https://lkml.org/lkml/2019/4/9/576
> > ("[PATCH v2 2/4] mfd: ioc3: Add driver for SGI IOC3 chip")
>
> Thank you for this link.
> Now, look at this comment:
>
> + /*
> + * Map all IOC3 registers.  These are shared between subdevices
> + * so the main IOC3 module manages them.
> + */
>
> Is it your case? Can we see the code?

They do not request resources, by the way.
You may do the same, as I have told you several times.
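
For anyone following along, here is a rough kernel-style sketch of the
distinction being pointed at, assuming a parent device whose children
share one register window. The helper name is made up;
devm_ioremap_resource() requests and maps the region (so a second user
of the same range would fail), while devm_ioremap() only maps it:

---
#include <linux/device.h>
#include <linux/err.h>
#include <linux/io.h>
#include <linux/ioport.h>
#include <linux/platform_device.h>

static void __iomem *sketch_map_shared_regs(struct platform_device *pdev)
{
        struct resource *res;
        void __iomem *regs;

        res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
        if (!res)
                return ERR_PTR(-ENODEV);

        /* Map without request_mem_region(), so subdevices can map it too. */
        regs = devm_ioremap(&pdev->dev, res->start, resource_size(res));
        if (!regs)
                return ERR_PTR(-ENOMEM);

        return regs;
}
---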

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH] EDAC, mc: Fix edac_mc_find() in case no device is found

2019-05-14 Thread Borislav Petkov
On Tue, May 14, 2019 at 07:25:58AM +, Robert Richter wrote:
> The function should return NULL in case no device is found, but it
> always returns the last mc device it checked, even if that device's
> index did not match. Fix this.
> 
> I did some analysis of why this did not raise any issues for about 3
> years: edac_mc_find() is mostly used to look up devices that are known
> to exist, so the bug is simply never triggered in practice.
> 
> Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
> Signed-off-by: Robert Richter 
> ---
>  drivers/edac/edac_mc.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 13594ffadcb3..aeeaaf30b38a 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -688,10 +688,9 @@ struct mem_ctl_info *edac_mc_find(int idx)
>   mci = list_entry(item, struct mem_ctl_info, link);
>  
>   if (mci->mc_idx >= idx) {
> - if (mci->mc_idx == idx) {
> - goto unlock;
> - }
> - break;
> + if (mci->mc_idx != idx)
> + mci = NULL;
> + goto unlock;
>   }
>   }

Can we simplify this silly code even more pls? I'm pasting the whole
function instead of a diff for clarity:

---
struct mem_ctl_info *edac_mc_find(int idx)
{
        struct mem_ctl_info *mci = NULL;
        struct list_head *item;

        mutex_lock(&mem_ctls_mutex);

        list_for_each(item, &mc_devices) {
                mci = list_entry(item, struct mem_ctl_info, link);
                if (mci->mc_idx == idx)
                        goto unlock;
        }

        mci = NULL;

unlock:
        mutex_unlock(&mem_ctls_mutex);
        return mci;
}
---
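
If nothing else depends on the open-coded list walk, list_for_each_entry()
would tighten it a bit further; just a sketch on top of the version above,
keeping the same locking:

---
struct mem_ctl_info *edac_mc_find(int idx)
{
        struct mem_ctl_info *mci;

        mutex_lock(&mem_ctls_mutex);

        list_for_each_entry(mci, &mc_devices, link) {
                if (mci->mc_idx == idx)
                        goto unlock;
        }

        /* Fell off the end of the list: no device with that index. */
        mci = NULL;

unlock:
        mutex_unlock(&mem_ctls_mutex);
        return mci;
}
---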

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

