Re: [PATCH 2/2] powerpc: Fix DABR write in the case of DAWR disabled

2018-04-01 Thread Nicholas Piggin
On Sun,  1 Apr 2018 15:50:36 +1000
Nicholas Piggin  wrote:

> flush_thread will call set_breakpoint via set_debug_reg_defaults,
> and cause POWER8 and above CPUs without the DAWR feature to try
> to write to DAWR.
  
  DABR

> 
> Cc: Michael Neuling 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/process.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 24a591b4dbe9..bfcf5437083c 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -718,7 +718,8 @@ static void set_debug_reg_defaults(struct thread_struct *thread)
>  {
>   thread->hw_brk.address = 0;
>   thread->hw_brk.type = 0;
> - set_breakpoint(&thread->hw_brk);
> + if (ppc_breakpoint_available())
> + set_breakpoint(&thread->hw_brk);
>  }
>  #endif /* !CONFIG_HAVE_HW_BREAKPOINT */
>  #endif   /* CONFIG_PPC_ADV_DEBUG_REGS */
> @@ -814,10 +815,14 @@ void __set_breakpoint(struct arch_hw_breakpoint *brk)
>  {
>   memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));
>  
> - if (cpu_has_feature(CPU_FTR_DAWR))
> - set_dawr(brk);
> - else
> + if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
> + if (cpu_has_feature(CPU_FTR_DAWR))
> + set_dawr(brk);
> + else
> + WARN_ON_ONCE(1);
> + } else {
>   set_dabr(brk);
> + }
>  }
>  
>  void set_breakpoint(struct arch_hw_breakpoint *brk)
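The dispatch added by the hunk above can be modeled as a standalone sketch. The feature names mirror the kernel's CPU feature bits, but everything below is an illustrative stand-in, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched __set_breakpoint() dispatch: CPUs at
 * ISA 2.07S (POWER8 and later) no longer have DABR, so DAWR is the only
 * valid target there, and its absence warrants a one-time warning. */
enum brk_reg { BRK_DAWR, BRK_DABR, BRK_NONE_WARN };

static enum brk_reg pick_breakpoint_reg(bool ftr_arch_207s, bool ftr_dawr)
{
	if (ftr_arch_207s)
		return ftr_dawr ? BRK_DAWR : BRK_NONE_WARN; /* WARN_ON_ONCE(1) */
	return BRK_DABR; /* pre-ISA-2.07 CPUs program DABR */
}
```

The point of the nesting is that the old code fell through to set_dabr() whenever DAWR was absent, even on CPUs where DABR no longer exists.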



[PATCH 0/3] stop secondaries for reboot/shutdown

2018-04-01 Thread Nicholas Piggin
I'm rebasing and resending this series because the hard
lockup messages are being seen in the field.

Since last time, one of the patches was merged, and patches
were split and reordered a bit. No major changes.

This still hasn't been tested with the FSP firmware update.

Thanks,
Nick

Nicholas Piggin (3):
  powerpc: use NMI IPI for smp_send_stop
  powerpc: hard disable irqs in smp_send_stop loop
  powerpc/powernv: Always stop secondaries before reboot/shutdown

 arch/powerpc/include/asm/opal.h |  2 +-
 arch/powerpc/kernel/smp.c   | 13 +++--
 arch/powerpc/platforms/powernv/opal-flash.c | 28 +---
 arch/powerpc/platforms/powernv/setup.c  | 15 +--
 4 files changed, 18 insertions(+), 40 deletions(-)

-- 
2.16.3



[PATCH v2 1/3] powerpc: use NMI IPI for smp_send_stop

2018-04-01 Thread Nicholas Piggin
Use the NMI IPI rather than smp_call_function for smp_send_stop.
Have stopped CPUs hard disable interrupts rather than just soft
disable.

This function is used in crash/panic/shutdown paths to bring other
CPUs down as quickly and reliably as possible, while minimizing their
potential to cause trouble.

Avoiding the Linux smp_call_function infrastructure and (if supported)
using true NMI IPIs makes this more robust.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/smp.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index cfc08b099c49..db88660bf6bd 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -565,7 +565,11 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 }
 #endif
 
+#ifdef CONFIG_NMI_IPI
+static void stop_this_cpu(struct pt_regs *regs)
+#else
 static void stop_this_cpu(void *dummy)
+#endif
 {
/* Remove this CPU */
set_cpu_online(smp_processor_id(), false);
@@ -577,7 +581,11 @@ static void stop_this_cpu(void *dummy)
 
 void smp_send_stop(void)
 {
+#ifdef CONFIG_NMI_IPI
+   smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, stop_this_cpu, 100);
+#else
smp_call_function(stop_this_cpu, NULL, 0);
+#endif
 }
 
 struct thread_info *current_set[NR_CPUS];
-- 
2.16.3
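The compile-time switch in the patch above can be sketched in isolation; CONFIG_NMI_IPI, the return strings, and the function itself are invented for illustration:

```c
#include <assert.h>
#include <string.h>

/* Sketch of the CONFIG_NMI_IPI selection: when the platform supports true
 * NMI IPIs, smp_send_stop() bypasses the generic smp_call_function
 * machinery, which cannot reach CPUs that have interrupts disabled. */
#define CONFIG_NMI_IPI 1

static const char *send_stop_ipi(void)
{
#ifdef CONFIG_NMI_IPI
	/* true NMI IPI: delivered even to CPUs with interrupts disabled */
	return "smp_send_nmi_ipi";
#else
	/* ordinary IPI via the generic cross-call infrastructure */
	return "smp_call_function";
#endif
}
```

Note how the patch also switches stop_this_cpu()'s signature, since NMI IPI handlers take a struct pt_regs * instead of the opaque void * of smp_call_function callbacks.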



[PATCH v2 2/3] powerpc: hard disable irqs in smp_send_stop loop

2018-04-01 Thread Nicholas Piggin
The hard lockup watchdog can fire under local_irq_disable
on platforms with irq soft masking.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/smp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index db88660bf6bd..e16ec7b3b427 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -574,9 +574,10 @@ static void stop_this_cpu(void *dummy)
/* Remove this CPU */
set_cpu_online(smp_processor_id(), false);
 
-   local_irq_disable();
+   hard_irq_disable();
+   spin_begin();
while (1)
-   ;
+   spin_cpu_relax();
 }
 
 void smp_send_stop(void)
-- 
2.16.3
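The distinction the patch relies on can be shown with a toy model. This is a heavily simplified sketch: on powerpc with irq soft masking, local_irq_disable() only sets a per-CPU software mask, while hard_irq_disable() actually clears MSR[EE]; the struct and function names below are invented:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: with only the soft mask set, a hardware-driven watchdog
 * interrupt can still be delivered (and then flagged as pending); with
 * MSR[EE] cleared, hardware delivery itself is off. */
struct cpu_irq_state {
	bool soft_masked; /* what local_irq_disable() sets */
	bool msr_ee;      /* hardware external-interrupt enable */
};

static void toy_local_irq_disable(struct cpu_irq_state *s)
{
	s->soft_masked = true;
}

static void toy_hard_irq_disable(struct cpu_irq_state *s)
{
	s->soft_masked = true;
	s->msr_ee = false;
}

static bool watchdog_can_fire(const struct cpu_irq_state *s)
{
	return s->msr_ee; /* delivery depends on the hardware enable only */
}
```

This is why a CPU parked in a forever-loop must hard disable: otherwise the hard lockup watchdog can still interrupt it and report a false lockup.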



[PATCH v2 3/3] powerpc/powernv: Always stop secondaries before reboot/shutdown

2018-04-01 Thread Nicholas Piggin
Currently powernv reboot and shutdown requests just leave secondaries
to do their own thing. This is undesirable because they can trigger
any number of watchdogs while waiting for reboot, and we also don't
know what else they might be doing -- they might be causing trouble,
trampling memory, etc.

The opal scheduled flash update code already ran into watchdog problems
due to flashing taking a long time; that was fixed with 2196c6f1ed
("powerpc/powernv: Return secondary CPUs to firmware before FW update"),
which returns secondaries to opal. It's been found that regular reboots
can take over 10 seconds, which can result in the hard lockup watchdog
firing:

  reboot: Restarting system
  [  360.038896709,5] OPAL: Reboot request...
  Watchdog CPU:0 Hard LOCKUP
  Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
  Watchdog CPU:16 Hard LOCKUP
  watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]

This patch removes the special case for flash update, and calls
smp_send_stop in all cases before calling reboot/shutdown.

smp_send_stop could return CPUs to OPAL; the main reason not to is
that the request could come from an NMI that interrupts OPAL code,
so re-entry to OPAL can cause a number of problems. Putting
secondaries into simple spin loops improves the chances of a
successful reboot.

Cc: Vasant Hegde 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/opal.h |  2 +-
 arch/powerpc/platforms/powernv/opal-flash.c | 28 +---
 arch/powerpc/platforms/powernv/setup.c  | 15 +--
 3 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index dde60089d0d4..7159e1a6a61a 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -325,7 +325,7 @@ struct rtc_time;
 extern unsigned long opal_get_boot_time(void);
 extern void opal_nvram_init(void);
 extern void opal_flash_update_init(void);
-extern void opal_flash_term_callback(void);
+extern void opal_flash_update_print_message(void);
 extern int opal_elog_init(void);
 extern void opal_platform_dump_init(void);
 extern void opal_sys_param_init(void);
diff --git a/arch/powerpc/platforms/powernv/opal-flash.c b/arch/powerpc/platforms/powernv/opal-flash.c
index 1cb0b895a236..b37015101bf6 100644
--- a/arch/powerpc/platforms/powernv/opal-flash.c
+++ b/arch/powerpc/platforms/powernv/opal-flash.c
@@ -303,26 +303,9 @@ static int opal_flash_update(int op)
return rc;
 }
 
-/* Return CPUs to OPAL before starting FW update */
-static void flash_return_cpu(void *info)
-{
-   int cpu = smp_processor_id();
-
-   if (!cpu_online(cpu))
-   return;
-
-   /* Disable IRQ */
-   hard_irq_disable();
-
-   /* Return the CPU to OPAL */
-   opal_return_cpu();
-}
-
 /* This gets called just before system reboots */
-void opal_flash_term_callback(void)
+void opal_flash_update_print_message(void)
 {
-   struct cpumask mask;
-
if (update_flash_data.status != FLASH_IMG_READY)
return;
 
@@ -333,15 +316,6 @@ void opal_flash_term_callback(void)
 
/* Small delay to help getting the above message out */
msleep(500);
-
-   /* Return secondary CPUs to firmware */
-   cpumask_copy(&mask, cpu_online_mask);
-   cpumask_clear_cpu(smp_processor_id(), &mask);
-   if (!cpumask_empty(&mask))
-   smp_call_function_many(&mask,
-  flash_return_cpu, NULL, false);
-   /* Hard disable interrupts */
-   hard_irq_disable();
 }
 
 /*
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 5f963286232f..ef8c9ce53a61 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -201,17 +201,12 @@ static void pnv_prepare_going_down(void)
 */
opal_event_shutdown();
 
-   /* Soft disable interrupts */
-   local_irq_disable();
+   /* Print flash update message if one is scheduled. */
+   opal_flash_update_print_message();
 
-   /*
-* Return secondary CPUs to firwmare if a flash update
-* is pending otherwise we will get all sort of error
-* messages about CPU being stuck etc.. This will also
-* have the side effect of hard disabling interrupts so
-* past this point, the kernel is effectively dead.
-*/
-   opal_flash_term_callback();
+   smp_send_stop();
+
+   hard_irq_disable();
 }
 
 static void  __noreturn pnv_restart(char *cmd)
-- 
2.16.3
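The reworked shutdown ordering in pnv_prepare_going_down() can be summarized as a sequence sketch. The step names echo the patch, but the recording harness is invented for illustration:

```c
#include <assert.h>
#include <string.h>

/* Record the order of operations the patched pnv_prepare_going_down()
 * performs, so the ordering constraint is explicit: secondaries must be
 * parked (smp_send_stop) before the boot CPU hard disables interrupts. */
#define MAX_STEPS 4
static const char *steps[MAX_STEPS];
static int nsteps;

static void step(const char *name)
{
	if (nsteps < MAX_STEPS)
		steps[nsteps++] = name;
}

static void toy_pnv_prepare_going_down(void)
{
	step("opal_event_shutdown");             /* stop OPAL event notifications */
	step("opal_flash_update_print_message"); /* announce a pending flash update */
	step("smp_send_stop");                   /* park secondaries in spin loops */
	step("hard_irq_disable");                /* kernel effectively dead past here */
}
```

Compared with the old code, the message printing is now purely informational, and stopping secondaries happens unconditionally rather than only on the flash-update path.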



Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

2018-04-01 Thread Yury Norov
On Tue, Mar 27, 2018 at 11:21:17AM +0100, Will Deacon wrote:
> On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > kick_all_cpus_sync() forces all CPUs to sync caches by sending a
> > broadcast IPI. If a CPU is in an extended quiescent state (idle task or
> > nohz_full userspace), this work may be done at the exit of this state.
> > Delaying synchronization helps to save power if the CPU is idle and
> > decreases latency for real-time tasks.
> > 
> > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and
> > arm64 code to delay synchronization.
> > 
> > For task isolation (https://lkml.org/lkml/2017/11/3/589), an IPI to the
> > CPU running an isolated task would be fatal, as it breaks isolation. The
> > approach of delaying the synchronization work helps to maintain the
> > isolated state.
> > 
> > I've tested it with a test from the task isolation series on ThunderX2
> > for more than 10 hours (10k giga-ticks) without breaking isolation.
> > 
> > Signed-off-by: Yury Norov 
> > ---
> >  arch/arm64/kernel/insn.c |  2 +-
> >  include/linux/smp.h  |  2 ++
> >  kernel/smp.c | 24 
> >  mm/slab.c|  2 +-
> >  4 files changed, 28 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > index 2718a77da165..9d7c492e920e 100644
> > --- a/arch/arm64/kernel/insn.c
> > +++ b/arch/arm64/kernel/insn.c
> > @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> >  * synchronization.
> >  */
> > ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> > -   kick_all_cpus_sync();
> > +   kick_active_cpus_sync();
> > return ret;
> > }
> > }
> 
> I think this means that runtime modifications to the kernel text might not
> be picked up by CPUs coming out of idle. Shouldn't we add an ISB on that
> path to avoid executing stale instructions?
> 
> Will

commit 153ae9d5667e7baab4d48c48e8ec30fbcbd86f1e
Author: Yury Norov 
Date:   Sat Mar 31 15:05:23 2018 +0300

Hi Will, Paul,

On my system there are 3 paths that go through rcu_dynticks_eqs_exit()
and so require an isb().

The first path starts at gic_handle_irq() on the secondary_start_kernel
stack. gic_handle_irq() already issues an isb(), so nothing more needs
to be done there.

The second path starts at el0_svc entry; the third path is the exit from
do_idle() on the secondary_start_kernel stack.

For the do_idle() path there is the arch_cpu_idle_exit() hook, which is
not currently used by arm64, so I picked it. For el0_svc, I've
introduced an isb_if_eqs macro and call it at the beginning of
el0_svc_naked.

I've tested it on a ThunderX2 machine, and it works for me.

Below are my call traces and the patch for them. If you're OK with it, I
think I'm ready to submit v2 (but maybe split this patch for better
readability).

Yury

[  585.412095] Call trace:
[  585.412097] [] dump_backtrace+0x0/0x380
[  585.412099] [] show_stack+0x14/0x20
[  585.412101] [] dump_stack+0x98/0xbc
[  585.412104] [] rcu_dynticks_eqs_exit+0x68/0x70
[  585.412105] [] rcu_irq_enter+0x48/0x50
[  585.412106] [] irq_enter+0xc/0x70
[  585.412108] [] __handle_domain_irq+0x3c/0x120
[  585.412109] [] gic_handle_irq+0xc4/0x180
[  585.412110] Exception stack(0xfc001130fe20 to 0xfc001130ff60)
[  585.412112] fe20: 00a0  0001 

[  585.412113] fe40: 028f6f0b 0020 0013cd6f53963b31 

[  585.412144] fe60: 0002 fc001130fed0 0b80 
3400
[  585.412146] fe80:  0001  
01db
[  585.412147] fea0: fc0008247a78 03ff86dc61f8 0014 
fc0008fc
[  585.412149] fec0: fc00090143e8 fc0009014000 fc0008fc94a0 

[  585.412150] fee0:  fe8f46bb1700  

[  585.412152] ff00:  fc001130ff60 fc0008085034 
fc001130ff60
[  585.412153] ff20: fc0008085038 00400149 fc0009014000 
fc0008fc94a0
[  585.412155] ff40:   fc001130ff60 
fc0008085038
[  585.412156] [] el1_irq+0xb0/0x124
[  585.412158] [] arch_cpu_idle+0x10/0x18
[  585.412159] [] do_idle+0x10c/0x1d8
[  585.412160] [] cpu_startup_entry+0x24/0x28
[  585.412162] [] secondary_start_kernel+0x15c/0x1a0
[  585.412164] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
4.14.0-isolation-160735-g59b71c1-dirty #18

[  585.412058] Call trace:
[  585.412060] [] dump_backtrace+0x0/0x380
[  585.412062] [] show_stack+0x14/0x20
[  585.412064] [] dump_stack+0x98/0xbc
[  585.412066] [] rcu_dynticks_eqs_exit+0x68/0x70
[  585.412068] [] rcu_eqs_exit.isra.23+0x64/0x80
[  585.412069] [] rcu_user_exit+0xc/0x18
[  585.412071] [] __context_tracking_exit.part.2+0x74/0x98
[  585.412072] [] contex
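The core idea of kick_active_cpus_sync() — IPI only CPUs that are not in an extended quiescent state (EQS), and let EQS CPUs synchronize when they exit that state — can be sketched with a toy bitmask. The representation and names below are illustrative, not the kernel's cpumask API:

```c
#include <assert.h>
#include <stdbool.h>

/* Build the set of CPUs that must receive an immediate sync IPI: online
 * CPUs that are NOT in EQS. CPUs in EQS (idle, or nohz_full userspace)
 * are skipped; they are expected to execute the required barrier (e.g.
 * isb() on arm64) on EQS exit instead. */
#define NCPUS 8

static unsigned int build_kick_mask(const bool online[NCPUS],
				    const bool in_eqs[NCPUS])
{
	unsigned int mask = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (online[cpu] && !in_eqs[cpu])
			mask |= 1u << cpu;
	return mask;
}
```

Will's concern above maps directly onto the skipped CPUs: correctness depends on every EQS-exit path actually executing the deferred barrier, which is what the three call paths Yury enumerates are about.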

Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

2018-04-01 Thread Paul E. McKenney
On Sun, Apr 01, 2018 at 02:11:08PM +0300, Yury Norov wrote:
> On Tue, Mar 27, 2018 at 11:21:17AM +0100, Will Deacon wrote:
> > On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > > kick_all_cpus_sync() forces all CPUs to sync caches by sending a
> > > broadcast IPI. If a CPU is in an extended quiescent state (idle task
> > > or nohz_full userspace), this work may be done at the exit of this
> > > state. Delaying synchronization helps to save power if the CPU is
> > > idle and decreases latency for real-time tasks.
> > > 
> > > This patch introduces kick_active_cpus_sync() and uses it in mm/slab
> > > and arm64 code to delay synchronization.
> > > 
> > > For task isolation (https://lkml.org/lkml/2017/11/3/589), an IPI to
> > > the CPU running an isolated task would be fatal, as it breaks
> > > isolation. The approach of delaying the synchronization work helps to
> > > maintain the isolated state.
> > > 
> > > I've tested it with a test from the task isolation series on
> > > ThunderX2 for more than 10 hours (10k giga-ticks) without breaking
> > > isolation.
> > > 
> > > Signed-off-by: Yury Norov 
> > > ---
> > >  arch/arm64/kernel/insn.c |  2 +-
> > >  include/linux/smp.h  |  2 ++
> > >  kernel/smp.c | 24 
> > >  mm/slab.c|  2 +-
> > >  4 files changed, 28 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > > index 2718a77da165..9d7c492e920e 100644
> > > --- a/arch/arm64/kernel/insn.c
> > > +++ b/arch/arm64/kernel/insn.c
> > > @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> > >* synchronization.
> > >*/
> > >   ret = aarch64_insn_patch_text_nosync(addrs[0], 
> > > insns[0]);
> > > - kick_all_cpus_sync();
> > > + kick_active_cpus_sync();
> > >   return ret;
> > >   }
> > >   }
> > 
> > I think this means that runtime modifications to the kernel text might not
> > be picked up by CPUs coming out of idle. Shouldn't we add an ISB on that
> > path to avoid executing stale instructions?
> > 
> > Will
> 
> commit 153ae9d5667e7baab4d48c48e8ec30fbcbd86f1e
> Author: Yury Norov 
> Date:   Sat Mar 31 15:05:23 2018 +0300
> 
> Hi Will, Paul,
> 
> On my system there are 3 paths that go through rcu_dynticks_eqs_exit()
> and so require an isb().
> 
> The first path starts at gic_handle_irq() on the secondary_start_kernel
> stack. gic_handle_irq() already issues an isb(), so nothing more needs
> to be done there.
> 
> The second path starts at el0_svc entry; the third path is the exit from
> do_idle() on the secondary_start_kernel stack.
> 
> For the do_idle() path there is the arch_cpu_idle_exit() hook, which is
> not currently used by arm64, so I picked it. For el0_svc, I've
> introduced an isb_if_eqs macro and call it at the beginning of
> el0_svc_naked.
> 
> I've tested it on a ThunderX2 machine, and it works for me.
> 
> Below are my call traces and the patch for them. If you're OK with it, I
> think I'm ready to submit v2 (but maybe split this patch for better
> readability).

I must defer to Will on this one.

Thanx, Paul

> Yury
> 
> [  585.412095] Call trace:
> [  585.412097] [] dump_backtrace+0x0/0x380
> [  585.412099] [] show_stack+0x14/0x20
> [  585.412101] [] dump_stack+0x98/0xbc
> [  585.412104] [] rcu_dynticks_eqs_exit+0x68/0x70
> [  585.412105] [] rcu_irq_enter+0x48/0x50
> [  585.412106] [] irq_enter+0xc/0x70
> [  585.412108] [] __handle_domain_irq+0x3c/0x120
> [  585.412109] [] gic_handle_irq+0xc4/0x180
> [  585.412110] Exception stack(0xfc001130fe20 to 0xfc001130ff60)
> [  585.412112] fe20: 00a0  0001 
> 
> [  585.412113] fe40: 028f6f0b 0020 0013cd6f53963b31 
> 
> [  585.412144] fe60: 0002 fc001130fed0 0b80 
> 3400
> [  585.412146] fe80:  0001  
> 01db
> [  585.412147] fea0: fc0008247a78 03ff86dc61f8 0014 
> fc0008fc
> [  585.412149] fec0: fc00090143e8 fc0009014000 fc0008fc94a0 
> 
> [  585.412150] fee0:  fe8f46bb1700  
> 
> [  585.412152] ff00:  fc001130ff60 fc0008085034 
> fc001130ff60
> [  585.412153] ff20: fc0008085038 00400149 fc0009014000 
> fc0008fc94a0
> [  585.412155] ff40:   fc001130ff60 
> fc0008085038
> [  585.412156] [] el1_irq+0xb0/0x124
> [  585.412158] [] arch_cpu_idle+0x10/0x18
> [  585.412159] [] do_idle+0x10c/0x1d8
> [  585.412160] [] cpu_startup_entry+0x24/0x28
> [  585.412162] [] secondary_start_kernel+0x15c/0x1a0
> [  585.412164] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
> 4.14.0-isolation-160735-g59b

[PATCH v6 14/16] powerpc: Switch to generic free_initrd_mem.

2018-04-01 Thread Shea Levy
The first patch in this series added a weakly-defined generic
implementation, which is functionally identical to the
architecture-specific one removed here.

Series boot-tested on RISC-V (which now uses the generic
implementation) and x86_64 (which doesn't).

Signed-off-by: Shea Levy 
---
 arch/powerpc/mm/mem.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index fe8c61149fb8..e85b2a3cd264 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -404,13 +404,6 @@ void free_initmem(void)
free_initmem_default(POISON_FREE_INITMEM);
 }
 
-#ifdef CONFIG_BLK_DEV_INITRD
-void __init free_initrd_mem(unsigned long start, unsigned long end)
-{
-   free_reserved_area((void *)start, (void *)end, -1, "initrd");
-}
-#endif
-
 /*
  * This is called when a page has been modified by the kernel.
  * It just marks the page as not i-cache clean.  We do the i-cache
-- 
2.16.2
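The mechanism this series relies on — a weak default in init/initramfs.c that any architecture can override with a strong definition — can be demonstrated in a few lines. The function name and return value below are invented for illustration; the real free_initrd_mem() returns void and calls free_reserved_area():

```c
#include <assert.h>
#include <string.h>

/* Weak default, as the generic kernel code would provide it. An object
 * file that defines a non-weak symbol with the same name silently takes
 * precedence at link time; with only this definition linked, the generic
 * path runs. */
const char * __attribute__((weak)) free_initrd_mem_impl(void)
{
	/* generic path: free_reserved_area(start, end, -1, "initrd") */
	return "generic";
}
```

This is why the powerpc copy can simply be deleted: its body was identical to the weak default, so removing the strong override changes nothing at runtime.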



Re: [PATCH v4 14/16] powerpc: Use generic free_initrd_mem.

2018-04-01 Thread Shea Levy
Hi Michael,

Michael Ellerman  writes:

> Shea Levy  writes:
>
>> Joe Perches  writes:
>>
>>> On Wed, 2018-03-28 at 16:36 -0400, Shea Levy wrote:
 Signed-off-by: Shea Levy 
>>>
>>> Most people seem to want some form of commit message
>>> and not just your sign-off.
>>>
>>
>> Ah, if the subject is insufficient I can add some more detail.
>
> Yeah please do.
>
> Seeing this patch in isolation, with no change log, I might think it's
> safe for me to just apply it.
>
> But that would break the build because I don't have patch 1.
>
> So for starters you need to explain that part, eg something like:
>
>   A previous patch in the series added a weak definition of
>   free_initrd_mem() in init/initramfs.c.
>
>   The powerpc implementation is identical, so it can be removed allowing
>   the generic version to be used.
>
>
> Then you could also tell me if you did/didn't build/boot test it.

Thanks for the feedback, can you let me know if the recently posted v6
fits the bill?

>
> cheers

Thanks,
Shea


signature.asc
Description: PGP signature


Re: [PATCH] Extract initrd free logic from arch-specific code.

2018-04-01 Thread Shea Levy
Hi Ingo,

Ingo Molnar  writes:

> * Shea Levy  wrote:
>
>> Now only those architectures that have custom initrd free requirements
>> need to define free_initrd_mem.
>> 
>> Signed-off-by: Shea Levy 
>
> Please put the Kconfig symbol name this patch introduces both into the title,
> so that people know what to grep for.
>
>> ---
>>  arch/alpha/mm/init.c  |  8 
>>  arch/arc/mm/init.c|  7 ---
>>  arch/arm/Kconfig  |  1 +
>>  arch/arm64/Kconfig|  1 +
>>  arch/blackfin/Kconfig |  1 +
>>  arch/c6x/mm/init.c|  7 ---
>>  arch/cris/Kconfig |  1 +
>>  arch/frv/mm/init.c| 11 ---
>>  arch/h8300/mm/init.c  |  7 ---
>>  arch/hexagon/Kconfig  |  1 +
>>  arch/ia64/Kconfig |  1 +
>>  arch/m32r/Kconfig |  1 +
>>  arch/m32r/mm/init.c   | 11 ---
>>  arch/m68k/mm/init.c   |  7 ---
>>  arch/metag/Kconfig|  1 +
>>  arch/microblaze/mm/init.c |  7 ---
>>  arch/mips/Kconfig |  1 +
>>  arch/mn10300/Kconfig  |  1 +
>>  arch/nios2/mm/init.c  |  7 ---
>>  arch/openrisc/mm/init.c   |  7 ---
>>  arch/parisc/mm/init.c |  7 ---
>>  arch/powerpc/mm/mem.c |  7 ---
>>  arch/riscv/mm/init.c  |  6 --
>>  arch/s390/Kconfig |  1 +
>>  arch/score/Kconfig|  1 +
>>  arch/sh/mm/init.c |  7 ---
>>  arch/sparc/Kconfig|  1 +
>>  arch/tile/Kconfig |  1 +
>>  arch/um/kernel/mem.c  |  7 ---
>>  arch/unicore32/Kconfig|  1 +
>>  arch/x86/Kconfig  |  1 +
>>  arch/xtensa/Kconfig   |  1 +
>>  init/initramfs.c  |  7 +++
>>  usr/Kconfig   |  4 
>>  34 files changed, 28 insertions(+), 113 deletions(-)
>
> Please also put it into Documentation/features/.
>

I switched this patch series (the latest revision v6 was just posted) to
using weak symbols instead of Kconfig. Does it still warrant documentation?

>
>> diff --git a/usr/Kconfig b/usr/Kconfig
>> index 43658b8a975e..7a94f6df39bf 100644
>> --- a/usr/Kconfig
>> +++ b/usr/Kconfig
>> @@ -233,3 +233,7 @@ config INITRAMFS_COMPRESSION
>>  default ".lzma" if RD_LZMA
>>  default ".bz2"  if RD_BZIP2
>>  default ""
>> +
>> +config HAVE_ARCH_FREE_INITRD_MEM
>> +bool
>> +default n
>
> Help text would be nice, to tell arch maintainers what the purpose of this
> switch is.
>
> Also, a nit, I think this should be named "ARCH_HAS_FREE_INITRD_MEM", which
> is the dominant pattern:
>
> triton:~/tip> git grep 'select.*ARCH' arch/x86/Kconfig* | cut -f2 | cut -d_ -f1-2 | sort | uniq -c | sort -n
> ...
>   2 select ARCH_USES
>   2 select ARCH_WANTS
>   3 select ARCH_MIGHT
>   3 select ARCH_WANT
>   4 select ARCH_SUPPORTS
>   4 select ARCH_USE
>  16 select HAVE_ARCH
>  23 select ARCH_HAS
>
> It also reads nicely in English:
>
>   "arch has free_initrd_mem()"
>
> While the other makes little sense:
>
>   "have arch free_initrd_mem()"
>
> ?
>
> Thanks,
>
>   Ingo

Thanks,
Shea


signature.asc
Description: PGP signature


[PATCH v3 0/7] powerpc/fadump: Improvements and fixes for firmware-assisted dump.

2018-04-01 Thread Mahesh J Salgaonkar
One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. This reserved
memory is used for saving the contents of the old crashed kernel's memory
before the fadump capture kernel uses the old kernel's memory area to boot.
However, this reserved memory area stays unused until a system crash and
isn't available for the production kernel to use.

Instead of setting aside a significant chunk of memory that nobody can use,
take advantage of the Linux kernel's Contiguous Memory Allocator (CMA)
feature to reserve a significant chunk of memory that the kernel is
prevented from using, but that applications are free to use.

Patch 1 moves the metadata region to the start of the reserved area for easy
handling/detection of the metadata region in the second kernel.
Patch 3 implements the usage of the CMA region to allow the production
kernel to use that memory for applications, making fadump reservation-less.
Patches 4-7 fix various fadump issues and bugs.

Changes in V3:
- patch 1 & 2: move metadata region and documentation update.
- patch 7: Un-register fadump on the kexec path


---

Mahesh Salgaonkar (7):
  powerpc/fadump: Move the metadata region to start of the reserved area.
  powerpc/fadump: Update documentation to reflect the metadata region 
movement.
  powerpc/fadump: Reservationless firmware assisted dump
  powerpc/fadump: exclude memory holes while reserving memory in second 
kernel.
  powerpc/fadump: throw proper error message on fadump registration failure.
  powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
  powerpc/fadump: un-register fadump on kexec path.


 Documentation/powerpc/firmware-assisted-dump.txt |   31 ++
 arch/powerpc/include/asm/fadump.h|6 
 arch/powerpc/kernel/fadump.c |  275 +++---
 arch/powerpc/platforms/pseries/hotplug-memory.c  |7 -
 4 files changed, 271 insertions(+), 48 deletions(-)

--
Signature
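The trade-off the cover letter describes can be put in one line of arithmetic. This is a deliberately crude model with invented names: a plain memblock reservation is lost to everyone until a crash, while a CMA area still backs movable (user) allocations:

```c
#include <assert.h>

/* Memory usable by applications after the fadump reservation, in MiB.
 * With CMA, the reserved range remains allocatable for movable pages, so
 * applications lose nothing; without it, the reservation is subtracted
 * outright. Kernel overhead is ignored for simplicity. */
static long usable_by_apps_mib(long ram_mib, long reserve_mib, int use_cma)
{
	return use_cma ? ram_mib : ram_mib - reserve_mib;
}
```

The `free -m` comparison in patch 3 shows exactly this effect: total memory reported to userspace grows by roughly the reservation size once the CMA approach is in place.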



[PATCH v3 1/7] powerpc/fadump: Move the metadata region to start of the reserved area.

2018-04-01 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

Currently the metadata region that holds crash info structure and ELF core
header is placed towards the end of reserved memory area. This patch places
it at the beginning of the reserved memory area.

Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/include/asm/fadump.h |4 ++
 arch/powerpc/kernel/fadump.c  |   92 -
 2 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5a23010af600..44b6ebfe9be6 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -61,6 +61,9 @@
 #define FADUMP_UNREGISTER  2
 #define FADUMP_INVALIDATE  3
 
+/* Number of dump sections requested by kernel */
+#define FADUMP_NUM_SECTIONS4
+
 /* Dump status flag */
 #define FADUMP_ERROR_FLAG  0x2000
 
@@ -119,6 +122,7 @@ struct fadump_mem_struct {
struct fadump_section   cpu_state_data;
struct fadump_section   hpte_region;
struct fadump_section   rmr_region;
+   struct fadump_section   metadata_region;
 };
 
 /* Firmware-assisted dump configuration details. */
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3c2c2688918f..80552fd7d5f8 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -188,17 +188,48 @@ static void fadump_show_config(void)
pr_debug("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
 }
 
+static unsigned long get_fadump_metadata_size(void)
+{
+   unsigned long size = 0;
+
+   /*
+* If fadump is active then look into fdm_active to get metadata
+* size set by previous kernel.
+*/
+   if (fw_dump.dump_active) {
+   size = fdm_active->cpu_state_data.destination_address -
+   fdm_active->metadata_region.source_address;
+   goto out;
+   }
+   size += sizeof(struct fadump_crash_info_header);
+   size += sizeof(struct elfhdr); /* ELF core header.*/
+   size += sizeof(struct elf_phdr); /* place holder for cpu notes */
+   /* Program headers for crash memory regions. */
+   size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
+
+   size = PAGE_ALIGN(size);
+out:
+   pr_debug("fadump metadata size is %lu\n", size);
+   return size;
+}
+
 static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
unsigned long addr)
 {
+   uint16_t num_sections = 0;
+   unsigned long metadata_base = 0;
+
if (!fdm)
return 0;
 
+   /* Skip the fadump metadata area. */
+   metadata_base = addr;
+   addr += get_fadump_metadata_size();
+
memset(fdm, 0, sizeof(struct fadump_mem_struct));
addr = addr & PAGE_MASK;
 
fdm->header.dump_format_version = cpu_to_be32(0x0001);
-   fdm->header.dump_num_sections = cpu_to_be16(3);
fdm->header.dump_status_flag = 0;
fdm->header.offset_first_dump_section =
cpu_to_be32((u32)offsetof(struct fadump_mem_struct, cpu_state_data));
@@ -222,6 +253,7 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
fdm->cpu_state_data.source_address = 0;
fdm->cpu_state_data.source_len = cpu_to_be64(fw_dump.cpu_state_data_size);
fdm->cpu_state_data.destination_address = cpu_to_be64(addr);
+   num_sections++;
addr += fw_dump.cpu_state_data_size;
 
/* hpte region section */
@@ -230,6 +262,7 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
fdm->hpte_region.source_address = 0;
fdm->hpte_region.source_len = cpu_to_be64(fw_dump.hpte_region_size);
fdm->hpte_region.destination_address = cpu_to_be64(addr);
+   num_sections++;
addr += fw_dump.hpte_region_size;
 
/* RMA region section */
@@ -238,8 +271,25 @@ static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
fdm->rmr_region.source_address = cpu_to_be64(RMA_START);
fdm->rmr_region.source_len = cpu_to_be64(fw_dump.boot_memory_size);
fdm->rmr_region.destination_address = cpu_to_be64(addr);
+   num_sections++;
addr += fw_dump.boot_memory_size;
 
+   /*
+* fadump metadata section.
+* Add an entry with source len as zero. We are only interested in
+* the source address, which will help us to detect the location of
+* the metadata area where the fadump header and ELF core header are placed.
+*/
+   fdm->metadata_region.request_flag = cpu_to_be32(FADUMP_REQUEST_FLAG);
+   fdm->metadata_region.source_data_type =
+   cpu_to_be16(FADUMP_REAL_MODE_REGION);
+   fdm->metadata_region.source_address = cpu_to_be64(metadata_base);
+   fdm->metadata_region.source_len = 0;
+   fdm->metadata_region.destination_ad
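The size computation in get_fadump_metadata_size() above can be checked in a userspace sketch: crash-info header + ELF header + a placeholder program header for CPU notes + one program header per crash memory region (plus two spare), rounded up to a page. The structure sizes and the 64K page constant below are placeholders, not the real kernel values:

```c
#include <assert.h>

/* 64K pages are common on ppc64; this constant is an assumption here. */
#define PAGE_SIZE_SK 65536UL
#define PAGE_ALIGN_SK(x) (((x) + PAGE_SIZE_SK - 1) & ~(PAGE_SIZE_SK - 1))

/* Mirror of the non-active branch of get_fadump_metadata_size(): sum the
 * header sizes, then page-align so the dump sections start on a page. */
static unsigned long metadata_size(unsigned long crash_hdr_sz,
				   unsigned long elf_hdr_sz,
				   unsigned long phdr_sz,
				   unsigned long nregions)
{
	unsigned long size = crash_hdr_sz + elf_hdr_sz
			     + phdr_sz              /* CPU notes placeholder */
			     + phdr_sz * (nregions + 2);
	return PAGE_ALIGN_SK(size);
}
```

One page comfortably covers typical region counts; only systems with thousands of memblock regions spill into a second page.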

[PATCH v3 2/7] powerpc/fadump: Update documentation to reflect the metadata region movement.

2018-04-01 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

Metadata region that holds fadump header and ELF header is now placed at
the start of reserved memory area. Update the documentation accordingly

Signed-off-by: Mahesh Salgaonkar 
---
 Documentation/powerpc/firmware-assisted-dump.txt |   31 +-
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index bdd344aa18d9..c77f8dc9231f 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -115,17 +115,24 @@ be kept permanently reserved, so that it can act as a receptacle
 for a copy of the boot memory content in addition to CPU state
 and HPTE region, in the case a crash does occur.
 
+The first kernel, during fadump registration, prepares ELF core header
+that contains necessary information for the coredump of the 1st kernel
+which includes its physical memory layout. This ELF header and some more
+additional data is stored in the area called metadata region at the start
+of the reserved memory area.
+
   o Memory Reservation during first kernel
 
   Low memory Top of memory
   0  boot memory size   |
   |   ||<--Reserved dump area -->|  |
   V   V|   Permanent Reservation |  V
-  +---+--/ /---+---++---++--+
-  |   ||CPU|HPTE|  DUMP |ELF |  |
-  +---+--/ /---+---++---++--+
-|   ^
-|   |
+  +---+--/ /---++---++---+--+
+  |   ||ELF |CPU|HPTE|  DUMP |  |
+  +---+-/ /++---++---+--+
+|^  ^
+||  |
+|metadata region|
 \   /
  ---
   Boot memory content gets transferred to
@@ -137,14 +144,14 @@ and HPTE region, in the case a crash does occur.
 
   Low memoryTop of memory
   0  boot memory size   |
-  |   |<- Reserved dump area --- -->|
+  |   |<- Reserved dump area -->|
   V   V V
-  +---+--/ /---+---++---++--+
-  |   ||CPU|HPTE|  DUMP |ELF |  |
-  +---+--/ /---+---++---++--+
-|  |
-V  V
-   Used by second/proc/vmcore
+  +---+--/ /---+++---+--+
+  |   ||ELF |CPU|HPTE|  DUMP |  |
+  +---+--/ /---+++---+--+
+| |
+V V
+   Used by second /proc/vmcore
kernel to boot
Fig. 2
 



[PATCH v3 3/7] powerpc/fadump: Reservationless firmware assisted dump

2018-04-01 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. On large
systems with TeraBytes of memory, this reservation can be quite
significant.

In some cases, fadump fails if the memory reserved is insufficient, or
if the reserved memory was DLPAR hot-removed.

In the normal case, post reboot, the preserved memory is filtered to
extract only relevant areas of interest using the makedumpfile tool.
While the tool provides flexibility to determine what needs to be part
of the dump and what memory to filter out, all supported distributions
default this to "Capture only kernel data and nothing else".

We take advantage of this default and the Linux kernel's Contiguous
Memory Allocator (CMA) to fundamentally change the memory reservation
model for fadump. Fadump can now only capture kernel memory.

Instead of setting aside a significant chunk of memory nobody can use,
this patch uses CMA instead, to reserve a significant chunk of memory
that the kernel is prevented from using (due to MIGRATE_CMA), but
applications are free to use it.

Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7557         193        6822          12         541        6725
Swap:          4095           0        4095

With this patch:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           8133         194        7464          12         475        7338
Swap:          4095           0        4095

Changes made here are completely transparent to how fadump has
traditionally worked.

Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
CMA and its usage.

TODO:
- Handle case where CMA reservation spans nodes.

Signed-off-by: Ananth N Mavinakayanahalli 
Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/kernel/fadump.c |  122 ++++++++++++++++++++++++++++++++++------
 1 file changed, 104 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 80552fd7d5f8..011f1aa7abab 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -45,11 +46,57 @@
 static struct fw_dump fw_dump;
 static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;
+static struct cma *fadump_cma;
 
 static DEFINE_MUTEX(fadump_mutex);
 struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
 int crash_mem_ranges;
 
+/*
+ * fadump_cma_reserve() - reserve area for fadump memory reservation
+ *
+ * This function reserves memory from early allocator. It should be
+ * called by arch specific code once the memblock allocator
+ * has been activated.
+ */
+int __init fadump_cma_reserve(void)
+{
+   unsigned long long base, size;
+   int rc;
+
+   if (!fw_dump.fadump_enabled)
+   return 0;
+
+   base = fw_dump.reserve_dump_area_start;
+   size = fw_dump.reserve_dump_area_size;
+   pr_debug("Original reserve area base %ld, size %ld\n",
+   (unsigned long)base >> 20,
+   (unsigned long)size >> 20);
+   if (!size)
+   return 0;
+
+   rc = cma_declare_contiguous(base, size, 0, 0, 0, false,
+   "fadump_cma", &fadump_cma);
+   if (rc) {
+   printk(KERN_ERR "fadump: Failed to reserve cma area for "
+   "firmware-assisted dump, %d\n", rc);
+   fw_dump.reserve_dump_area_size = 0;
+   return 0;
+   }
+   /*
+* So we now have cma area reserved for fadump. base may be different
+* from what we requested.
+*/
+   fw_dump.reserve_dump_area_start = cma_get_base(fadump_cma);
+   fw_dump.reserve_dump_area_size = cma_get_size(fadump_cma);
+   printk("Reserved %ldMB cma area at %ldMB for firmware-assisted dump "
+   "(System RAM: %ldMB)\n",
+   cma_get_size(fadump_cma) >> 20,
+   (unsigned long)cma_get_base(fadump_cma) >> 20,
+   (unsigned long)(memblock_phys_mem_size() >> 20));
+   return 1;
+}
+
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node,
const char *uname, int depth, void *data)
@@ -381,7 +428,7 @@ static unsigned long get_fadump_area_size(void)
 }
 
 static inline unsigned long get_fadump_metadata_base(
-   const struct fadump_mem_struct *fdm)
+   const struct fadump_mem_struct *fdm)
 {
return be64_to_cpu(fdm->metadata_region.source_address);
 }
@@ -449,8 +496,9 @@ int 
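The control flow of fadump_cma_reserve() above can be exercised in user space. This is a minimal sketch with cma_declare_contiguous() stubbed out; the stub's behaviour (failing on zero size, moving the granted base by 1 MB) is purely an assumption for illustration and does not reflect real CMA semantics:

```c
#include <assert.h>

struct fw_dump_model {
    unsigned long long reserve_dump_area_start;
    unsigned long long reserve_dump_area_size;
    int fadump_enabled;
};

/* Stub for cma_declare_contiguous(): pretend the reservation succeeds,
 * possibly at a different base than requested (CMA may move it). */
static int stub_cma_declare(unsigned long long req_base,
                            unsigned long long size,
                            unsigned long long *out_base)
{
    if (size == 0)
        return -1;                        /* model a reservation failure */
    *out_base = req_base + (1ULL << 20);  /* model a moved base */
    return 0;
}

/* Mirrors the patch's flow: returns 1 on success, 0 otherwise, and on
 * success adopts whatever base the allocator actually handed back. */
static int model_fadump_cma_reserve(struct fw_dump_model *fw)
{
    unsigned long long base;

    if (!fw->fadump_enabled || !fw->reserve_dump_area_size)
        return 0;

    if (stub_cma_declare(fw->reserve_dump_area_start,
                         fw->reserve_dump_area_size, &base)) {
        fw->reserve_dump_area_size = 0;   /* failure: no CMA area to use */
        return 0;
    }
    fw->reserve_dump_area_start = base;
    return 1;
}
```

The key point the sketch captures is that the caller must re-read the base after the call, exactly as the patch does with cma_get_base().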

[PATCH v3 4/7] powerpc/fadump: exclude memory holes while reserving memory in second kernel.

2018-04-01 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

The second kernel, during early boot after the crash, reserves rest of
the memory above boot memory size to make sure it does not touch any of the
dump memory area. It uses memblock_reserve() that reserves the specified
memory region irrespective of memory holes present within that region.
The previous kernel may have hot-removed some of its memory, leaving
memory holes behind. In such cases the fadump kernel reports an incorrect
number of reserved pages through the arch_reserved_kernel_pages() hook,
causing the kernel to hang or panic.

Fix this by excluding memory holes while reserving rest of the memory
above boot memory size during second kernel boot after crash.

Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/kernel/fadump.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 011f1aa7abab..a497e9fb93fe 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -433,6 +433,21 @@ static inline unsigned long get_fadump_metadata_base(
return be64_to_cpu(fdm->metadata_region.source_address);
 }
 
+static void fadump_memblock_reserve(unsigned long base, unsigned long size)
+{
+   struct memblock_region *reg;
+   unsigned long start, end;
+
+   for_each_memblock(memory, reg) {
+   start = max(base, (unsigned long)reg->base);
+   end = reg->base + reg->size;
+   end = min(base + size, end);
+
+   if (start < end)
+   memblock_reserve(start, end - start);
+   }
+}
+
 int __init fadump_reserve_mem(void)
 {
unsigned long base, size, memory_boundary;
@@ -487,7 +502,7 @@ int __init fadump_reserve_mem(void)
 */
base = fw_dump.boot_memory_size;
size = memory_boundary - base;
-   memblock_reserve(base, size);
+   fadump_memblock_reserve(base, size);
printk(KERN_INFO "Reserved %ldMB of memory at %ldMB "
"for saving crash dump\n",
(unsigned long)(size >> 20),

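The hole-skipping in fadump_memblock_reserve() above is plain interval clipping. A standalone sketch with mock memory regions — the region list and all numbers are made up for illustration:

```c
#include <assert.h>

struct region { unsigned long base, size; };  /* covers [base, base + size) */

/* Clip [base, base + size) against each present-memory region and sum
 * what would actually be handed to memblock_reserve(); holes between
 * regions contribute nothing. */
static unsigned long clipped_reserve_total(const struct region *mem, int n,
                                           unsigned long base,
                                           unsigned long size)
{
    unsigned long total = 0;
    int i;

    for (i = 0; i < n; i++) {
        unsigned long start = mem[i].base > base ? mem[i].base : base;
        unsigned long end = mem[i].base + mem[i].size;

        if (base + size < end)
            end = base + size;
        if (start < end)
            total += end - start;
    }
    return total;
}
```

Reserving only these intersections is what keeps arch_reserved_kernel_pages() accounting consistent when the previous kernel left holes behind.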


[PATCH v3 5/7] powerpc/fadump: throw proper error message on fadump registration failure.

2018-04-01 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

fadump fails to register when there are holes in the reserved memory area.
This can happen if the user has hot-removed memory that falls within the
fadump reserved memory area. Throw a meaningful error message to the user
in such cases.

Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/kernel/fadump.c |   33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a497e9fb93fe..59aaf0357a52 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -216,6 +216,36 @@ static int is_boot_memory_area_contiguous(void)
return ret;
 }
 
+/*
+ * Returns 1, if there are no holes in reserved memory area,
+ * 0 otherwise.
+ */
+static int is_reserved_memory_area_contiguous(void)
+{
+   struct memblock_region *reg;
+   unsigned long start, end;
+   unsigned long d_start = fw_dump.reserve_dump_area_start;
+   unsigned long d_end = d_start + fw_dump.reserve_dump_area_size;
+   int ret = 0;
+
+   for_each_memblock(memory, reg) {
+   start = max(d_start, (unsigned long)reg->base);
+   end = min(d_end, (unsigned long)(reg->base + reg->size));
+   if (d_start < end) {
+   /* Memory hole from d_start to start */
+   if (start > d_start)
+   break;
+
+   if (end == d_end) {
+   ret = 1;
+   break;
+   }
+   d_start = end + 1;
+   }
+   }
+   return ret;
+}
+
 /* Print firmware assisted dump configurations for debugging purpose. */
 static void fadump_show_config(void)
 {
@@ -602,6 +632,9 @@ static int register_fw_dump(struct fadump_mem_struct *fdm)
if (!is_boot_memory_area_contiguous())
pr_err("Can't have holes in boot memory area while "
   "registering fadump\n");
+   else if (!is_reserved_memory_area_contiguous())
+   pr_err("Can't have holes in reserved memory area while"
+  " registering fadump\n");
 
printk(KERN_ERR "Failed to register firmware-assisted kernel"
" dump. Parameter Error(%d).\n", rc);

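The contiguity walk in is_reserved_memory_area_contiguous() can be tested in user space against mock regions. One editorial liberty in this sketch: it advances with d_start = end rather than the patch's end + 1, on the assumption that memblock ranges are half-open [base, base + size); the structure of the walk is otherwise the same:

```c
#include <assert.h>

struct region { unsigned long base, size; };  /* covers [base, base + size) */

/* Returns 1 if [d_start, d_end) is fully covered by the (sorted) memory
 * regions, 0 if any hole falls inside it. */
static int area_contiguous(const struct region *mem, int n,
                           unsigned long d_start, unsigned long d_end)
{
    int i;

    for (i = 0; i < n; i++) {
        unsigned long start = mem[i].base > d_start ? mem[i].base : d_start;
        unsigned long end = mem[i].base + mem[i].size;

        if (end > d_end)
            end = d_end;
        if (d_start < end) {
            if (start > d_start)
                return 0;        /* hole before this region */
            if (end == d_end)
                return 1;        /* reached the end with no holes */
            d_start = end;       /* continue from where this region stops */
        }
    }
    return 0;
}
```

This is the check that turns a hot-removed chunk inside the reserved dump area into a clear registration error instead of a bare Parameter Error.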


[PATCH v3 6/7] powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.

2018-04-01 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

For fadump to work successfully there should not be any holes in the
reserved memory ranges where the kernel has asked firmware to move the
content of the old kernel memory in the event of a crash. But this memory
area is currently not protected from hot-remove operations. Hence, the
fadump service can fail to re-register after a hot-remove operation if
the hot-removed memory belongs to the fadump reserved region. To avoid
this, make sure that memory from the fadump reserved area is not
hot-removable if fadump is registered.

However, if the user still wants to remove that memory, they can do so by
manually stopping the fadump service before the hot-remove operation.

Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/include/asm/fadump.h   |2 +-
 arch/powerpc/kernel/fadump.c|   10 ++++++++--
 arch/powerpc/platforms/pseries/hotplug-memory.c |7 +++++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index 44b6ebfe9be6..d16dc77107a8 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -207,7 +207,7 @@ struct fad_crash_memory_ranges {
unsigned long long  size;
 };
 
-extern int is_fadump_boot_memory_area(u64 addr, ulong size);
+extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int early_init_dt_scan_fw_dump(unsigned long node,
const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 59aaf0357a52..2c3c7e655eec 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -162,13 +162,19 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 
 /*
  * If fadump is registered, check if the memory provided
- * falls within boot memory area.
+ * falls within boot memory area and reserved memory area.
  */
-int is_fadump_boot_memory_area(u64 addr, ulong size)
+int is_fadump_memory_area(u64 addr, ulong size)
 {
+   u64 d_start = fw_dump.reserve_dump_area_start;
+   u64 d_end = d_start + fw_dump.reserve_dump_area_size;
+
if (!fw_dump.dump_registered)
return 0;
 
+   if (((addr + size) > d_start) && (addr <= d_end))
+   return 1;
+
return (addr + size) > RMA_START && addr <= fw_dump.boot_memory_size;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..e4c658cda3a7 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -389,8 +389,11 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
phys_addr = lmb->base_addr;
 
 #ifdef CONFIG_FA_DUMP
-   /* Don't hot-remove memory that falls in fadump boot memory area */
-   if (is_fadump_boot_memory_area(phys_addr, block_sz))
+   /*
+* Don't hot-remove memory that falls in fadump boot memory area
+* and memory that is reserved for capturing old kernel memory.
+*/
+   if (is_fadump_memory_area(phys_addr, block_sz))
return false;
 #endif
 

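The guard added to lmb_is_removable() reduces to a half-open interval-overlap test. A sketch mirroring the patch's form — note it keeps the patch's inclusive comparison (addr <= d_end), which errs on the side of refusing removal at the exact boundary:

```c
#include <assert.h>

/* Does the LMB [addr, addr + size) touch the protected range [lo, hi)?
 * Mirrors the patch's expression: (addr + size) > lo && addr <= hi.
 * The inclusive `<=` keeps a block starting exactly at the range end
 * non-removable, which is conservative rather than strictly half-open. */
static int fadump_range_overlaps(unsigned long long addr, unsigned long size,
                                 unsigned long long lo, unsigned long long hi)
{
    return (addr + size) > lo && addr <= hi;
}
```

is_fadump_memory_area() in the patch is exactly two such tests: one against the reserved dump area and one against boot memory.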


[PATCH v3 7/7] powerpc/fadump: un-register fadump on kexec path.

2018-04-01 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

Otherwise, fadump registration in the new kexec-ed kernel complains that
fadump is already registered. This makes the new kernel continue using the
fadump registered by the previous kernel, which may lead to invalid vmcore
generation. Fix this by un-registering fadump in fadump_cleanup(), which
is called on the kexec path, so that the new kernel can register fadump
with new, valid values.

Signed-off-by: Mahesh Salgaonkar 
---
 arch/powerpc/kernel/fadump.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 2c3c7e655eec..cb496c8f40f3 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1345,6 +1345,9 @@ void fadump_cleanup(void)
init_fadump_mem_struct(&fdm,
get_fadump_metadata_base(fdm_active));
fadump_invalidate_dump(&fdm);
+   } else if (fw_dump.dump_registered) {
+   /* Un-register Firmware-assisted dump if it was registered. */
+   fadump_unregister_dump(&fdm);
}
 }
 



Re: [PATCH] Extract initrd free logic from arch-specific code.

2018-04-01 Thread Ingo Molnar

* Shea Levy  wrote:

> > Please also put it into Documentation/features/.
> 
> I switched this patch series (the latest revision v6 was just posted) to
> using weak symbols instead of Kconfig. Does it still warrant documentation?

Probably not.

Thanks,

Ingo