During DLPAR operations, newly added CPUs start in a halted state. The
kernel then takes some time to initialize those CPUs internally and
starts them using the "start-cpu" RTAS call. However, if a kexec crash
occurs within this window (before the new CPU has been initialized),
the kexec NMI will try to reset all other CPUs from the crashing CPU,
which causes firmware to start the uninitialized CPUs as well. This
causes the kdump kernel to hang during bring-up.

Sample Log:
  [175993.028231][ T1502] NIP [00007fffb953f394] 0x7fffb953f394
  [175993.028314][ T1502] LR [00007fffb953f394] 0x7fffb953f394
  [175993.028390][ T1502] --- interrupt: 3000
  [    5.519483][    T1] Processor 0 is stuck.
  [   11.089481][    T1] Processor 1 is stuck.

To fix this, only issue the system-reset hcall to CPUs that have
actually been started by the kernel.

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Shrikanth Hegde <[email protected]>
Cc: Nysal Jan K.A. <[email protected]>
Cc: Vishal Chourasia <[email protected]>
Cc: Ritesh Harjani <[email protected]>
Cc: Sourabh Jain <[email protected]>
Signed-off-by: Shivang Upadhyay <[email protected]>
---
 arch/powerpc/platforms/pseries/smp.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
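
Note for reviewers: the selection rule described in the commit message
can be modelled in user space roughly as below. This is only a sketch
under stated assumptions, not the kernel code: struct model_cpu and
model_select_reset_targets() are hypothetical names, and the "started"
flag stands in for the per-CPU cpu_start state that the patch checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU state the kernel tracks. */
struct model_cpu {
	int  hw_id;    /* hardware id that would be passed to the hypervisor */
	bool present;  /* CPU has been added, e.g. by a DLPAR operation */
	bool started;  /* kernel has finished starting this CPU */
};

/*
 * Collect the hardware ids of CPUs that should receive a system reset:
 * present, already started, and not the crashing CPU (index "self").
 * Returns the number of ids written to out[], never more than out_len.
 */
size_t model_select_reset_targets(const struct model_cpu *cpus, size_t ncpus,
				  size_t self, int *out, size_t out_len)
{
	size_t n = 0;

	for (size_t k = 0; k < ncpus; k++) {
		if (k == self)
			continue;	/* never reset the crashing CPU */
		if (!cpus[k].present || !cpus[k].started)
			continue;	/* skip CPUs still halted by firmware */
		if (n < out_len)
			out[n++] = cpus[k].hw_id;
	}
	return n;
}
```

With four modelled CPUs where only the first two are started and the
first one is the crashing CPU, only the second CPU's hardware id is
selected; a present-but-unstarted CPU is left alone.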

diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index db99725e752b..e5518cf71094 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -173,10 +173,24 @@ static void dbell_or_ic_cause_ipi(int cpu)
 
 static int pseries_cause_nmi_ipi(int cpu)
 {
-       int hwcpu;
+       int hwcpu, k;
 
        if (cpu == NMI_IPI_ALL_OTHERS) {
-               hwcpu = H_SIGNAL_SYS_RESET_ALL_OTHERS;
+
+               for_each_present_cpu(k) {
+                       if (k != smp_processor_id()) {
+                               hwcpu = get_hard_smp_processor_id(k);
+
+                               /* it is possible that cpu is present,
+                                * but not started yet
+                                */
+                               if (paca_ptrs[k]->cpu_start == 1)
+                                       plpar_signal_sys_reset(hwcpu);
+                       }
+               }
+
+               return 1;
+
        } else {
                if (cpu < 0) {
                        WARN_ONCE(true, "incorrect cpu parameter %d", cpu);
-- 
2.52.0

