On Fri, 10 Nov 2017 22:08:32 +1100 Michael Ellerman <m...@ellerman.id.au> wrote:
> Nicholas Piggin <npig...@gmail.com> writes:
>
> > Currently powernv reboot and shutdown requests just leave secondaries
> > to do their own things. This is undesirable because they can trigger
> > any number of watchdogs while waiting for reboot, but also we don't
> > know what else they might be doing, or they might be stuck somewhere
> > causing trouble.
> >
> > The opal scheduled flash update code already ran into watchdog problems
> > due to flashing taking a long time, but it's possible for regular
> > reboots to trigger problems too (this is with watchdog_thresh set to 1,
> > but I have seen it with watchdog_thresh at the default value once too):
> >
> > reboot: Restarting system
> > [ 360.038896709,5] OPAL: Reboot request...
> > Watchdog CPU:0 Hard LOCKUP
> > Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
> > Watchdog CPU:16 Hard LOCKUP
> > watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
> >
> > So remove the special case for flash update, and unconditionally do
> > smp_send_stop before rebooting.
> >
> > Return the CPUs to Linux stop loops rather than OPAL. The reason for
> > this is that the path to firmware is longer, and the CPUs may have
> > been interrupted from firmware, which may cause problems to re-enter
> > it. It's better to put them into a simple spin loop to maximize the
> > chance of a successful reboot.
>
> I always assumed we had to send the CPUs back to OPAL for the flashing
> procedure. Is it OK to leave them in Linux?

According to the comment and changelog of commit
2196c6f1ed66eef23df3b478cfe71661ae83726e, it was added just to keep
secondaries from going silly.

Vasant, can you remember details?

Thanks,
Nick