Nicholas Piggin <npig...@gmail.com> writes:
> On Fri, 10 Nov 2017 22:08:32 +1100
> Michael Ellerman <m...@ellerman.id.au> wrote:
>
>> Nicholas Piggin <npig...@gmail.com> writes:
>>
>> > Currently powernv reboot and shutdown requests just leave secondaries
>> > to do their own things. This is undesirable because they can trigger
>> > any number of watchdogs while waiting for reboot, but also we don't
>> > know what else they might be doing, or they might be stuck somewhere
>> > causing trouble.
>> >
>> > The opal scheduled flash update code already ran into watchdog problems
>> > due to flashing taking a long time, but it's possible for regular
>> > reboots to trigger problems too (this is with watchdog_thresh set to 1,
>> > but I have seen it with watchdog_thresh at the default value once too):
>> >
>> > reboot: Restarting system
>> > [ 360.038896709,5] OPAL: Reboot request...
>> > Watchdog CPU:0 Hard LOCKUP
>> > Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
>> > Watchdog CPU:16 Hard LOCKUP
>> > watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
>> >
>> > So remove the special case for flash update, and unconditionally do
>> > smp_send_stop before rebooting.
>> >
>> > Return the CPUs to Linux stop loops rather than OPAL. The reason for
>> > this is that the path to firmware is longer, and the CPUs may have
>> > been interrupted from firmware, which may cause problems to re-enter
>> > it. It's better to put them into a simple spin loop to maximize the
>> > chance of a successful reboot.
>>
>> I always assumed we had to send the CPUs back to OPAL for the flashing
>> procedure. Is it OK to leave them in Linux?
>
> According to the comment and changelog
>
> 2196c6f1ed66eef23df3b478cfe71661ae83726e
>
> It was added just to keep secondaries from going silly. Vasant, can
> you remember details?

OK.

My worry is that we've established an implicit contract with skiboot on
how we do this, and now we're looking to change it.

So I guess we just want to confirm that the skiboot code has not grown a
dependency on us returning CPUs, and then we should probably document
what the expectations are in e.g. the OPAL_FLASH_UPDATE docs.

cheers