On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
> David Gibson <da...@gibson.dropbear.id.au> writes:
>
> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> >>
> >> When reset happens, all the CPUs are in halted state. First CPU is brought out
> >> of reset and secondary CPUs would be initialized by the guest kernel using a
> >> rtas call start-cpu.
> >>
> >> However, in case of TCG, decrementer interrupts keep on coming and waking the
> >> secondary CPUs up.
> >>
> >> These secondary CPUs would see the decrementer interrupt pending, which makes
> >> cpu::has_work() to bring them out of wait loop and start executing
> >> tcg_exec_cpu().
> >>
> >> The problem with this is all the CPUs wake up and start booting SLOF image,
> >> causing the following exception(4 CPUs TCG VM):
> >
> > Ok, I'm still trying to understand why the behaviour on reboot is
> > different from the first boot.
>
> During first boot, the cpu is in the stopped state, so
> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
> until rtas start-cpu. Therefore, we never check the cpu_has_work()
>
> In case of reboot, all CPUs are resumed after reboot. So we check the
> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
> DECR interrupt and remove the CPU from halted state as the CPU has
> work.
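
The ordering of checks described above can be modelled in a few lines of C.
This is only a simplified sketch, not the actual cpus.c code: struct
model_cpu, its fields, and the model_* functions are hypothetical stand-ins,
and the decrementer is assumed to be the only source of work.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-CPU state discussed above. */
struct model_cpu {
    bool stopped;       /* CPU is in the stopped state (secondaries on first boot) */
    bool halted;        /* CPU is halted, waiting to be woken */
    bool decr_pending;  /* a decrementer interrupt is pending */
};

/* Assumption: a pending decrementer is the only kind of work modelled here. */
static bool model_cpu_has_work(const struct model_cpu *cpu)
{
    return cpu->decr_pending;
}

/* Mirrors the order of checks described in the mail, nothing more. */
static bool model_cpu_thread_is_idle(const struct model_cpu *cpu)
{
    if (cpu->stopped) {
        /* Stopped CPUs are idle; has_work is never consulted. */
        return true;
    }
    if (!cpu->halted || model_cpu_has_work(cpu)) {
        /* A resumed CPU with pending work leaves the wait loop. */
        return false;
    }
    return true;
}
```

In this model a secondary CPU with stopped set stays idle even while a
decrementer interrupt is pending, which corresponds to the first-boot case.
With stopped cleared, as after a reboot, the same pending interrupt makes the
CPU non-idle.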

Ok, so it sounds like we should set stopped on all the secondary CPUs
on reset as well.  What's causing them to be resumed after the reset
at the moment?

> > AFAICT on initial boot, the LPCR will
> > have DEE / PECE3 enabled.  So why aren't we getting the same problem
> > then?
>
> Regards
> Nikunj
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson