On Mon, 18 Jun 2018 13:42:37 +1000 David Gibson <da...@gibson.dropbear.id.au> wrote:
> On Fri, Apr 20, 2018 at 05:39:42PM +0200, Greg Kurz wrote: > > On Fri, 20 Apr 2018 11:15:01 +0200 > > Greg Kurz <gr...@kaod.org> wrote: > > > > > On Fri, 20 Apr 2018 16:34:37 +1000 > > > David Gibson <da...@gibson.dropbear.id.au> wrote: > > > > > > > On Thu, Apr 19, 2018 at 03:48:23PM +0200, Greg Kurz wrote: > > > > > On Tue, 17 Apr 2018 17:17:13 +1000 > > > > > David Gibson <da...@gibson.dropbear.id.au> wrote: > > > > > > > > > > > af81cf323c1 "spapr: CPU hotplug support" added a direct call to > > > > > > spapr_cpu_reset() in spapr_cpu_init(), as well as registering it as > > > > > > a > > > > > > reset callback. That was in order to make sure that the reset > > > > > > function > > > > > > got called for a newly hotplugged cpu, which would miss the global > > > > > > machine > > > > > > reset. > > > > > > > > > > > > However, this change means that spapr_cpu_reset() gets called twice > > > > > > for > > > > > > normal cold-plugged cpus: once from spapr_cpu_init(), then again > > > > > > during > > > > > > the system reset. As well as being ugly in its redundancy, the > > > > > > first call > > > > > > happens before the machine reset calls have happened, which will > > > > > > cause > > > > > > problems for some things we're going to want to add. > > > > > > > > > > > > So, we remove the reset call from spapr_cpu_init(). We instead put > > > > > > an > > > > > > explicit reset call in the hotplug specific path. > > > > > > > > > > > > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> > > > > > > --- > > > > > > > > > > I had sent a tentative patch to do something similar earlier this > > > > > year: > > > > > > > > > > https://patchwork.ozlabs.org/patch/862116/ > > > > > > > > > > but it got nacked for several reasons, one of them being you were > > > > > "always wary of using the hotplugged parameter, because what qemu > > > > > means by it often doesn't line up with what PAPR means by it." > > > > > > > > Yeah, I was and am wary of that, but convinced myself it was correct > > > > in this case (which doesn't really interact with the PAPR meaning of > > > > hotplug). > > > > > > > > > > hw/ppc/spapr.c | 6 ++++-- > > > > > > hw/ppc/spapr_cpu_core.c | 13 ++++++++++++- > > > > > > include/hw/ppc/spapr_cpu_core.h | 2 ++ > > > > > > 3 files changed, 18 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > > > > index 7b2bc4e25d..81b50af3b5 100644 > > > > > > --- a/hw/ppc/spapr.c > > > > > > +++ b/hw/ppc/spapr.c > > > > > > @@ -3370,9 +3370,11 @@ static void spapr_core_plug(HotplugHandler > > > > > > *hotplug_dev, DeviceState *dev, > > > > > > > > > > > > if (hotplugged) { > > > > > > > > > > ... but you rely on it here. Can you explain why it is > > > > > okay now ? > > > > > > > > So the value I actually need here is "wasn't present at the last > > > > system reset" (with false positives being mostly harmless, but not > > > > false negatives). > > > > > > > > > > Hmm... It is rather the other way around, sth like "will be caught > > > by the initial machine reset". > > > > > > > > Also, if QEMU is started with -incoming and the CPU core > > > > > is hotplugged before migration begins, the following will > > > > > return false: > > > > > > > > > > static inline bool spapr_drc_hotplugged(DeviceState *dev) > > > > > { > > > > > return dev->hotplugged && !runstate_check(RUN_STATE_INMIGRATE); > > > > > } > > > > > > > > > > and the CPU core won't be reset. > > > > > > > > Uh... spapr_dtc_hotplugged() would definitely be wrong here, which is > > > > why I'm not using it. > > > > > > > > > > This is how hotplugged is set in spapr_core_plug(): > > > > > > bool hotplugged = spapr_drc_hotplugged(dev); > > > > > > but to detect the "will be caught by the initial machine reset" condition, > > > we only need to check dev->hotplugged actually. > > > > > > > > > > > > > > /* > > > > > > - * Send hotplug notification interrupt to the guest > > > > > > only > > > > > > - * in case of hotplugged CPUs. > > > > > > + * For hotplugged CPUs, we need to reset them (they > > > > > > missed > > > > > > + * out on the system reset), and send the guest a > > > > > > + * notification > > > > > > */ > > > > > > + spapr_cpu_core_reset(core); > > > > > > > > > > spapr_cpu_reset() also sets the compat mode, which is used > > > > > to set some properties in the DT, ie, this should be called > > > > > before spapr_populate_hotplug_cpu_dt(). > > > > > > > > Good point. I've moved the reset to fix that. > > > > > > > > Thinking of it again: since cold-plugged devices reach this before machine > > reset, we would then attach to the DRC a DT blob based on a non-reset CPU > > :-\ > > > > Instead of registering a reset handler for each individual CPUs, maybe > > we should rather register it a the CPU core level. The handler would > > first reset all CPUs in the core and then setup the DRC for new cores only, > > like it is done currently in spapr_core_plug(). > > > > spapr_core_plug() would then just need to register the reset handler, > > and call it only for hotplugged cores. > > Handling the resets via the core level might be a good idea anyway, > but I don't think it can address the problem we're hitting here. > > I've investigated further and I'm pretty sure we can't fix this > without generic code changes. cpu_common_realizefn() (which is called > by our ppc cpu realize hook via the parent_realize chain) contains > this: > > if (dev->hotplugged) { > cpu_synchronize_post_init(cpu); > cpu_resume(cpu); > } > Yes, these come from: commit 13eed94ed5617b98e657163490584dc2a0cc4b32 Author: Igor Mammedov <imamm...@redhat.com> Date: Tue Apr 23 10:29:36 2013 +0200 cpu: Call cpu_synchronize_post_init() from DeviceClass::realize() If hotplugged, synchronize CPU state to KVM. Signed-off-by: Igor Mammedov <imamm...@redhat.com> Reviewed-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Andreas Färber <afaer...@suse.de> and commit 6afb4721f3e45da727110470a61aafcd6682395e Author: Igor Mammedov <imamm...@redhat.com> Date: Tue Apr 23 10:29:38 2013 +0200 cpu: Resume CPU from DeviceClass::realize() if hot-plugged Signed-off-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Andreas Färber <afaer...@suse.de> > So, as soon as the hotplugged cpu is realized, it's running, which > means by the time we call the plug() hotplug handler we're too late to > do any reset initialization. > > I think there are two ways to look at this: > > 1) The reset handlers are specifically about *system* reset, not > device reset, and so we shoudln't really expect them to be called for > hotplugged devices. If we want to share reset initialization with > "initial" initialization, we should explicitly call the reset handler > from the (realize time) init code.. which is what we do now. > > 2) Common core realize should _not_ activate the cpu. Instead that > should be the plug() handler's job. This would require changing the > x86 cpu plug handler (and whatever else) to kick off the cpu after > realization. > > For now I'm inclined to just let it stay at (1). > Maybe Igor and Eduardo (now Cc'd) can provide a hint about 2) ? > The problem I had which I thought required this, doesn't after all. I > came up with a different solution that involves moving the spapr caps > initialization earlier, instead of the cpu reset later. That turned > out to be substantially easier than I first thought, and regardless of > what we do above long term, I think it's a better way to handle the > caps. >
pgpggdSY6xFTR.pgp
Description: OpenPGP digital signature