Markus Armbruster <arm...@redhat.com> writes:

> Peter Xu <pet...@redhat.com> writes:
>
>> On Tue, Jan 09, 2024 at 10:22:31PM +0100, Philippe Mathieu-Daudé wrote:
>>> Hi Fabiano,
>>>
>>> On 9/1/24 21:21, Fabiano Rosas wrote:
>>> > Cédric Le Goater <c...@kaod.org> writes:
>>> >
>>> > > On 1/9/24 18:40, Fabiano Rosas wrote:
>>> > > > Cédric Le Goater <c...@kaod.org> writes:
>>> > > >
>>> > > > > On 1/3/24 20:53, Fabiano Rosas wrote:
>>> > > > > > Philippe Mathieu-Daudé <phi...@linaro.org> writes:
>>> > > > > >
>>> > > > > > > +Peter/Fabiano
>>> > > > > > >
>>> > > > > > > On 2/1/24 17:41, Cédric Le Goater wrote:
>>> > > > > > > > On 1/2/24 17:15, Philippe Mathieu-Daudé wrote:
>>> > > > > > > > > Hi Cédric,
>>> > > > > > > > >
>>> > > > > > > > > On 2/1/24 15:55, Cédric Le Goater wrote:
>>> > > > > > > > > > On 12/12/23 17:29, Philippe Mathieu-Daudé wrote:
>>> > > > > > > > > > > Hi,
>>> > > > > > > > > > >
>>> > > > > > > > > > > When a MPCore cluster is used, the Cortex-A cores belong to the
>>> > > > > > > > > > > cluster container, not to the board/soc layer. This series moves
>>> > > > > > > > > > > the creation of vCPUs to the MPCore private container.
>>> > > > > > > > > > >
>>> > > > > > > > > > > Doing so we consolidate the QOM model, moving common code in a
>>> > > > > > > > > > > central place (abstract MPCore parent).
>>> > > > > > > > > >
>>> > > > > > > > > > Changing the QOM hierarchy has an impact on the state of the machine
>>> > > > > > > > > > and some fixups are then required to maintain migration compatibility.
>>> > > > > > > > > > This can become a real headache for KVM machines like virt for which
>>> > > > > > > > > > migration compatibility is a feature, less for emulated ones.
>>> > > > > > > > >
>>> > > > > > > > > All changes are either moving properties (which are not migrated)
>>> > > > > > > > > or moving non-migrated QOM members (i.e. pointers of ARMCPU, which
>>> > > > > > > > > is still migrated elsewhere). So I don't see any obvious migration
>>> > > > > > > > > problem, but I might be missing something, so I Cc'ed Juan :>
>>> > > > > >
>>> > > > > > FWIW, I didn't spot anything problematic either.
>>> > > > > >
>>> > > > > > I've run this through my migration compatibility series [1] and it
>>> > > > > > doesn't regress aarch64 migration from/to 8.2. The tests use '-M
>>> > > > > > virt -cpu max', so the cortex-a7 and cortex-a15 are not covered.
>>> > > > > > I don't think we even support migration of anything non-KVM on arm.
>>> > > > >
>>> > > > > it happens we do.
>>> > > > >
>>> > > >
>>> > > > Oh, sorry, I didn't mean TCG here. Probably meant to say something like
>>> > > > non-KVM-capable cpus, as in 32-bit. Nevermind.
>>> > >
>>> > > Theoretically, we should be able to migrate to a TCG guest. Well, this
>>> > > worked in the past for PPC. When I was doing more KVM related changes,
>>> > > this was very useful for dev. Also, some machines are partially emulated.
>>> > > Anyhow I agree this is not a strong requirement and we often break it.
>>> > > Let's focus on KVM only.
>>> > >
>>> > > > > > 1- https://gitlab.com/farosas/qemu/-/jobs/5853599533
>>> > > > >
>>> > > > > yes it depends on the QOM hierarchy and virt seems immune to the changes.
>>> > > > > Good.
>>> > > > >
>>> > > > > However, changing the QOM topology clearly breaks migration compat,
>>> > > >
>>> > > > Well, "clearly" is relative =) You've mentioned pseries and aspeed
>>> > > > already, do you have a pointer to one of those cases where we broke
>>> > > > migration
>>> > >
>>> > > Regarding pseries, migration compat broke because of 5bc8d26de20c
>>> > > ("spapr: allocate the ICPState object from under sPAPRCPUCore") which
>>> > > is similar to the changes proposed by this series, it impacts the QOM
>>> > > hierarchy. Here is the workaround/fix from Greg : 46f7afa37096
>>> > > ("spapr: fix migration of ICPState objects from/to older QEMU") which
>>> > > is quite a headache and this turned out to raise another problem some
>>> > > months ago ... :/ That's why I sent [1] to prepare removal of old
>>> > > machines and workarounds becoming a burden.
>>> >
>>> > This feels like something that could be handled by the vmstate code
>>> > somehow. The state is there, just under a different path.
>>>
>>> What, the QOM path is used in migration? ...
>>
>> Hopefully not..

Unfortunately the original fix doesn't mention _what_ actually broke
with migration. I assumed the QOM path was needed because otherwise I
don't think the fix makes sense. The thread discussing that patch also
directly mentions the QOM path:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg450912.html

But I probably misunderstood something while reading that thread.

>>
>>>
>>> See recent discussions on "QOM path stability":
>>> https://lore.kernel.org/qemu-devel/zzfyvlmcxbcia...@redhat.com/
>>> https://lore.kernel.org/qemu-devel/87jzojbxt7....@pond.sub.org/
>>> https://lore.kernel.org/qemu-devel/87v883by34....@pond.sub.org/
>>
>> If I read it right, the commit 46f7afa37096 example is pretty special that
>> the QOM path more or less decided more than the hierarchy itself but changes
>> the existence of objects.
>
> Let's see whether I got this...
>
> We removed some useless objects, moved the useful ones to another home.
> The move changed their QOM path.
>
> The problem was the removal of useless objects, because this also
> removed their vmstate.

If you check out the removal commit (5bc8d26de20c), the vmstate has
been kept untouched.

>
> The fix was adding the vmstate back as a dummy.

Since the vmstate was kept, I don't see why we would need a dummy. The
incoming migration stream would still have the state, only at a
different point in the stream. It's surprising to me that that would
cause an issue, but I'm not well versed in that code.

>
> The QOM patch changes are *not* part of the problem.

The only explanation I can come up with is that, after the patch,
migration breaks following a hotplug or similar operation. In such a
situation, the preallocated state would always be present before the
patch, but sometimes not present after the patch in case, say, a
hot-unplug has taken away a cpu + ICPState.
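
To make that last guess concrete, here is a toy model in Python. It is
only a sketch of the hypothesis, not QEMU code: the "icp/<n>" section
names, the three helper functions and the strict loader that rejects
sections it has no handler for are all assumptions made up for
illustration.

```python
# Toy model of the hypothesis above. This is NOT QEMU code: the section
# naming ("icp/<n>"), the helpers and the strict loader are hypothetical.

def source_sections(max_cpus, present_cpus, preallocated):
    """ICP sections a source would emit.

    preallocated=True models the old layout (one ICPState per possible
    CPU); False models the new layout (one per CPU actually present).
    """
    cpus = range(max_cpus) if preallocated else sorted(present_cpus)
    return [f"icp/{n}" for n in cpus]


def destination_handlers(max_cpus, present_cpus, preallocated, dummies=False):
    """Handlers registered on the destination; dummies pad the absent CPUs."""
    handlers = set(source_sections(max_cpus, present_cpus, preallocated))
    if dummies:
        handlers |= {f"icp/{n}" for n in range(max_cpus)}
    return handlers


def load(stream, registered):
    """Assumed destination behaviour: reject any section without a handler."""
    unknown = [s for s in stream if s not in registered]
    if unknown:
        raise ValueError(f"unknown sections: {unknown}")


# Old source (preallocated) -> new destination where CPU 3 is not present.
old_stream = source_sections(4, {0, 1, 2}, preallocated=True)
new_dest = destination_handlers(4, {0, 1, 2}, preallocated=False)
try:
    load(old_stream, new_dest)
    outcome = "loaded"
except ValueError:
    outcome = "failed"

# Same migration, with dummy handlers standing in for the absent ICPState.
load(old_stream, destination_handlers(4, {0, 1, 2}, False, dummies=True))
```

In this model the QOM path never appears: the only thing that matters
is whether both sides agree on which ICP sections exist, which would
fit the "dummy vmstate" shape of the fix if the hypothesis holds.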