On 2025-08-14 15:24, Peter Xu wrote:
> On Thu, Aug 14, 2025 at 05:42:23PM +0200, Juraj Marcin wrote:
> > Fair point, I'll then continue with the PING/PONG solution, the first
> > implementation I have seems to be working to resolve Issue 1.
> >
> > For rarer split brain, we'll rely on block device locks/mgmt to resolve
> > and change the failure handling, so it registers errors from disk
> > activation.
> >
> > As tested, there should be no problems with the destination
> > transitioning to POSTCOPY_PAUSED, since the VM was not started yet.
> >
> > However, to prevent the source side from transitioning to
> > POSTCOPY_PAUSED, I think adding a new state is still the best option.
> >
> > I tried keeping the migration states as they are now and just rely on an
> > attribute of MigrationState if 3rd PONG was received, however, this
> > collides with (at least) migrate_pause tests, that are waiting for
> > POSTCOPY_ACTIVE, and then pause the migration triggering the source to
> > resume. We could maybe work around it by waiting for the 3rd pong
> > instead, but I am not sure if it is possible from tests, or by not
> > resuming if migrate_pause command is executed?
> >
> > I also tried extending the span of the DEVICE state, but some functions
> > behave differently depending on if they are in postcopy or not, using
> > the migration_in_postcopy() function, but adding the DEVICE there isn't
> > working either. And treating the DEVICE state sometimes as postcopy and
> > sometimes as not seems just too messy, if it would even be possible.
>
> Yeah, it might indeed be a bit messy.
>
> Is it possible to find a middle ground? E.g. add postcopy-setup status,
> but without any new knob to enable it? Just to describe the period of time
> where dest QEMU haven't started running but started loading device states.

Yes. Since the ping/pong solution doesn't require any changes to the protocol, there's no need for a new capability, and the new state can always be used.

>
> The hope is libvirt (which, AFAIU, always enables the "events" capability)
> can ignore the new postcopy-setup status transition, then maybe we can also
> introduce the postcopy-setup and make it always appear.
>
> Thanks,
>
> --
> Peter Xu
>
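
For illustration only, a minimal sketch of the source-side transitions discussed above. All state, type, and function names here (MIG_POSTCOPY_SETUP, SourceMig, on_pong(), etc.) are hypothetical and do not correspond to QEMU's actual MigrationStatus values or migration code. It assumes, as in the thread, that the third PONG is what tells the source the destination has started running, and that a channel error before that point should fail the migration (so the source can resume the VM) instead of moving to postcopy-paused.

```c
#include <assert.h>

/* Hypothetical subset of source-side migration states; illustrative
 * names only, not QEMU's MigrationStatus enum. */
typedef enum {
    MIG_DEVICE,          /* sending device state */
    MIG_POSTCOPY_SETUP,  /* device state sent, dest not yet known running */
    MIG_POSTCOPY_ACTIVE, /* dest confirmed running (3rd PONG seen) */
    MIG_POSTCOPY_PAUSED, /* channel lost after dest started */
    MIG_FAILED,          /* failed before dest started; source may resume */
} MigState;

typedef struct {
    MigState state;
    int pongs_received;
} SourceMig;

/* Assumption: the third PONG means the destination has started the VM. */
static const int PONGS_BEFORE_ACTIVE = 3;

/* Leave postcopy-setup for postcopy-active once enough PONGs arrived. */
static void maybe_activate(SourceMig *m)
{
    if (m->state == MIG_POSTCOPY_SETUP &&
        m->pongs_received >= PONGS_BEFORE_ACTIVE) {
        m->state = MIG_POSTCOPY_ACTIVE;
    }
}

/* Device state sent: always enter postcopy-setup, no capability gate. */
static void on_device_state_sent(SourceMig *m)
{
    assert(m->state == MIG_DEVICE);
    m->state = MIG_POSTCOPY_SETUP;
    maybe_activate(m); /* the PONGs may already have arrived */
}

/* Called for each PONG received from the destination. */
static void on_pong(SourceMig *m)
{
    m->pongs_received++;
    maybe_activate(m);
}

/* Called when the migration/return-path channel fails. */
static void on_channel_error(SourceMig *m)
{
    if (m->state == MIG_POSTCOPY_ACTIVE) {
        /* dest is running the VM: pausing is the only safe option */
        m->state = MIG_POSTCOPY_PAUSED;
    } else if (m->state != MIG_POSTCOPY_PAUSED) {
        /* dest never started: fail, so the source can resume the VM */
        m->state = MIG_FAILED;
    }
}
```

The point of the sketch is that postcopy-setup is entered unconditionally, with no new capability, and only an error after the destination is known to be running leads to postcopy-paused on the source.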