On 2025-08-14 15:24, Peter Xu wrote:
> On Thu, Aug 14, 2025 at 05:42:23PM +0200, Juraj Marcin wrote:
> > Fair point, I'll then continue with the PING/PONG solution, the first
> > implementation I have seems to be working to resolve Issue 1.
> > 
> > For rarer split brain, we'll rely on block device locks/mgmt to resolve
> > and change the failure handling, so it registers errors from disk
> > activation.
> > 
> > As tested, there should be no problems with the destination
> > transitioning to POSTCOPY_PAUSED, since the VM was not started yet.
> > 
> > However, to prevent the source side from transitioning to
> > POSTCOPY_PAUSED, I think adding a new state is still the best option.
> > 
> > I tried keeping the migration states as they are now and just rely on an
> > attribute of MigrationState if 3rd PONG was received, however, this
> > collides with (at least) migrate_pause tests, that are waiting for
> > POSTCOPY_ACTIVE, and then pause the migration triggering the source to
> > resume. We could maybe work around it by waiting for the 3rd pong
> > instead, but I am not sure if it is possible from tests, or by not
> > resuming if migrate_pause command is executed?
> > 
> > I also tried extending the span of the DEVICE state, but some functions
> > behave differently depending on if they are in postcopy or not, using
> > the migration_in_postcopy() function, but adding the DEVICE there isn't
> > working either. And treating the DEVICE state sometimes as postcopy and
> > sometimes as not seems just too messy, if it would even be possible.
> 
> Yeah, it might indeed be a bit messy.
> 
> Is it possible to find a middle ground?  E.g. add postcopy-setup status,
> but without any new knob to enable it?  Just to describe the period of time
> where dest QEMU haven't started running but started loading device states.

Yes, as the ping/pong solution doesn't require any changes in the
protocol, there's no need for a new capability and the new state can be
always used.

> 
> The hope is libvirt (which, AFAIU, always enables the "events" capability)
> can ignore the new postcopy-setup status transition, then maybe we can also
> introduce the postcopy-setup and make it always appear.
> 
> Thanks,
> 
> -- 
> Peter Xu
> 


Reply via email to