On 9 Oct 2024, at 12:10 AM, Fabiano Rosas <faro...@suse.de> wrote:

Peter Xu <pet...@redhat.com> writes:

On Tue, Oct 08, 2024 at 11:20:03AM -0300, Fabiano Rosas wrote:
Peter Xu <pet...@redhat.com> writes:

On Mon, Oct 07, 2024 at 03:44:51PM +0000, Shivam Kumar wrote:
If the client calls the QMP command to reset the migration
capabilities after the migration status is set to failed or cancelled

Is cancelled ok?

Asked because I think the migration should still be in the CANCELLING
state when migrate_fd_cleanup() runs, so no one can disable the multifd
capability before then; the QMP command should fail.

But FAILED indeed looks problematic.

IIUC this is not specific to multifd: is it a race condition that
migrate_fd_cleanup() can be invoked after migration_is_running() has
already stopped returning true?  Then I wonder what happens if a concurrent
QMP "migrate" happens together with migrate_fd_cleanup(), even with multifd
always off.

Do we perhaps need to cleanup everything before the state changes to
FAILED?
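A minimal C sketch of the ordering question, using a hypothetical toy model: ToyMigration, toy_set_multifd_cap(), toy_cleanup() and the other toy_* names are made up for illustration and are not QEMU's MigrationState or multifd code. The two threads are replayed sequentially. In the first ordering FAILED becomes visible before cleanup, so a capability change slips into the window and the capability-gated cleanup is skipped; in the second, cleanup finishes while the state still reads as active, so the QMP-style request is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model; none of these names exist in QEMU. */
typedef enum { MIG_ACTIVE, MIG_FAILED } MigStatus;

typedef struct {
    MigStatus status;
    bool multifd_cap;     /* stands in for the multifd capability */
    int *multifd_state;   /* stands in for multifd_send_state */
} ToyMigration;

/* QMP-style handler: reject capability changes while "running". */
static bool toy_set_multifd_cap(ToyMigration *m, bool on)
{
    if (m->status == MIG_ACTIVE) {
        return false;
    }
    m->multifd_cap = on;
    return true;
}

/* Cleanup gated on the capability, the check the patch wants to replace. */
static void toy_cleanup(ToyMigration *m)
{
    if (!m->multifd_cap) {
        return;           /* capability off: teardown skipped */
    }
    free(m->multifd_state);
    m->multifd_state = NULL;
}

/* Ordering 1: FAILED is visible before cleanup runs. */
static void toy_fail_then_cleanup(ToyMigration *m, bool qmp_in_window)
{
    m->status = MIG_FAILED;
    if (qmp_in_window) {
        toy_set_multifd_cap(m, false);   /* accepted: no longer ACTIVE */
    }
    toy_cleanup(m);
}

/* Ordering 2: tear down first, then publish FAILED. */
static void toy_cleanup_then_fail(ToyMigration *m)
{
    toy_cleanup(m);
    m->status = MIG_FAILED;
}
```

Replaying ordering 1 with a QMP request in the window leaves multifd_state allocated; ordering 2 frees it, and the capability change is only accepted afterwards. The sketch ignores the BQL and real threading; it only shows why a window exists when the terminal state is published before teardown.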


Should we make CANCELLED the only terminal state aside from COMPLETED?
So migrate_fd_cleanup would set CANCELLED whenever it sees either
CANCELLING or FAILED.

I think that may be a major ABI change that could be risky, as we normally
treat CANCELLED as the user's choice.

Ok, I misunderstood your proposal.


If we really want an ABI change, we could also introduce a FAILING state,
but I wonder whether what I proposed in the other email could close the gap
on this race without any ABI change.

I don't think we want a FAILING state, but indeed something else that
conveys the same meaning as CANCELLING. I have already suggested
something similar in our TODO list [1]. We need a clear indication of
both "cancelling" and "failing" that's decoupled from the state ABI. Of
course, we're only talking about "failing" here; we can leave
"cancelling", which is more complex, for another time maybe.

What multifd does with ->exiting seems sane to me.

1- https://wiki.qemu.org/ToDo/LiveMigration#Migration_cancel_concurrency

Having flags to track the 'cancelling' and 'failing' states makes
sense. I think they should be part of MigrationState itself. I will
send follow-up patches.
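A rough sketch of what an internal "failing" flag could look like, kept separate from the user-visible status so the QAPI states do not change. Everything here (ToyMigrationState, the failing field, toy_is_busy(), toy_set_capability()) is hypothetical and only assumes C11 <stdatomic.h>; it is not the follow-up patch. The flag is raised before FAILED is published and cleared only once teardown is done, so a migration_is_running()-style check keeps rejecting capability changes until cleanup has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy types; not QEMU's MigrationState. */
typedef enum { TOY_ACTIVE, TOY_FAILED } ToyStatus;

typedef struct {
    _Atomic ToyStatus status;  /* user-visible state (the ABI) */
    atomic_bool failing;       /* internal only: teardown in progress */
    bool multifd_cap;
} ToyMigrationState;

static void toy_init(ToyMigrationState *s, bool cap)
{
    atomic_init(&s->status, TOY_ACTIVE);
    atomic_init(&s->failing, false);
    s->multifd_cap = cap;
}

/* Error path: raise the flag before FAILED becomes observable. */
static void toy_mark_failing(ToyMigrationState *s)
{
    atomic_store(&s->failing, true);
    atomic_store(&s->status, TOY_FAILED);
}

/* Called at the very end of cleanup. */
static void toy_cleanup_done(ToyMigrationState *s)
{
    atomic_store(&s->failing, false);
}

/* A migration_is_running()-like check that also honors the flag. */
static bool toy_is_busy(ToyMigrationState *s)
{
    return atomic_load(&s->status) == TOY_ACTIVE ||
           atomic_load(&s->failing);
}

/* QMP-style handler guarded by the combined check. */
static bool toy_set_capability(ToyMigrationState *s, bool on)
{
    if (toy_is_busy(s)) {
        return false;
    }
    s->multifd_cap = on;
    return true;
}
```

Because the flag is stored before the status (both sequentially consistent), a reader that already sees FAILED also sees failing == true until toy_cleanup_done() runs. Serialization of the handlers themselves (the BQL in QEMU) is left out of the sketch.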

However, can this patch be accepted as a cosmetic change? To me, it
makes sense to check 'multifd_send_state' rather than the multifd
migration capability before cleaning up 'multifd_send_state'. This also
helps with at least one race (with qmp_migrate_set_capabilities).
Please let me know if you have different thoughts.
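A minimal sketch of the check being proposed, assuming a toy model: toy_send_state stands in for multifd_send_state and toy_multifd_cap for the multifd capability (both hypothetical names, not QEMU's code). Keying cleanup on the pointer it frees makes it independent of a capability that QMP may have reset after FAILED, and makes repeated calls harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not QEMU's definitions. */
typedef struct { int nchannels; } ToySendState;

static ToySendState *toy_send_state;  /* plays multifd_send_state */
static bool toy_multifd_cap;          /* plays the multifd capability */

static void toy_setup(int channels)
{
    toy_send_state = calloc(1, sizeof(*toy_send_state));
    assert(toy_send_state != NULL);
    toy_send_state->nchannels = channels;
}

/* Before: keyed on the capability, which QMP may change after FAILED. */
static void toy_cleanup_by_cap(void)
{
    if (!toy_multifd_cap) {
        return;
    }
    free(toy_send_state);
    toy_send_state = NULL;
}

/* After: keyed on the state it actually frees. */
static void toy_cleanup_by_state(void)
{
    if (!toy_send_state) {
        return;
    }
    free(toy_send_state);
    toy_send_state = NULL;
}
```

In this model, resetting the capability between setup and cleanup makes toy_cleanup_by_cap() return early with the state still allocated, while toy_cleanup_by_state() frees it and is a no-op on a second call.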