One of the callers of wait_for_close_co() already sets the state to SAVE_STATE_DONE before, but that is not fully correct, because at that moment, the operation is not fully done. In particular, if closing the target later fails, the state would even be set to SAVE_STATE_ERROR afterwards. DONE -> ERROR is not a valid state transition. Although, it should not matter in practice as long as the relevant QMP commands are sequential.
The other caller does not set the state and so there seems to be a race that could lead to the state not getting set at all. This is because before this commit, the wait_for_close_co() function could return early when there is no target file anymore. This can only happen when canceling and needs to happen right around the time when the snapshot is already finishing and closing the target. Simply avoid the early return and always set the state within the wait_for_close_co() function rather than at the call site. Signed-off-by: Fiona Ebner <f.eb...@proxmox.com> --- migration/savevm-async.c | 33 +++++++++++++++------------------ 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/migration/savevm-async.c b/migration/savevm-async.c index 1e79fce9ba..e63dc6d8a3 100644 --- a/migration/savevm-async.c +++ b/migration/savevm-async.c @@ -450,23 +450,22 @@ static void coroutine_fn wait_for_close_co(void *opaque) { int64_t timeout; - if (!snap_state.target) { - DPRINTF("savevm-end: no target file open\n"); - return; - } - - /* wait until cleanup is done before returning, this ensures that after this - * call exits the statefile will be closed and can be removed immediately */ - DPRINTF("savevm-end: waiting for cleanup\n"); - timeout = 30L * 1000 * 1000 * 1000; - qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait, - QEMU_CLOCK_REALTIME, timeout); if (snap_state.target) { - save_snapshot_error("timeout waiting for target file close in " - "qmp_savevm_end"); - /* we cannot assume the snapshot finished in this case, so leave the - * state alone - caller has to figure something out */ - return; + /* wait until cleanup is done before returning, this ensures that after this + * call exits the statefile will be closed and can be removed immediately */ + DPRINTF("savevm-end: waiting for cleanup\n"); + timeout = 30L * 1000 * 1000 * 1000; + qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait, + QEMU_CLOCK_REALTIME, timeout); + if (snap_state.target) { + save_snapshot_error("timeout waiting for target file close in " + "qmp_savevm_end"); + /* we cannot assume the snapshot finished in this case, so leave the + * state alone - caller has to figure something out */ + return; + } + } else { + DPRINTF("savevm-end: no target file open\n"); } // File closed and no other error, so ensure next snapshot can be started. @@ -497,8 +496,6 @@ void qmp_savevm_end(Error **errp) snap_state.saved_vm_running = false; } - snap_state.state = SAVE_STATE_DONE; - qemu_coroutine_enter(wait_for_close); } -- 2.39.5 _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel