On 2/8/24 14:57, Fabiano Rosas wrote:
Cédric Le Goater <c...@redhat.com> writes:
On 2/8/24 14:07, Fabiano Rosas wrote:
Cédric Le Goater <c...@redhat.com> writes:
close_return_path_on_source() retrieves the migration error from the
the QEMUFile '->to_dst_file' to know if a shutdown is required. This
shutdown is required to exit the return-path thread. However, in
migrate_fd_cleanup(), '->to_dst_file' is cleaned up before calling
close_return_path_on_source() and the shutdown is never performed,
leaving the source and destination waiting for an event to occur.
Avoid relying on '->to_dst_file' and use migrate_has_error() instead.
Suggested-by: Peter Xu <pet...@redhat.com>
Signed-off-by: Cédric Le Goater <c...@redhat.com>
---
migration/migration.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index
d5f705ceef4c925589aa49335969672c0d761fa2..5f55af3d7624750ca416c4177781241b3e291e5d
100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2372,8 +2372,7 @@ static bool close_return_path_on_source(MigrationState
*ms)
* cause it to unblock if it's stuck waiting for the destination.
*/
WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
- if (ms->to_dst_file && ms->rp_state.from_dst_file &&
- qemu_file_get_error(ms->to_dst_file)) {
+ if (migrate_has_error(ms) && ms->rp_state.from_dst_file) {
qemu_file_shutdown(ms->rp_state.from_dst_file);
}
}
Hm, maybe Peter can help defend this, but this assumes that every
function that takes an 'f' and sets the file error also sets
migrate_set_error(). I'm not sure we have determined that, have we?
How could we check all the code path ? I agree it is difficult when
looking at the code :/
It would help if the thing wasn't called 'f' for the most part of the
code to begin with.
Whenever there's a file error at to_dst_file there's the chance that the
rp_state.from_dst_file got stuck. So we cannot ignore the file error.
Would it work if we checked it earlier during cleanup as you did
previously and then set the migration error?
Do you mean doing something similar to what is done in
source_return_path_thread() ?
if (qemu_file_get_error(s->to_dst_file)) {
qemu_file_get_error_obj(s->to_dst_file, &err);
if (err) {
migrate_set_error(ms, err);
error_free(err);
...
Yes. That would be safer I think.
Nevertheless, I am struggling to understand how qemu_file_set_error()
and migrate_set_error() fit together. I was expecting some kind of
synchronization routine but there isn't it seems. Are they completely
orthogonal ? when should we use these routines and when not ?
My initial goal was to modify some of the memory handlers (log_global*)
and migration handlers to propagate errors at the QMP level and them
report to the management layer. This is growing in something bigger
and currently, I don't find a good approach to the problem.
The last two patches of this series try to fix the return-path thread
termination. Let's keep that for after.
Thanks,
C.