Cédric Le Goater <c...@redhat.com> writes:

> On 2/8/24 14:07, Fabiano Rosas wrote:
>> Cédric Le Goater <c...@redhat.com> writes:
>> 
>>> close_return_path_on_source() retrieves the migration error from the
>>> the QEMUFile '->to_dst_file' to know if a shutdown is required. This
>>> shutdown is required to exit the return-path thread. However, in
>>> migrate_fd_cleanup(), '->to_dst_file' is cleaned up before calling
>>> close_return_path_on_source() and the shutdown is never performed,
>>> leaving the source and destination waiting for an event to occur.
>>>
>>> Avoid relying on '->to_dst_file' and use migrate_has_error() instead.
>>>
>>> Suggested-by: Peter Xu <pet...@redhat.com>
>>> Signed-off-by: Cédric Le Goater <c...@redhat.com>
>>> ---
>>>   migration/migration.c | 3 +--
>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 
>>> d5f705ceef4c925589aa49335969672c0d761fa2..5f55af3d7624750ca416c4177781241b3e291e5d
>>>  100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -2372,8 +2372,7 @@ static bool 
>>> close_return_path_on_source(MigrationState *ms)
>>>        * cause it to unblock if it's stuck waiting for the destination.
>>>        */
>>>       WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
>>> -        if (ms->to_dst_file && ms->rp_state.from_dst_file &&
>>> -            qemu_file_get_error(ms->to_dst_file)) {
>>> +        if (migrate_has_error(ms) && ms->rp_state.from_dst_file) {
>>>               qemu_file_shutdown(ms->rp_state.from_dst_file);
>>>           }
>>>       }
>> 
>> Hm, maybe Peter can help defend this, but this assumes that every
>> function that takes an 'f' and sets the file error also sets
>> migrate_set_error(). I'm not sure we have determined that, have we?
>
> How could we check all the code path ? I agree it is difficult when
> looking at the code :/

It would help if the thing wasn't called 'f' for the most part of the
code to begin with.

Whenever there's a file error at to_dst_file there's the chance that the
rp_state.from_dst_file got stuck. So we cannot ignore the file error.

Would it work if we checked it earlier during cleanup as you did
previously and then set the migration error?

Reply via email to