Peter Xu <pet...@redhat.com> wrote: > On Tue, May 30, 2017 at 05:59:10PM +0200, Juan Quintela wrote: >> Peter Xu <pet...@redhat.com> wrote: >> > Let this be a flag, default to on. Turn it off for <=2.9 versions. >> > >> > After this patch, return path will be on even for pre-copy migration as >> > long as the transport support, e.g., for socket typed transport >> > including "tcp|udp|unix" typed. >> > >> > This will naturally fix the bug mentioned below, when destination failed >> > on migration but source assumed it was successful - since now even for >> > precopy, source will wait for destination's MIG_RP_MSG_SHUT signal, >> > which will carry the final migration status of destination. Then, when >> > destination failed at any point of migration, source will know it, and >> > it'll resume the VM instead of a data lost. >> > >> > Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1439147 >> > Signed-off-by: Peter Xu <pet...@redhat.com> >> > --- >> > include/hw/compat.h | 4 ++++ >> > include/migration/migration.h | 3 +++ >> > migration/migration.c | 15 ++++++++++++++- >> > 3 files changed, 21 insertions(+), 1 deletion(-) >> > >> > diff --git a/include/hw/compat.h b/include/hw/compat.h >> > index 55b1765..049457b 100644 >> > --- a/include/hw/compat.h >> > +++ b/include/hw/compat.h >> > @@ -6,6 +6,10 @@ >> > .driver = "pci-bridge",\ >> > .property = "shpc",\ >> > .value = "off",\ >> > + },{\ >> > + .driver = "migration",\ >> > + .property = "return-path",\ >> > + .value = "off",\ >> > }, >> > >> > #define HW_COMPAT_2_8 \ >> > diff --git a/include/migration/migration.h b/include/migration/migration.h >> > index 70710de..e44119c 100644 >> > --- a/include/migration/migration.h >> > +++ b/include/migration/migration.h >> > @@ -169,6 +169,9 @@ typedef struct MigrationState { >> > int64_t colo_checkpoint_time; >> > QEMUTimer *colo_delay_timer; >> > >> > + /* Whether to try to enable return-path even for pre-copy */ >> > + bool enable_return_path; >> > + >> > /* The last error that occurred */ >> > Error *error; >> > } MigrationState ; >> > diff --git a/migration/migration.c b/migration/migration.c >> > index 6df3483..16a856a 100644 >> > --- a/migration/migration.c >> > +++ b/migration/migration.c >> > @@ -2046,7 +2046,7 @@ static void *migration_thread(void *opaque) >> > static bool migrate_return_path_create(MigrationState *s) >> > { >> > /* Whether we should enable return path */ >> > - bool enable_return_path = false; >> > + bool enable_return_path = s->enable_return_path; >> >> As you can see on my suggestion for this piece of code, just add the >> ()s->enable_return_path &&) to the right place on the call? >> >> Thanks, Juan. > > Do you mean this? > > /* > * Open the return path > */ > if (migrate_postcopy_ram() || s->enable_return_path) { > if (!migrate_return_path_create(s)) { > migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, > MIGRATION_STATUS_FAILED); > migrate_fd_cleanup(s); > return; > } > }
Yeap. > Here what I wanted to achieve is that: > > a. for postcopy, we should try to enable return path, and it must > succeed > > b. for the case when enable_return_path is set, we try to enable return > path, but even if it failed, we can still continue > > Could we really achieve (b) if with above code? Or anything I missed? if we enable_return_path -> it should success, otherwise it makes no sense, no? We can try to remove the return path for some transports if needed, but it makes no sense to enable a property that means: "please, pretty please, enable it if you can" if we are going to do it that way, then it is better to change the property the other way around: - disable_return_path: set for all old machine types And not set for newer machine types, meaning that we just try. What do you think? Later, Juan.