* Andrey Gruzdev (andrey.gruz...@virtuozzo.com) wrote:
> On 21.01.2021 19:11, Dr. David Alan Gilbert wrote:
> > * Andrey Gruzdev (andrey.gruz...@virtuozzo.com) wrote:
> > > On 21.01.2021 12:56, Dr. David Alan Gilbert wrote:
> > > > * Andrey Gruzdev (andrey.gruz...@virtuozzo.com) wrote:
> > > > > On 19.01.2021 21:49, Dr. David Alan Gilbert wrote:
> > > > > > * Andrey Gruzdev (andrey.gruz...@virtuozzo.com) wrote:
> > > > > > > Introducing implementation of 'background' snapshot thread
> > > > > > > which overall follows the logic of precopy migration
> > > > > > > while internally utilizing a completely different mechanism
> > > > > > > to 'freeze' vmstate at the start of snapshot creation.
> > > > > > >
> > > > > > > This mechanism is based on userfault_fd with wr-protection
> > > > > > > support and is Linux-specific.
> > > > > > I noticed there weren't any bdrv_ calls in here; I guess with a
> > > > > > snapshot you still have the source running so still have it
> > > > > > accessing the disk; do you do anything to try and wire the ram
> > > > > > snapshotting up to disk snapshotting?
> > > > > Block-related manipulations should be done externally, I think.
> > > > > So create backing images for RW nodes, then stop the VM, switch the
> > > > > block graph and start the background snapshot. Something like what
> > > > > 'virsh snapshot-create-as' does, but in a different sequence.
> > > > If you get a chance it would be great if you could put together an
> > > > example of doing the combination RAM+block; that way we find out if
> > > > there's anything silly missing.
> > > >
> > > > Dave
> > > Yep, I'll take a look at the QMP command sequences, how it should look
> > > in our case, and prepare an example; hope we are not missing something
> > > serious.
> > > At least we know that block setup data won't go into the snapshot.
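[Editor's note: a combined disk+RAM snapshot along the lines Andrey describes could plausibly be sequenced over QMP roughly as below. This is only a sketch, not a tested recipe from this thread: the `background-snapshot` capability name comes from this patch series, and the device name, overlay filename, and `exec:` destination are made-up illustrations.]

```
# Stop the guest briefly so the disk and RAM state stay consistent.
{ "execute": "stop" }

# Create an external overlay for each writable node; further guest
# writes go to the overlay, freezing the backing image.
{ "execute": "blockdev-snapshot-sync",
  "arguments": { "device": "drive0", "snapshot-file": "overlay0.qcow2" } }

# Enable the background (live) snapshot mode added by this series.
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "background-snapshot", "state": true } ] } }

# Start saving the vmstate; with background-snapshot enabled the VM
# resumes while RAM is written out under userfaultfd write protection.
{ "execute": "migrate",
  "arguments": { "uri": "exec:cat > vmstate.img" } }
```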
> > > I've also checked starting a background snapshot from the stopped VM
> > > state - looks OK: the VM resumes operation, the snapshot is saved, no
> > > apparent problems.
> > >
> > > Maybe it will take some time, since now I'm on a task to create a tool
> > > to store snapshots with RAM indexable by GPFNs, together with the rest
> > > of the VMSTATE.
> > If you want to make it indexable, why not just do a simple write(2) call
> > for the whole of RAM rather than doing the thing like normal migration?
> >
> > Dave
> For me the main reason is the apparent file size. While we can get the
> same allocation size when saving via write(2) on Linux, in many cases the
> apparent file size will be much bigger than if we use QCOW2.
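[Editor's note: the "apparent file size" point is about sparse files. A write(2)-based RAM image indexed by GPFN can skip unwritten pages, so the *allocated* size stays small, but the *apparent* size (what `ls -l` reports, and what naive copies expand) equals the full RAM size. A minimal sketch, with an arbitrary file path and a hypothetical helper name:]

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a file whose apparent size is 'apparent' bytes but which has
 * only one block actually allocated - a sparse file, as produced by
 * seeking over untouched pages and write(2)-ing only the dirty ones.
 * On success, reports the apparent size (st_size) and the allocated
 * size (st_blocks * 512) through the out parameters.
 */
int make_sparse(const char *path, off_t apparent,
                off_t *st_size_out, off_t *allocated_out)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    struct stat st;

    if (fd < 0) {
        return -1;
    }
    /* Write a single byte at the very end; everything before is a hole. */
    if (pwrite(fd, "", 1, apparent - 1) != 1 || fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    *st_size_out = st.st_size;                  /* apparent size */
    *allocated_out = (off_t)st.st_blocks * 512; /* actually allocated */
    close(fd);
    return 0;
}
```

On a filesystem with hole support (ext4, XFS, tmpfs), a 1 GiB apparent file created this way allocates only a few kilobytes.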
Do you mean because of zero pages or for some other reason?

Dave

> Andrey
>
> > > Based on QCOW2 format. Also it should support snapshot revert in
> > > postcopy mode.
> > >
> > > Andrey
> > >
> > > > > //
> > > > > > > Signed-off-by: Andrey Gruzdev <andrey.gruz...@virtuozzo.com>
> > > > > > > Acked-by: Peter Xu <pet...@redhat.com>
> > > > > > > ---
> > > > > > >  migration/migration.c | 255 +++++++++++++++++++++++++++++++++++++++++-
> > > > > > >  migration/migration.h |   3 +
> > > > > > >  migration/ram.c       |   2 +
> > > > > > >  migration/savevm.c    |   1 -
> > > > > > >  migration/savevm.h    |   2 +
> > > > > > >  5 files changed, 260 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > index 2c2cb9ef01..0901a15ac5 100644
> > > > > > <snip>
> > > > > > > -    qemu_thread_create(&s->thread, "live_migration", migration_thread, s,
> > > > > > > -                       QEMU_THREAD_JOINABLE);
> > > > > > > +
> > > > > > > +    if (migrate_background_snapshot()) {
> > > > > > > +        qemu_thread_create(&s->thread, "background_snapshot",
> > > > > > Unfortunately that won't work - there's a 14 character limit on
> > > > > > the thread name length; I guess we just shorten that to bg_snapshot
> > > > > Yep, missed that pthread_set_name_np() has a length limit)
> > > > >
> > > > > > Other than that,
> > > > > >
> > > > > > Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
> > > > > >
> > > > > > > +            bg_migration_thread, s, QEMU_THREAD_JOINABLE);
> > > > > > > +    } else {
> > > > > > > +        qemu_thread_create(&s->thread, "live_migration",
> > > > > > > +            migration_thread, s, QEMU_THREAD_JOINABLE);
> > > > > > > +    }
> > > > > > >      s->migration_thread_running = true;
> > > > > > >  }
> > > > > > >
> > > > > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > > > > index f40338cfbf..0723955cd7 100644
> > > > > > > --- a/migration/migration.h
> > > > > > > +++ b/migration/migration.h
> > > > > > > @@ -20,6 +20,7 @@
> > > > > > >  #include "qemu/thread.h"
> > > > > > >  #include "qemu/coroutine_int.h"
> > > > > > >  #include "io/channel.h"
> > > > > > > +#include "io/channel-buffer.h"
> > > > > > >  #include "net/announce.h"
> > > > > > >  #include "qom/object.h"
> > > > > > > @@ -147,8 +148,10 @@ struct MigrationState {
> > > > > > >      /*< public >*/
> > > > > > >      QemuThread thread;
> > > > > > > +    QEMUBH *vm_start_bh;
> > > > > > >      QEMUBH *cleanup_bh;
> > > > > > >      QEMUFile *to_dst_file;
> > > > > > > +    QIOChannelBuffer *bioc;
> > > > > > >      /*
> > > > > > >       * Protects to_dst_file pointer.  We need to make sure we won't
> > > > > > >       * yield or hang during the critical section, since this lock will
> > > > > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > > > > index 5707382db1..05fe0c8592 100644
> > > > > > > --- a/migration/ram.c
> > > > > > > +++ b/migration/ram.c
> > > > > > > @@ -1471,6 +1471,7 @@ static RAMBlock *poll_fault_page(RAMState *rs, ram_addr_t *offset)
> > > > > > >      page_address = (void *) uffd_msg.arg.pagefault.address;
> > > > > > >      bs = qemu_ram_block_from_host(page_address, false, offset);
> > > > > > >      assert(bs && (bs->flags & RAM_UF_WRITEPROTECT) != 0);
> > > > > > > +
> > > > > > >      return bs;
> > > > > > >  }
> > > > > > >  #endif /* CONFIG_LINUX */
> > > > > > > @@ -1836,6 +1837,7 @@ static void ram_save_host_page_post(RAMState *rs, PageSearchStatus *pss,
> > > > > > >      /* Un-protect memory range. */
> > > > > > >      res = uffd_change_protection(rs->uffdio_fd, page_address, run_length,
> > > > > > >                                   false, false);
> > > > > > > +
> > > > > > >      /* We don't want to override existing error from ram_save_host_page(). */
> > > > > > >      if (res < 0 && *res_override >= 0) {
> > > > > > >          *res_override = res;
> > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > > > index 27e842812e..dd4ad0aaaf 100644
> > > > > > > --- a/migration/savevm.c
> > > > > > > +++ b/migration/savevm.c
> > > > > > > @@ -1354,7 +1354,6 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
> > > > > > >      return 0;
> > > > > > >  }
> > > > > > > -static
> > > > > > >  int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
> > > > > > >                                                      bool in_postcopy,
> > > > > > >                                                      bool inactivate_disks)
> > > > > > > diff --git a/migration/savevm.h b/migration/savevm.h
> > > > > > > index ba64a7e271..aaee2528ed 100644
> > > > > > > --- a/migration/savevm.h
> > > > > > > +++ b/migration/savevm.h
> > > > > > > @@ -64,5 +64,7 @@ int qemu_loadvm_state(QEMUFile *f);
> > > > > > >  void qemu_loadvm_state_cleanup(void);
> > > > > > >  int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
> > > > > > >  int qemu_load_device_state(QEMUFile *f);
> > > > > > > +int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
> > > > > > > +        bool in_postcopy, bool inactivate_disks);
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.25.1
> > > > > >
> > > > > --
> > > > > Andrey Gruzdev, Principal Engineer
> > > > > Virtuozzo GmbH  +7-903-247-6397
> > > > > virtuzzo.com
> > > >
> > > --
> > > Andrey Gruzdev, Principal Engineer
> > > Virtuozzo GmbH  +7-903-247-6397
> > > virtuzzo.com
> >
> --
> Andrey Gruzdev, Principal Engineer
> Virtuozzo GmbH  +7-903-247-6397
> virtuzzo.com
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
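[Editor's note: the thread-name limit discussed above can be checked directly. On Linux, glibc's pthread_setname_np() (the mail's "pthread_set_name_np" is a slip) accepts at most 16 bytes including the NUL terminator, i.e. 15 visible characters, and fails with ERANGE for longer names rather than truncating. That is why "background_snapshot" (19 characters) has to become something like "bg_snapshot". A minimal check, with a hypothetical helper name:]

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>

/*
 * Try to set the calling thread's name. Returns 0 on success or an
 * errno value; glibc returns ERANGE when the name (with its NUL
 * terminator) exceeds the kernel's 16-byte comm buffer.
 */
int try_thread_name(const char *name)
{
    return pthread_setname_np(pthread_self(), name);
}
```

Compile with `-pthread`; on older glibc the pthread functions live in libpthread rather than libc.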