* Jay Zhou (jianjay.z...@huawei.com) wrote:
> Hi Dave,
>
> On 2017/7/21 17:49, Dr. David Alan Gilbert wrote:
> > * Jay Zhou (jianjay.z...@huawei.com) wrote:
> > > qemu_savevm_state_cleanup() takes about 300ms in my RAM migration tests
> > > with an 8U24G VM (20G actually occupied); the main cost comes from the
> > > KVM_SET_USER_MEMORY_REGION ioctl when mem.memory_size = 0 in
> > > kvm_set_user_memory_region(). In kmod, the main cost is
> > > kvm_zap_obsolete_pages(), which traverses the active_mmu_pages list to
> > > zap the unsync sptes.
> >
> > Hi Jay,
> >   Is this actually increasing the real downtime when the guest isn't
> > running, or is it just the reported time?  I see that the s->downtime
> > value is calculated right after where we currently call
> > qemu_savevm_state_cleanup.
>
> It actually increased the real downtime; I used the "ping" command to
> test. The reason is that the source-side libvirt sends a QMP command to
> qemu to query the status of the migration, which needs the BQL.
> qemu_savevm_state_cleanup is done with the BQL held, so qemu cannot
> handle the QMP command until qemu_savevm_state_cleanup has finished.
> As a result the source-side libvirt delays about 300ms before notifying
> the destination-side libvirt to send the "cont" command to start the VM.
>
> I think the value of s->downtime is not accurate enough; maybe we could
> move the calculation of end_time to after qemu_savevm_state_cleanup has
> finished.
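For context, the call Jay is describing is the normal memslot-removal path
of the KVM API: re-issuing KVM_SET_USER_MEMORY_REGION with memory_size = 0
deletes the slot in the kernel, and per the numbers above that is where the
time goes (kvm_zap_obsolete_pages()). A minimal, illustrative sketch of
plain KVM API usage, not QEMU's actual code - vmfd, slot_id, gpa and hva
are placeholders:

    #include <linux/kvm.h>
    #include <stdint.h>
    #include <sys/ioctl.h>

    /* Remove a previously-created memslot: the same ioctl that created it,
     * but with memory_size = 0 telling the kernel to delete the slot. */
    static int delete_memslot(int vmfd, uint32_t slot_id,
                              uint64_t gpa, uint64_t hva)
    {
        struct kvm_userspace_memory_region mem = {
            .slot            = slot_id,
            .guest_phys_addr = gpa,
            .userspace_addr  = hva,
            .memory_size     = 0,    /* 0 == delete this slot */
        };

        return ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &mem);
    }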
I'm copying in Paolo, Radim and Andrea - is there any way we can make the
teardown of KVM's dirty tracking not take so long?  300ms is a silly long
time on only a small VM.

> > I guess the biggest problem is that 300ms happens before we restart
> > the guest on the source if a migration fails.
>
> 300ms happens even if a migration succeeds.

Hmm, OK, this needs fixing then - it does explain a result I saw a while
ago where the downtime was much bigger with libvirt than it was with qemu
on its own.

> > > I think it can be optimized:
> > > (1) source vm will be destroyed if the migration is successfully done,
> > >     so the resources will be cleaned up automatically by the system
> > > (2) delay the cleanup if the migration failed
> >
> > I don't like putting it in qmp_cont; that shouldn't have migration magic
> > in it.
>
> Yes, it is not an ideal place. :(
>
> > I guess we could put it in migrate_fd_cleanup perhaps? It gets called on
> > a bh near the end - or could we just move it closer to the end of
> > migration_thread?
>
> I have tested putting it in migrate_fd_cleanup, but the downtime is not
> improved. So I think it would be the same to move it closer to the end of
> migration_thread if it still holds the BQL.
> Could we put it in migrate_init?

Your explanation above hints at why migrate_fd_cleanup doesn't help: we
would still be doing the cleanup with the BQL taken.

Can you tell me which version of libvirt you're using?  I thought the
newer ones were supposed to use events so that they didn't have to poll
qemu.

If we move qemu_savevm_state_cleanup, is it still safe?  Are there things
we're supposed to do at that point which would be wrong if we didn't?

I wonder about something like: take a mutex in memory_global_dirty_log_start
and release it in memory_global_dirty_log_stop, then make ram_save_cleanup
start a new thread that does the call to memory_global_dirty_log_stop.
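Completely untested, but very roughly the sketch below, just to show the
shape of it. A plain mutex can't legally be released by a different thread
than the one that locked it, so the sketch uses a QemuSemaphore for the
same serialisation; the no-argument signatures of
memory_global_dirty_log_start/stop and ram_save_cleanup are from memory and
would need checking against the tree, and dirty_log_stop_sem /
dirty_log_stop_thread are made-up names:

    #include "qemu/osdep.h"
    #include "qemu/thread.h"

    /* Initialised once during startup, e.g.
     *     qemu_sem_init(&dirty_log_stop_sem, 1);
     * 1 means "no teardown in flight". */
    static QemuSemaphore dirty_log_stop_sem;

    void memory_global_dirty_log_start(void)
    {
        /* Wait for any asynchronous teardown from a previous migration,
         * so we never enable dirty logging while the old state is still
         * being torn down behind our back. */
        qemu_sem_wait(&dirty_log_stop_sem);
        /* ... existing start code ... */
    }

    void memory_global_dirty_log_stop(void)
    {
        /* ... existing stop code, including the slow
         *     KVM_SET_USER_MEMORY_REGION work ... */
        qemu_sem_post(&dirty_log_stop_sem);
    }

    static void *dirty_log_stop_thread(void *opaque)
    {
        memory_global_dirty_log_stop();
        return NULL;
    }

    static void ram_save_cleanup(void *opaque)
    {
        QemuThread thread;

        /* Push the expensive teardown out of the BQL/downtime window. */
        qemu_thread_create(&thread, "dirty-log-stop", dirty_log_stop_thread,
                           NULL, QEMU_THREAD_DETACHED);
        /* ... rest of the existing cleanup ... */
    }

The point is just that the slow KVM teardown moves off the migration thread
and out of the BQL, while the semaphore stops a subsequent migration from
enabling dirty logging before the previous teardown has finished.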
Dave

> Thanks,
> Jay
>
> > However, we would need to be a bit careful of anything that needs
> > cleaning up before the source restarts on failure; I'm not sure of
> > the semantics of all the current things wired into save_cleanup.
> >
> > Dave
> >
> > >
> > > Signed-off-by: Jay Zhou <jianjay.z...@huawei.com>
> > > ---
> > >  migration/migration.c | 16 +++++++++-------
> > >  qmp.c                 | 10 ++++++++++
> > >  2 files changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index a0db40d..72832be 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -1877,6 +1877,15 @@ static void *migration_thread(void *opaque)
> > >          if (qemu_file_get_error(s->to_dst_file)) {
> > >              migrate_set_state(&s->state, current_active_state,
> > >                                MIGRATION_STATUS_FAILED);
> > > +            /*
> > > +             * The resource has been allocated by migration will be reused in
> > > +             * COLO process, so don't release them.
> > > +             */
> > > +            if (!enable_colo) {
> > > +                qemu_mutex_lock_iothread();
> > > +                qemu_savevm_state_cleanup();
> > > +                qemu_mutex_unlock_iothread();
> > > +            }
> > >              trace_migration_thread_file_err();
> > >              break;
> > >          }
> > > @@ -1916,13 +1925,6 @@ static void *migration_thread(void *opaque)
> > >      end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > >
> > >      qemu_mutex_lock_iothread();
> > > -    /*
> > > -     * The resource has been allocated by migration will be reused in COLO
> > > -     * process, so don't release them.
> > > -     */
> > > -    if (!enable_colo) {
> > > -        qemu_savevm_state_cleanup();
> > > -    }
> > >      if (s->state == MIGRATION_STATUS_COMPLETED) {
> > >          uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
> > >          s->total_time = end_time - s->total_time;
> > > diff --git a/qmp.c b/qmp.c
> > > index b86201e..0e68eaa 100644
> > > --- a/qmp.c
> > > +++ b/qmp.c
> > > @@ -37,6 +37,8 @@
> > >  #include "qom/object_interfaces.h"
> > >  #include "hw/mem/pc-dimm.h"
> > >  #include "hw/acpi/acpi_dev_interface.h"
> > > +#include "migration/migration.h"
> > > +#include "migration/savevm.h"
> > >
> > >  NameInfo *qmp_query_name(Error **errp)
> > >  {
> > > @@ -200,6 +202,14 @@ void qmp_cont(Error **errp)
> > >      if (runstate_check(RUN_STATE_INMIGRATE)) {
> > >          autostart = 1;
> > >      } else {
> > > +        /*
> > > +         * Delay the cleanup to reduce the downtime of migration.
> > > +         * The resource has been allocated by migration will be reused
> > > +         * in COLO process, so don't release them.
> > > +         */
> > > +        if (runstate_check(RUN_STATE_POSTMIGRATE) && !migrate_colo_enabled()) {
> > > +            qemu_savevm_state_cleanup();
> > > +        }
> > >          vm_start();
> > >      }
> > >  }
> > > --
> > > 1.8.3.1
> > >
> >
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> >
> > .
> >
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK