"Li, Liang Z" <liang.z...@intel.com> wrote: >> >> > Right now, we don't have an interface to detect that cases and got >> >> > back to the iterative stage. >> >> >> >> How about go back to the iterative stage when detect that the >> >> pending_size is larger Than max_size, like this: >> >> >> >> + /* do flush here is aimed to shorten the VM downtime, >> >> + * bdrv_flush_all is a time consuming operation >> >> + * when the guest has done some file writing */ >> >> + bdrv_flush_all(); >> >> + pending_size = qemu_savevm_state_pending(s->file, >> >> max_size); >> >> + if (pending_size && pending_size >= max_size) { >> >> + qemu_mutex_unlock_iothread(); >> >> + continue; >> >> + } >> >> ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); >> >> if (ret >= 0) { >> >> qemu_file_set_rate_limit(s->file, INT64_MAX); >> >> >> >> and this is quite simple. >> > >> > Yes, but it is too simple. If you hold all the locks during >> > bdrv_flush_all(), your VM will effectively stop as soon as it performs >> > the next I/O access, so you don't win much. And you still don't have a >> > timeout for cases where the flush takes really long. >> >> This is probably better than what we had now (basically we are "meassuring" >> after bdrv_flush_all how much the amount of dirty memory has changed, >> and return to iterative stage if it took too much. A timeout would be better >> anyways. And an interface te start the synchronization sooner >> asynchronously would be also good. >> >> Notice that my understanding is that any proper fix for this is 2.4 material. > > Then, how to deal with this issue in 2.3, leave it here? or make an > incomplete fix like I do above?
I think it is better to leave it as it is for 2.3. With a patch like this
one, we improve one load and make a different load worse (it depends a lot
on the ratio of memory dirtying vs. disk writes). I have no data on which
load is more common, so I prefer to be conservative this late in the cycle.
What do you think?

Later, Juan.

>
> Liang
>
>> Thanks, Juan.
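
A minimal sketch of the timeout idea discussed above, written against the
completion path shown in the quoted hunk from migration_thread(). MAX_FLUSH_MS
is a hypothetical budget rather than an existing QEMU tunable, so this is only
an illustration of the approach, not a proposed patch:

    /* Sketch only: time the flush and fall back to the iterative stage
     * if it overruns the (hypothetical) MAX_FLUSH_MS budget, or if new
     * dirty state pushed pending_size back over max_size.  s,
     * pending_size and max_size are the variables already used in the
     * quoted hunk. */
    int64_t flush_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

    bdrv_flush_all();

    int64_t flush_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - flush_start;
    pending_size = qemu_savevm_state_pending(s->file, max_size);

    if (flush_ms > MAX_FLUSH_MS ||
        (pending_size && pending_size >= max_size)) {
        /* Too expensive: drop the iothread lock and keep iterating
         * instead of committing to the downtime phase. */
        qemu_mutex_unlock_iothread();
        continue;
    }

    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);

Note that this still holds the iothread lock across bdrv_flush_all(), so it
only addresses the missing timeout; it does nothing for the point above about
the guest effectively stopping on its next I/O access.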