> This needs further review/changes on the block layer.
>
> First, an explanation of why I think this doesn't fix the full problem.
> With this patch, we fix the problem where we have a dirty block layer but
> basically nothing dirtying the memory on the guest: we move the 20 seconds
> spent on the block layer flush from inside max_downtime to the point where
> we have decided that the amount of dirty memory is small enough to be
> transferred during max_downtime. But it is still going to take 20 seconds
> to flush the block layer, and during those 20 seconds, the amount of memory
> that can be dirtied is HUGE.
It's true.

> I think our options are:
>
> - Tell the block layer at the beginning of migration:
>   "Hey, we are migrating, could you please start flushing data now, and
>   don't let the caches grow too much, please, pretty please."
>   (I leave the API up to the block layer.)
> - Add at that point a new function:
>       bdrv_flush_all_start()
>   that starts writing out the dirty data, and we "hope" that by the time
>   we have migrated all memory it has also finished (so our last call to
>   bdrv_flush_all() has less work to do).
> - Add another function:
>       int bdrv_flush_all_timeout(int timeout)
>   that returns either when everything has been flushed or when the timeout
>   has passed, and tells us which of the two happened. So we can go back to
>   the iterative stage if it has taken too long.
>
> Notice that *normally* bdrv_flush_all() is very fast; the problem is that
> sometimes it gets really, really slow (NFS decided to go slow, TCP dropped
> a packet, whatever).
>
> Right now, we don't have an interface to detect those cases and go back to
> the iterative stage.

How about going back to the iterative stage when we detect that pending_size
is larger than max_size, like this:

    +        /* The flush here is aimed at shortening the VM downtime;
    +         * bdrv_flush_all() is a time-consuming operation when the
    +         * guest has done some file writing. */
    +        bdrv_flush_all();
    +        pending_size = qemu_savevm_state_pending(s->file, max_size);
    +        if (pending_size && pending_size >= max_size) {
    +            qemu_mutex_unlock_iothread();
    +            continue;
    +        }
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
                 qemu_file_set_rate_limit(s->file, INT64_MAX);

This is quite simple.

> So, I agree with the diagnosis that there is a problem there, but I think
> the solution is more complex than this. You helped one workload while
> making a different one worse. I am not sure which of the two compromises
> is better :-(
>
> Does this make sense?
>
> Later, Juan.
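
For illustration, here is a rough sketch of what the proposed
bdrv_flush_all_timeout() could look like, modelled on the existing
bdrv_flush_all() loop. This is only a sketch of the idea above, not code from
the tree: the function name and the timeout plumbing are hypothetical,
bdrv_next()/bdrv_flush() are assumed to be the per-device iteration and flush
helpers, and the clock helper may be spelled differently depending on the QEMU
version being patched.

    #include <errno.h>
    #include "block/block.h"   /* bdrv_next(), bdrv_flush() -- assumed header */
    #include "qemu/timer.h"    /* qemu_clock_get_ms() -- assumed header */

    /* Hypothetical sketch: flush the devices one at a time and bail out once
     * the deadline has passed, so the migration thread can go back to the
     * iterative stage instead of blocking for the whole flush. */
    int bdrv_flush_all_timeout(int timeout_ms)
    {
        int64_t deadline = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + timeout_ms;
        BlockDriverState *bs = NULL;
        int ret = 0;

        while ((bs = bdrv_next(bs)) != NULL) {
            int r = bdrv_flush(bs);        /* synchronous per-device flush */
            if (r < 0 && ret == 0) {
                ret = r;                   /* remember the first error */
            }
            if (qemu_clock_get_ms(QEMU_CLOCK_REALTIME) >= deadline) {
                return -ETIMEDOUT;         /* caller resumes iterating */
            }
        }
        return ret;
    }

Note that this only checks the deadline between devices, so a single slow
bdrv_flush() (the NFS case above) can still overrun it; a real implementation
would probably have to issue the flushes asynchronously (the
bdrv_flush_all_start() idea) and wait for them with a bounded poll.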