On 02/07/2018 07:57 AM, Michael S. Tsirkin wrote:
On Tue, Feb 06, 2018 at 07:08:18PM +0800, Wei Wang wrote:
Use the free page reporting feature from the balloon device to clear the
bits corresponding to guest free pages from the dirty bitmap, so that the
free memory is not sent.
Signed-off-by: Wei Wang <wei.w.w...@intel.com>
CC: Michael S. Tsirkin <m...@redhat.com>
CC: Juan Quintela <quint...@redhat.com>
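A minimal C sketch of the idea in the patch description. It assumes a page-granular dirty bitmap stored as an array of unsigned long, with one bit per guest page and a set bit meaning "must be sent". Reported free ranges are modeled as [start, start + npages) page intervals. The helper names are hypothetical and are not QEMU's actual functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of bits (pages) tracked by one bitmap word. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return true if page nr is still marked for sending. */
static bool page_is_dirty(const unsigned long *bitmap, size_t nr)
{
    return (bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * Hypothetical helper: clear the dirty bits of a guest-reported free
 * range [start, start + npages). Returns how many of those bits were
 * set, i.e. how many page transfers the hint avoided.
 */
static size_t clear_free_range(unsigned long *bitmap, size_t nbits,
                               size_t start, size_t npages)
{
    size_t skipped = 0;

    assert(start <= nbits && npages <= nbits - start);
    for (size_t nr = start; nr < start + npages; nr++) {
        if (page_is_dirty(bitmap, nr)) {
            bitmap[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
            skipped++;
        }
    }
    return skipped;
}

/* Count the pages that are still marked for sending. */
static size_t count_dirty(const unsigned long *bitmap, size_t nbits)
{
    size_t n = 0;

    for (size_t nr = 0; nr < nbits; nr++) {
        n += page_is_dirty(bitmap, nr);
    }
    return n;
}
```

In the bulk stage every bit starts out set, so each page covered by a free-page report is one page fewer to transmit; reporting the same range twice saves nothing the second time.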
What the patch seems to do is stop migration
completely, blocking until the guest completes the reporting.
Which makes no sense to me, since it's just an optimization.
Why not proceed with the migration? What do we have to lose?
If we want the optimization to run in parallel with the migration
thread, we will need to create another polling thread, as is done for
multi-threaded compression. That way, we would waste some host CPU.
For example, the migration thread may proceed to send pages to the
destination while the optimization thread is still running, but those
pages may turn out to be free pages (this is likely in the bulk stage),
which don't need to be sent. In that case, why not let the migration
thread wait a little (i.e. run the optimization in the migration
thread) and then do useful work, instead of appearing to make progress
while doing useless work?
The current plan of this patch is to skip free pages in the bulk stage
only. I'm not sure it would be useful from the 2nd stage onward, which
basically relies on dirty logging to send pages that have been
written by the guest. For example, if the guest is not very active while
live migration happens, there will be very few dirty bits, so this
optimization would mostly be clearing bits that are already "0" in the
dirty bitmap.
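To make the point about later stages concrete, here is a hypothetical C sketch (not QEMU code) that applies one free-page hint to a mostly clear dirty bitmap and counts how many bits it inspects versus how many it actually clears. It assumes the same one-bit-per-page bitmap layout; with few dirty pages, most of the work touches bits that were already 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits (pages) tracked by one bitmap word. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Work done by one hint versus pages it actually removed from sending. */
struct hint_stats {
    size_t inspected;
    size_t cleared;
};

/*
 * Hypothetical helper: apply a free-page hint [start, start + npages)
 * to a dirty bitmap. Bits that were already 0 cost work but save nothing.
 */
static struct hint_stats apply_free_hint(unsigned long *bitmap, size_t nbits,
                                         size_t start, size_t npages)
{
    struct hint_stats st = {0, 0};

    assert(start <= nbits && npages <= nbits - start);
    for (size_t nr = start; nr < start + npages; nr++) {
        unsigned long mask = 1UL << (nr % BITS_PER_LONG);
        unsigned long *word = &bitmap[nr / BITS_PER_LONG];

        st.inspected++;
        if (*word & mask) {
            *word &= ~mask;
            st.cleared++;
        }
    }
    return st;
}
```

With only two dirty pages, a hint spanning a whole bitmap word inspects every bit in it but clears at most one.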
I imagine some people might want to defer migration until reporting
completes to reduce the load on the network. Fair enough,
but it does not look like you actually measured the reduction
in traffic. So I suggest you work on that as a separate feature.
I do actually have the traffic data. Tested with an 8G idle guest, legacy vs.
optimization: ~390 MB vs. ~337 MB.
The legacy case already has the zero-page checking optimization, so the
traffic reduction is not very large. But zero checking has much more
overhead, as the migration time shows: with this optimization, migration
takes ~14% of the legacy migration time.
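One way to see why zero checking costs more: a naive zero-page check must read every byte of a page before it can conclude the page is zero, whereas a free-page hint only clears one bitmap bit per page. The sketch below assumes 4 KiB guest pages and a plain byte loop; it is not QEMU's optimized zero check, and it illustrates memory touched rather than the measured timings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB guest pages. */
#define GUEST_PAGE_SIZE 4096u

/* Naive zero-page check: a zero page forces a read of all its bytes. */
static bool page_is_zero(const unsigned char *page)
{
    for (size_t i = 0; i < GUEST_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Size in bytes of a one-bit-per-page bitmap covering mem_bytes of RAM. */
static uint64_t bitmap_bytes(uint64_t mem_bytes)
{
    uint64_t pages = mem_bytes / GUEST_PAGE_SIZE;

    return (pages + 7) / 8;
}
```

Reading "8G" as 8 GiB, that guest has 2,097,152 pages: a zero check over free, zeroed memory may read up to the full 8 GiB, while the corresponding bitmap is only 256 KiB.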
Best,
Wei