On Fri, 19 Nov 2010 20:23:55 -0600 Anthony Liguori <anth...@codemonkey.ws> wrote:
> On 11/17/2010 08:32 PM, Wen Congyang wrote:
> > When the total sent page size is larger than max_factor
> > times of the size of guest OS's memory, stop the
> > iteration.
> > The default value of max_factor is 3.
> >
> > This is similar to XEN.
> >
> >
> > Signed-off-by: Wen Congyang
> >
>
> I'm strongly opposed to doing this. I think Xen gets this totally wrong.
>
> Migration is a contract. When you set the stop time, you're saying that
> you want only want the guest to experience a fixed amount of downtime.
> Stopping the guest after some arbitrary number of iterations makes the
> downtime non-deterministic. With a very large guest, this could wreak
> havoc causing dropped networking connections, etc.
>
> It's totally unsafe.
>
> If a management tool wants this behavior, they can set a timeout and
> explicitly stop the guest during the live migration. IMHO, such a
> management tool is not doing it's job properly but it still can be
> implemented.
>

Hmm, is there any information available to management tools that says
"the migration failed because it never converged, since the guest kept
producing new dirty pages", or something like that?

I'd be glad to know, before stopping the machine, that cold migration is
likely to succeed even when live migration has failed by timeout.
If the network, or a target node that is too busy, is the reason for the
failure, cold migration will also be in trouble and we'll see a longer
downtime than expected.

I think it would be helpful to show how the transfer went, e.g.
"sent 3x the guest's pages but failed."

Any ideas?

Thanks,
-Kame
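
P.S. To make the reporting idea a bit more concrete, here is a rough
sketch in C of the max_factor check from the patch description together
with the kind of summary a management tool could read after a failure.
All names here (mig_stats, exceeded_max_factor, format_transfer_report)
are made up for illustration; they are not taken from the patch or from
QEMU, and the real accounting would probably look different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bookkeeping; these names are illustrative, not QEMU's. */
struct mig_stats {
    uint64_t guest_ram_bytes;   /* total size of guest memory */
    uint64_t bytes_sent;        /* RAM bytes transferred so far */
};

/* True once more than max_factor times the guest RAM has been sent. */
static bool exceeded_max_factor(const struct mig_stats *s,
                                uint64_t max_factor)
{
    return s->bytes_sent > max_factor * s->guest_ram_bytes;
}

/* Write a short summary such as "sent 3.0x of guest RAM" into buf.
 * Returns what snprintf returns (the length of the full summary). */
static int format_transfer_report(const struct mig_stats *s,
                                  char *buf, size_t len)
{
    double ratio = 0.0;

    if (s->guest_ram_bytes != 0) {
        ratio = (double)s->bytes_sent / (double)s->guest_ram_bytes;
    }
    return snprintf(buf, len, "sent %.1fx of guest RAM", ratio);
}
```

With something like this, a tool that sees a timeout plus a ratio well
above 1x could guess that dirty pages were the problem, while a low
ratio would point at the network or the target node instead.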