On Mon, Nov 23, 2009 at 10:09:15AM -0600, Anthony Liguori wrote:
> Gleb Natapov wrote:
> >On Mon, Nov 23, 2009 at 09:32:48AM -0600, Anthony Liguori wrote:
> >>Gleb Natapov wrote:
> >>>On Mon, Nov 23, 2009 at 09:05:58AM -0600, Anthony Liguori wrote:
> >>>>Gleb Natapov wrote:
> >>>>>Then I don't see why Juan claims what he claims.
> >>>>Live migration is unidirectional. As long as qemu can send out all
> >>>>of the data without the stream closing, it will "succeed" on the
> >>>>source. While this may sound like a bug, it's an impossible problem
> >>>>to solve as it's dealing with reliable communication between two
> >>>>unreliable nodes (i.e. the two general's problem). This is why the
> >>>>source qemu does not exit after a successful live migration. It
> >>>As far as I remember the two general's problem talks about unreliable
> >>>channel, not unreliable nodes.
> >>That's just semantics. The problem is that one general does not
> >>know if the other general received the message. Even if there was a
> >>reliable channel between the two generals, if one of the generals
> >>can die with no indication, then you still have the same problem,
> >>i.e. the first general doesn't know for sure if the second general
> >>received the message.
> >>
> >>>Why not having destination send ACK/NACK
> >>>to the source when it knows that migration succeeded/failed.
> >>1) Source sends migration traffic
> >>2) Destination receives it, sends Ack
> >>3) Destination needs to wait to receive Ack from Source before
> >>starting guest to ensure that guest does not start twice
> >>4) Source receives Ack from Destination, sends Ack
> >>5) Source kills guest
> >>6) Destination receives Ack from Source, starts guest
> >>
> >>If Destination dies in between 5 and 6, the VM disappears.
> >>
> >1) Source sends migration traffic
> >2) Destination receives it, sends Ack
> >3) Destination start running
> >4) Source receives Ack from Destination
> >5) Source kills guest
> >
> >If Source does not receive Ack it stays paused and wait for management to
> >sort things out.
>
> Is it really useful to kill the source guest in this case? I'm wary
> of how useful an unreliable ack is namely because it introduces
> rather complex semantics from a management tool perspective. If
> folks think it would be really useful, I'm not fundamentally opposed
> to it.
>
I am OK with management being responsible for sorting things out. Juan said
that the destination can't abort a migration in the middle, so I pointed out
an easy solution that will work in 99.999% of cases.
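
To make the two sequences quoted above concrete, here is a rough Python
sketch of both. It is only a toy model, not qemu code: the names State,
migrate_simple_ack and migrate_three_way are made up for illustration, a lost
Ack or a dead node is reduced to a boolean flag, and I assume the source guest
is paused while the migration is in flight.

```python
from enum import Enum


class State(Enum):
    """Hypothetical guest states on one side of the migration."""
    NOT_STARTED = "not started"
    RUNNING = "running"
    PAUSED = "paused"
    KILLED = "killed"


def migrate_simple_ack(ack_delivered, destination_ok=True):
    """Single-Ack sequence: destination starts as soon as it has the data.

    Returns (source_state, destination_state).
    """
    # 1) Source sends migration traffic (guest assumed paused meanwhile).
    source = State.PAUSED
    destination = State.NOT_STARTED

    # 2) + 3) Destination receives the data, sends Ack, starts running.
    if destination_ok:
        destination = State.RUNNING

    # 4) + 5) Source kills its guest only if the Ack actually arrives.
    if destination_ok and ack_delivered:
        source = State.KILLED

    # Without an Ack the source stays paused; management sorts it out.
    return source, destination


def migrate_three_way(destination_dies_before_start):
    """Three-way sequence: destination waits for the source's Ack.

    Returns (source_state, destination_state).
    """
    # 1)-4) Data and both Acks are exchanged; 5) source kills its guest.
    source = State.KILLED
    destination = State.NOT_STARTED

    # 6) Destination starts the guest, unless it died after step 5.
    if not destination_dies_before_start:
        destination = State.RUNNING

    return source, destination


if __name__ == "__main__":
    for ack in (True, False):
        src, dst = migrate_simple_ack(ack_delivered=ack)
        print("simple ack, ack_delivered=%s: source=%s, destination=%s"
              % (ack, src.value, dst.value))
    src, dst = migrate_three_way(destination_dies_before_start=True)
    print("three-way, destination dies: source=%s, destination=%s"
          % (src.value, dst.value))
```

In this model the single-Ack variant never ends with the guest killed on the
source while it is not running on the destination. The worst case is a paused
source left for management to clean up. The three-way variant can end with the
guest killed on the source and never started on the destination.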
--
Gleb.