On (Fri) 31 Jul 2015 [10:50:46], Dr. David Alan Gilbert wrote: > * Amit Shah (amit.s...@redhat.com) wrote: > > On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote: > > > From: "Dr. David Alan Gilbert" <dgilb...@redhat.com> > > > > > > Once we're in postcopy the source processors are stopped and memory > > > shouldn't change any more, so there's no need to look at the dirty > > > map. > > > > > > There are two notes to this: > > > 1) If we do resync and a page had changed then the page would get > > > sent again, which the destination wouldn't allow (since it might > > > have also modified the page) > > > 2) Before disabling this I'd seen very rare cases where a page had been > > > marked dirtied although the memory contents are apparently identical > > > > I suppose we don't know why. Any way to send a message to the dest > > with this info, so the dest can print out something? That'll help in > > debugging. (I'm suggesting sending a message to the dest, because > > after a migration, we don't ever think of looking at messages on the > > src. And chances are the dest could blow up after a migration is > > successful because of such "corruption".) > > One way perhaps would be to do one more sync at the end, after migration > is apparently finished, but before the socket was closed; that would > detect these changes and you could send a message to the other end. However, > given that (2) I say that where I'd seen it the page contents were > identical, this could be a false alarm, so we'd need to be careful. > It also doesn't help you find out *why* it happens, since tracing > back from a bit in the migration bitmap to the area of memory > and the thing that marked it dirty is very hard. The only way to do > that, is to mark the memory as read-only and then get a backtrace > to find out who tried to change it; but you don't want to do > that on a normal build and cause the source to die.
Agreed - but some notification that something might possibly be wrong is better than we not having such a clue, and fervently trying to debug an issue. In fact, a per-VM flag could be better since multiple migrations may mean such notifications could be lost in the logs of a previous host which we don't examine. Amit