On 05/28/2015 04:56 PM, Denis V. Lunev wrote:
> On 28/05/15 23:09, John Snow wrote:
>>
>> On 05/26/2015 10:51 AM, Denis V. Lunev wrote:
>>> On 26/05/15 17:48, Denis V. Lunev wrote:
>>>> On 21/05/15 19:44, John Snow wrote:
>>>>> On 05/21/2015 09:57 AM, Denis V. Lunev wrote:
>>>>>> On 21/05/15 16:51, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> Hi all.
>>>>>>>
>>>>>>> Hmm. There is an interesting suggestion from Denis Lunev (in CC) about
>>>>>>> how to drop meta bitmaps and make things easer.
>>>>>>>
>>>>>>> method:
>>>>>>>
>>>>>>>> start migration
>>>>>>> disk and memory are migrated, but not dirty bitmaps.
>>>>>>>> stop vm
>>>>>>> create all necessary bitmaps in destination vm (empty, but with same
>>>>>>> names and granularities and enabled flag)
>>>>>>>> start destination vm
>>>>>>> empty bitmaps are tracking now
>>>>>>>> start migrating dirty bitmaps. merge them to corresponding bitmaps
>>>>>>> in destination
>>>>>>> while bitmaps are migrating, they should be in some kind of
>>>>>>> 'inconsistent' state.
>>>>>>> so, we can't start backup or other migration while bitmaps are
>>>>>>> migrating, but vm is already _running_ on destination.
>>>>>>>
>>>>>>> what do you think about it?
>>>>>>>
>>>>>> the description is a bit incorrect
>>>>>>
>>>>>> - start migration process, perform memory and disk migration
>>>>>> as usual. VM is still executed at source
>>>>>> - start VM on target. VM on source should be on pause as usual,
>>>>>> do not finish migration process. Running VM on target "writes"
>>>>>> normally setting dirty bits as usual
>>>>>> - copy active dirty bitmaps from source to target. This is safe
>>>>>> as VM on source is not running
>>>>>> - "OR" copied bitmaps with ones running on target
>>>>>> - finish migration process (stop source VM).
>>>>>>
>>>>>> Downtime will not be increased due to dirty bitmaps with this
>>>>>> approach, migration process is very simple - plain data copy.
>>>>>>
>>>>>> Regards,
>>>>>> Den
>>>>>>
>>>>> I was actually just discussing the live migration approach a little bit
>>>>> ago with Stefan, trying to decide on the "right" packet format (The only
>>>>> two patches I haven't ACKed yet are ones in which we need to choose a
>>>>> send size) and we decided that 1KiB chunk sends would be appropriate for
>>>>> live migration.
>>>>>
>>>>> I think I'm okay with that method, but obviously this approach outlined
>>>>> here would also work very well and would avoid meta bitmaps, chunk
>>>>> sizes, migration tuning, convergence questions, etc etc etc.
>>>>>
>>>>> You'd need to add a new status to the bitmap on the target (maybe
>>>>> "INCOMPLETE" or "MIGRATING") that prevents it from being used for a
>>>>> backup operation without preventing it from recording new writes.
>>>>>
>>>>> My only concern is how easy it will be to work this into the migration
>>>>> workflow.
>>>>>
>>>>> It would require some sort of "post-migration" ternary phase, I suppose,
>>>>> for devices/data that can be transferred after the VM starts -- and I
>>>>> suspect we'll be the only use of that phase for now.
>>>>>
>>>>> David, what are your thoughts, here? Would you prefer Vladimir and I
>>>>> push forward on the live migration approach, or add a new post-hoc
>>>>> phase? This approach might be simpler on the block layer, but I would be
>>>>> rather upset if he scrapped his entire series for the second time for
>>>>> another approach that also didn't get accepted.
>>>>>
>>>>> --js
>>>> hmmm.... It looks like we should proceed with this to fit 2.4 dates.
>>>> There is not much interest at the moment. I think that we could
>>>> implement this later in 2.5 etc...
>>>>
>>>> Regards,
>>>> Den
>>> oops. I have written something strange. Anyway, I think that for
>>> now we should proceed with this patchset to fit QEMU 2.4 dates.
>>> The implementation with additional stage (my proposal) could be
>>> added later, f.e. in 2.5 as I do not see much interest from migration
>>> gurus.
>>>
>>> In this case the review will take a ... lot of time.
>>>
>>> Regards,
>>> Den
>>>
>> That sounds good to me. I think this solution is workable for 2.4, and
>> we can begin working on a post-migration phase for the future to help
>> simplify our cases a lot.
>>
>> I have been out sick much of this week, so apologies in my lack of
>> fervor getting this series upstream recently.
>>
>> --js
> no prob :)

Had a chat with Stefan about this approach, and apparently that's what
the postcopy migration patches on-list are all about.

Stefan brought up the point of post-hoc reliability: it's possible to
transfer control to the new VM and then lose your link, making migration
completion impossible. Adding a post-copy phase to our existing live
migration is a non-starter, because it would unfairly introduce this
unreliability into the existing system.

However, we can make this idea work for migrations started via the
post-copy mechanism, because the entire migration already carries that
known risk of completion failure.

The likely outcome, though, is that in the future it will be possible to
complete migrations with either mechanism: up-front migration or
post-copy migration. In that light, it seems we won't be able to fully
rid ourselves of the meta_bitmap idea, which makes the post-copy idea
here not too useful in culling our complexity, since we'll have to
support the current standard live migration anyway.

So I have reviewed the current set of patches under the assumption that
it seems like the right way to go for 2.4 and beyond.

Thank you!
--js
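
For anyone skimming the thread, the post-copy bitmap merge quoted above
(empty bitmaps track writes on the target, the source bitmaps are copied
over and OR-ed in, and backups are refused until the merge is done) can
be sketched roughly as below. This is a toy Python model, not QEMU code:
`DirtyBitmap`, `BitmapState`, `merge_from` and `start_backup` are
hypothetical names made up for illustration, and the "MIGRATING" status
is only the one floated earlier in the thread.

```python
from enum import Enum


class BitmapState(Enum):
    # Hypothetical statuses; the thread only suggests "INCOMPLETE" or "MIGRATING".
    MIGRATING = "migrating"
    READY = "ready"


class DirtyBitmap:
    """Toy dirty bitmap: one bit per `granularity` bytes of the disk."""

    def __init__(self, name, granularity, nbits, state=BitmapState.READY):
        self.name = name
        self.granularity = granularity
        self.bits = [False] * nbits
        self.state = state

    def mark_dirty(self, offset, length):
        # Recording writes is always allowed, even while MIGRATING.
        if length <= 0:
            return
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for i in range(first, last + 1):
            self.bits[i] = True

    def merge_from(self, source):
        # "OR" the bitmap copied from the source into the one that has
        # been tracking writes on the target since the VM started there.
        same_shape = (
            source.name == self.name
            and source.granularity == self.granularity
            and len(source.bits) == len(self.bits)
        )
        if not same_shape:
            raise ValueError("bitmap name, granularity, or size mismatch")
        self.bits = [a or b for a, b in zip(self.bits, source.bits)]
        self.state = BitmapState.READY

    def start_backup(self):
        # A bitmap that is still MIGRATING is incomplete, so refuse to use it.
        if self.state is not BitmapState.READY:
            raise RuntimeError("bitmap %r is still migrating" % self.name)
        return [i for i, dirty in enumerate(self.bits) if dirty]
```

The point of the sketch is only that the merge is a plain OR and that
the extra status gates backup without gating write tracking; how this
would hook into the real migration state machine is left open.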