On 12/22/2015 03:43 PM, Eric Blake wrote:
> On 12/22/2015 09:46 AM, Kevin Wolf wrote:
>> So far, live migration with shared storage meant that the image is in a
>> not-really-ready don't-touch-me state on the destination while the
>> source is still actively using it, but after completing the migration,
>> the image was fully opened on both sides. This is bad.
>>
>> This patch adds a block driver callback to inactivate images on the
>> source before completing the migration. Inactivation means that it goes
>> to a state as if it was just live migrated to the qemu instance on the
>> source (i.e. BDRV_O_INCOMING is set). You're then supposed to continue
>> either on the source or on the destination, which takes ownership of the
>> image.
>>
>> A typical migration looks like this now with respect to disk images:
>>
>> 1. Destination qemu is started, the image is opened with
>>    BDRV_O_INCOMING. The image is fully opened on the source.
>>
>> 2. Migration is about to complete. The source flushes the image and
>>    inactivates it. Now both sides have the image opened with
>>    BDRV_O_INCOMING and are expecting the other side to still modify it.
>
> The name BDRV_O_INCOMING now doesn't quite match semantics on the
> source, but I don't have any better suggestions. BDRV_O_LIMITED_USE?
> BDRV_O_HANDOFF? At any rate, I fully agree with your logic of locking
> things down on the source to mark that the destination is about to take
> over write access to the file.
>
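
Just to make sure I'm reading the mechanism right, here's roughly how I
picture the inactivation half of the handoff. This is only a sketch based
on the commit message, not the actual block.c hunk; the .bdrv_inactivate
callback name and the walk over bdrv_states are my guesses at how
bdrv_inactivate_all() gets wired up:

int bdrv_inactivate(BlockDriverState *bs)
{
    int ret;

    /* Give the format driver a chance to flush its metadata and stop
     * writing to the image while this instance still owns it ... */
    if (bs->drv->bdrv_inactivate) {
        ret = bs->drv->bdrv_inactivate(bs);
        if (ret < 0) {
            return ret;
        }
    }

    /* ... then drop back into the same "don't touch me" state that a
     * freshly started destination is in. */
    bs->open_flags |= BDRV_O_INCOMING;
    return 0;
}

int bdrv_inactivate_all(void)
{
    BlockDriverState *bs;
    int ret;

    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
        ret = bdrv_inactivate(bs);
        if (ret < 0) {
            return ret;
        }
    }

    return 0;
}

The property I care about is that the driver callback runs before the
flag flips, so after this point nothing on this side, not even
.bdrv_close, is supposed to write to the image anymore.
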
INCOMING is handy as it keeps the code simple, even if it's weird to read. Is it worth adding the extra ifs/case statements everywhere just to add BDRV_O_HANDOFF? Maybe in the future someone will use BDRV_O_INCOMING to mean something more specific (data is incoming, not just in the process of being handed off), which could cause problems. Maybe even just renaming BDRV_O_INCOMING right now to BDRV_O_HANDOFF would accomplish the semantics we want on both source and destination without needing two flags.

Follow your dreams, go with what you feel.

>>
>> 3. One side (the destination on success) continues and calls
>> bdrv_invalidate_all() in order to take ownership of the image again.
>> This removes BDRV_O_INCOMING on the resuming side; the flag remains
>> set on the other side.
>>
>> This ensures that the same image isn't written to by both instances
>> (unless both are resumed, but then you get what you deserve). This is
>> important because .bdrv_close for non-BDRV_O_INCOMING images could write
>> to the image file, which is definitely forbidden while another host is
>> using the image.
>
> And indeed, this is a prereq to your patch that modifies the file on
> close to clear the new 'open-for-writing' flag :)
>
>>
>> Signed-off-by: Kevin Wolf <kw...@redhat.com>
>> ---
>>  block.c                   | 34 ++++++++++++++++++++++++++++++++++
>>  include/block/block.h     |  1 +
>>  include/block/block_int.h |  1 +
>>  migration/migration.c     |  7 +++++++
>>  qmp.c                     | 12 ++++++++++++
>>  5 files changed, 55 insertions(+)
>>
>
>> @@ -1536,6 +1540,9 @@ static void migration_completion(MigrationState *s,
>> int current_active_state,
>>      if (!ret) {
>>          ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>>          if (ret >= 0) {
>> +            ret = bdrv_inactivate_all();
>> +        }
>> +        if (ret >= 0) {
>>              qemu_file_set_rate_limit(s->file, INT64_MAX);
>
> Isn't the point of the rate limit change to allow any pending operations
> to flush without artificial slow limits? Will inactivating the device
> be too slow if rate limits are still slow?
>

This sets the rate limit for the migration pipe, doesn't it? My reading was that this removes any artificial limits for the sake of post-copy, but we shouldn't be flushing any writes to disk at this point, so I think this order won't interfere with anything.

> But offhand, I don't have any strong proof that a different order is
> required, so yours makes sense to me.
>
> You may want a second opinion, but I'm okay if you add:
> Reviewed-by: Eric Blake <ebl...@redhat.com>

Reviewed-by: John Snow <js...@redhat.com>
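
P.S.: mostly for my own notes, here is the resume side from step 3 as I
understand it. This is not the actual qmp.c hunk from the patch; the
qmp_cont() call site and bdrv_invalidate_cache_all() as the concrete
spelling of "bdrv_invalidate_all()" are assumptions on my part:

void qmp_cont(Error **errp)
{
    Error *local_err = NULL;

    /* ... existing run-state checks elided ... */

    /* Whichever side continues takes ownership back: this clears
     * BDRV_O_INCOMING again, so only one instance ever has the image
     * fully active. */
    bdrv_invalidate_cache_all(&local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;
    }

    vm_start();
}

In other words, the instance that continues clears BDRV_O_INCOMING for
itself, while the other side keeps it set, and that is what stops
.bdrv_close from scribbling on an image it no longer owns.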