On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf <kw...@redhat.com> wrote:
>> On 09.08.2011 12:25, Stefan Hajnoczi wrote:
>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <kw...@redhat.com> wrote:
>>>> On 08.08.2011 16:49, Stefan Hajnoczi wrote:
>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <kw...@redhat.com> wrote:
>>>>>> On 05.08.2011 11:29, Stefan Hajnoczi wrote:
>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <kw...@redhat.com> wrote:
>>>>>>>> On 05.08.2011 10:40, Stefan Hajnoczi wrote:
>>>>>>>>> We've discussed safe methods for reopening image files (e.g.
>>>>>>>>> useful for changing the hostcache parameter). The problem is
>>>>>>>>> that closing the file first and then opening it again exposes
>>>>>>>>> us to the error case where the open fails. At that point we
>>>>>>>>> cannot get to the file anymore and our options are to
>>>>>>>>> terminate QEMU, pause the VM, or offline the block device.
>>>>>>>>>
>>>>>>>>> This window of vulnerability can be eliminated by keeping the
>>>>>>>>> file descriptor around and falling back to it should the open
>>>>>>>>> fail.
>>>>>>>>>
>>>>>>>>> The challenge for the file descriptor approach is that image
>>>>>>>>> formats, like VMDK, can span multiple files. Therefore the
>>>>>>>>> solution is not as simple as stashing a single file
>>>>>>>>> descriptor and reopening from it.
>>>>>>>>
>>>>>>>> So far I agree. The rest I believe is wrong because you can't assume
>>>>>>>> that every backend uses file descriptors. The qemu block layer is based
>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be hidden
>>>>>>>> in raw-posix.
>>>>>>>>
>>>>>>>> I think something like this could do:
>>>>>>>>
>>>>>>>> struct BDRVReopenState {
>>>>>>>>     BlockDriverState *bs;
>>>>>>>>     /* can be extended by block drivers */
>>>>>>>> };
>>>>>>>>
>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state,
>>>>>>>>              int flags);
>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
>>>>>>>>
>>>>>>>> raw-posix would store the old file descriptor in its reopen_state. On
>>>>>>>> commit, it closes the old descriptors, on abort it reverts to the old
>>>>>>>> one and closes the newly opened one.
>>>>>>>>
>>>>>>>> Makes things a bit more complicated than the simple bdrv_reopen I had
>>>>>>>> in mind before, but it allows VMDK to get an all-or-nothing semantics.
>>>>>>>
>>>>>>> Can you show how bdrv_reopen() would use these new interfaces? I'm
>>>>>>> not 100% clear on the idea.
>>>>>>
>>>>>> Well, you wouldn't only call bdrv_reopen, but also either
>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a wrapper
>>>>>> function that does both, but that's syntactic sugar).
>>>>>>
>>>>>> For example we would have:
>>>>>>
>>>>>> int vmdk_reopen()
>>>>>
>>>>> .bdrv_reopen() is a confusing name for this operation because it does
>>>>> not reopen anything. bdrv_prepare_reopen() might be clearer.
>>>>
>>>> Makes sense.
>>>>
>>>>>
>>>>>> {
>>>>>>     *((VMDKReopenState**) rs) = malloc();
>>>>>>
>>>>>>     foreach (extent in s->extents) {
>>>>>>         ret = bdrv_reopen(extent->file, &extent->reopen_state)
>>>>>>         if (ret < 0)
>>>>>>             goto fail;
>>>>>>     }
>>>>>>     return 0;
>>>>>>
>>>>>> fail:
>>>>>>     foreach (extent in rs->already_reopened) {
>>>>>>         bdrv_reopen_abort(extent->reopen_state);
>>>>>>     }
>>>>>>     return ret;
>>>>>> }
>>>>>
>>>>>> void vmdk_reopen_commit()
>>>>>> {
>>>>>>     foreach (extent in s->extents) {
>>>>>>         bdrv_reopen_commit(extent->reopen_state);
>>>>>>     }
>>>>>>     free(rs);
>>>>>> }
>>>>>>
>>>>>> void vmdk_reopen_abort()
>>>>>> {
>>>>>>     foreach (extent in s->extents) {
>>>>>>         bdrv_reopen_abort(extent->reopen_state);
>>>>>>     }
>>>>>>     free(rs);
>>>>>> }
>>>>>
>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
>>>>> &rs)?
>>>>
>>>> No. Closing the old backend would be part of bdrv_reopen_commit.
>>>>
>>>> Do you have a use case where it would be helpful if the caller invoked
>>>> bdrv_close?
>>>
>>> When the caller does bdrv_close() two BlockDriverStates are never open
>>> for the same image file. I thought this was a property we wanted.
>>>
>>> Also, in the block_set_hostcache case we need to reopen without
>>> switching to a new BlockDriverState instance. That means the reopen
>>> needs to be in-place with respect to the BlockDriverState *bs pointer.
>>> We cannot create a new instance.
>>
>> Yes, but where do you even get the second BlockDriverState from?
>>
>> My prototype only returns an int, not a new BlockDriverState. Until
>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and
>> after bdrv_reopen_commit() the very same BlockDriverState would refer to
>> the new ones.
>
> It seems I don't understand the API. I thought it was:
>
> do_block_set_hostcache()
> {
>     bdrv_prepare_reopen(bs, &rs);
>     ...open new file and check everything is okay...
>     if (ret == 0) {
>         bdrv_reopen_commit(rs);
>     } else {
>         bdrv_reopen_abort(rs);
>     }
>     return ret;
> }
>
> If the caller isn't opening the new file then what's the point of
> giving the caller control over prepare, commit, and abort?

After sending the last email I realized what I was missing: You need the
prepare, commit, and abort API in order to handle multi-file block drivers
like VMDK.

Stefan
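
For illustration only, here is a minimal, self-contained sketch of why the
prepare/commit/abort split gives a multi-file driver all-or-nothing
semantics. Nothing below is QEMU code: Extent, extent_reopen_prepare(),
extent_reopen_commit(), extent_reopen_abort(), image_reopen() and the
fail_prepare knob are hypothetical names, and an int "flags" value stands in
for whatever per-file state (e.g. a file descriptor in raw-posix) a real
backend would stage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one backing file of a multi-file image. */
typedef struct Extent {
    int cur_flags;      /* flags the extent is currently open with */
    int new_flags;      /* flags staged by prepare; valid while prepared */
    bool prepared;      /* true between prepare and commit/abort */
    bool fail_prepare;  /* illustration knob: force prepare to fail */
} Extent;

/* Stage the new state. On failure the old state is left untouched. */
static int extent_reopen_prepare(Extent *e, int flags)
{
    if (e->fail_prepare) {
        return -1;
    }
    e->new_flags = flags;
    e->prepared = true;
    return 0;
}

/* Switch to the staged state; this step is assumed not to fail. */
static void extent_reopen_commit(Extent *e)
{
    assert(e->prepared);
    e->cur_flags = e->new_flags;
    e->prepared = false;
}

/* Drop the staged state and keep using the old one. */
static void extent_reopen_abort(Extent *e)
{
    assert(e->prepared);
    e->prepared = false;
}

/*
 * All-or-nothing reopen across n extents: prepare every extent first and
 * commit only if all of them succeeded. If one prepare fails, abort the
 * extents prepared so far and return the error with nothing changed.
 */
int image_reopen(Extent *extents, size_t n, int flags)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ret = extent_reopen_prepare(&extents[i], flags);
        if (ret < 0) {
            /* Roll back only the extents that were already prepared. */
            while (i > 0) {
                extent_reopen_abort(&extents[--i]);
            }
            return ret;
        }
    }

    for (i = 0; i < n; i++) {
        extent_reopen_commit(&extents[i]);
    }
    return 0;
}
```

With a single close-then-open call per file, a failure on the second extent
would leave the first one already switched over. Here no extent changes its
current state until every prepare has succeeded.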