Am 09.08.2011 14:00, schrieb Stefan Hajnoczi: > On Tue, Aug 09, 2011 at 01:39:13PM +0200, Kevin Wolf wrote: >> Am 09.08.2011 12:56, schrieb Stefan Hajnoczi: >>> On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote: >>>> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf <kw...@redhat.com> wrote: >>>>> Am 09.08.2011 12:25, schrieb Stefan Hajnoczi: >>>>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <kw...@redhat.com> wrote: >>>>>>> Am 08.08.2011 16:49, schrieb Stefan Hajnoczi: >>>>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <kw...@redhat.com> wrote: >>>>>>>>> Am 05.08.2011 11:29, schrieb Stefan Hajnoczi: >>>>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <kw...@redhat.com> wrote: >>>>>>>>>>> Am 05.08.2011 10:40, schrieb Stefan Hajnoczi: >>>>>>>>>>>> We've discussed safe methods for reopening image files (e.g. >>>>>>>>>>>> useful for >>>>>>>>>>>> changing the hostcache parameter). The problem is that closing >>>>>>>>>>>> the file first >>>>>>>>>>>> and then opening it again exposes us to the error case where the >>>>>>>>>>>> open fails. >>>>>>>>>>>> At that point we cannot get to the file anymore and our options >>>>>>>>>>>> are to >>>>>>>>>>>> terminate QEMU, pause the VM, or offline the block device. >>>>>>>>>>>> >>>>>>>>>>>> This window of vulnerability can be eliminated by keeping the file >>>>>>>>>>>> descriptor >>>>>>>>>>>> around and falling back to it should the open fail. >>>>>>>>>>>> >>>>>>>>>>>> The challenge for the file descriptor approach is that image >>>>>>>>>>>> formats, like >>>>>>>>>>>> VMDK, can span multiple files. Therefore the solution is not as >>>>>>>>>>>> simple as >>>>>>>>>>>> stashing a single file descriptor and reopening from it. >>>>>>>>>>> >>>>>>>>>>> So far I agree. The rest I believe is wrong because you can't assume >>>>>>>>>>> that every backend uses file descriptors. The qemu block layer is >>>>>>>>>>> based >>>>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be >>>>>>>>>>> hidden >>>>>>>>>>> in raw-posix. >>>>>>>>>>> >>>>>>>>>>> I think something like this could do: >>>>>>>>>>> >>>>>>>>>>> struct BDRVReopenState { >>>>>>>>>>> BlockDriverState *bs; >>>>>>>>>>> /* can be extended by block drivers */ >>>>>>>>>>> }; >>>>>>>>>>> >>>>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state, >>>>>>>>>>> int >>>>>>>>>>> flags); >>>>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state); >>>>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state); >>>>>>>>>>> >>>>>>>>>>> raw-posix would store the old file descriptor in its reopen_state. >>>>>>>>>>> On >>>>>>>>>>> commit, it closes the old descriptors, on abort it reverts to the >>>>>>>>>>> old >>>>>>>>>>> one and closes the newly opened one. >>>>>>>>>>> >>>>>>>>>>> Makes things a bit more complicated than the simple bdrv_reopen I >>>>>>>>>>> had in >>>>>>>>>>> mind before, but it allows VMDK to get an all-or-nothing semantics. >>>>>>>>>> >>>>>>>>>> Can you show how bdrv_reopen() would use these new interfaces? I'm >>>>>>>>>> not 100% clear on the idea. >>>>>>>>> >>>>>>>>> Well, you wouldn't only call bdrv_reopen, but also either >>>>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a >>>>>>>>> wrapper >>>>>>>>> function that does both, but that's syntactic sugar). >>>>>>>>> >>>>>>>>> For example we would have: >>>>>>>>> >>>>>>>>> int vmdk_reopen() >>>>>>>> >>>>>>>> .bdrv_reopen() is a confusing name for this operation because it does >>>>>>>> not reopen anything. bdrv_prepare_reopen() might be clearer. >>>>>>> >>>>>>> Makes sense. >>>>>>> >>>>>>>> >>>>>>>>> { >>>>>>>>> *((VMDKReopenState**) rs) = malloc(); >>>>>>>>> >>>>>>>>> foreach (extent in s->extents) { >>>>>>>>> ret = bdrv_reopen(extent->file, &extent->reopen_state) >>>>>>>>> if (ret < 0) >>>>>>>>> goto fail; >>>>>>>>> } >>>>>>>>> return 0; >>>>>>>>> >>>>>>>>> fail: >>>>>>>>> foreach (extent in rs->already_reopened) { >>>>>>>>> bdrv_reopen_abort(extent->reopen_state); >>>>>>>>> } >>>>>>>>> return ret; >>>>>>>>> } >>>>>>>> >>>>>>>>> void vmdk_reopen_commit() >>>>>>>>> { >>>>>>>>> foreach (extent in s->extents) { >>>>>>>>> bdrv_reopen_commit(extent->reopen_state); >>>>>>>>> } >>>>>>>>> free(rs); >>>>>>>>> } >>>>>>>>> >>>>>>>>> void vmdk_reopen_abort() >>>>>>>>> { >>>>>>>>> foreach (extent in s->extents) { >>>>>>>>> bdrv_reopen_abort(extent->reopen_state); >>>>>>>>> } >>>>>>>>> free(rs); >>>>>>>>> } >>>>>>>> >>>>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs, >>>>>>>> &rs)? >>>>>>> >>>>>>> No. Closing the old backend would be part of bdrv_reopen_commit. >>>>>>> >>>>>>> Do you have a use case where it would be helpful if the caller invoked >>>>>>> bdrv_close? >>>>>> >>>>>> When the caller does bdrv_close() two BlockDriverStates are never open >>>>>> for the same image file. I thought this was a property we wanted. >>>>>> >>>>>> Also, in the block_set_hostcache case we need to reopen without >>>>>> switching to a new BlockDriverState instance. That means the reopen >>>>>> needs to be in-place with respect to the BlockDriverState *bs pointer. >>>>>> We cannot create a new instance. >>>>> >>>>> Yes, but where do you even get the second BlockDriverState from? >>>>> >>>>> My prototype only returns an int, not a new BlockDriverState. Until >>>>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and >>>>> after bdrv_reopen_commit() the very same BlockDriverState would refer to >>>>> the new ones. >>>> >>>> It seems I don't understand the API. I thought it was: >>>> >>>> do_block_set_hostcache() >>>> { >>>> bdrv_prepare_reopen(bs, &rs); >>>> ...open new file and check everything is okay... >>>> if (ret == 0) { >>>> bdrv_reopen_commit(rs); >>>> } else { >>>> bdrv_reopen_abort(rs); >>>> } >>>> return ret; >>>> } >>>> >>>> If the caller isn't opening the new file then what's the point of >>>> giving the caller control over prepare, commit, and abort? >>> >>> After sending the last email I realized what I was missing: >>> >>> You need the prepare, commit, and abort API in order to handle >>> multi-file block drivers like VMDK. >> >> Yes, this is whole point of separating commit out. Does the proposal >> make sense to you now? > > It depends on the details. Adding more functions that every BlockDriver > must implement is bad, so it's important that we only drop this > functionality into raw-posix.c, vmdk.c, and block.c as appropriate.
Yes, I agree. > I liked the idea of doing a generic FDStash type that the monitor and > bdrv_reopen() can use. Blue's idea to hook at the qemu_open() level > takes that further. Well, having something that works for the raw-posix, the monitor and maybe some more things is nice. Having something that works for raw-posix, Sheepdog and rbd is an absolute requirement and I can't see how FDStash solves that. Even raw-win32 doesn't have an int fd, but a HANDLE hfile for its backend. So my main concern is that file descriptors are a concept not as generic as it needs to be to suit all block drivers. > But if we can do prepare, commit, and abort in a relatively simple way > then I'm for it. Great, then let's do that. Kevin