Re: [Qemu-devel] Safely reopening image files by stashing fds

Stefan Hajnoczi Tue, 09 Aug 2011 06:41:09 -0700

On Tue, Aug 9, 2011 at 11:50 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Tue, Aug 9, 2011 at 11:35 AM, Kevin Wolf <kw...@redhat.com> wrote:
>> Am 09.08.2011 12:25, schrieb Stefan Hajnoczi:
>>> On Mon, Aug 8, 2011 at 4:16 PM, Kevin Wolf <kw...@redhat.com> wrote:
>>>> Am 08.08.2011 16:49, schrieb Stefan Hajnoczi:
>>>>> On Fri, Aug 5, 2011 at 10:48 AM, Kevin Wolf <kw...@redhat.com> wrote:
>>>>>> Am 05.08.2011 11:29, schrieb Stefan Hajnoczi:
>>>>>>> On Fri, Aug 5, 2011 at 10:07 AM, Kevin Wolf <kw...@redhat.com> wrote:
>>>>>>>> Am 05.08.2011 10:40, schrieb Stefan Hajnoczi:
>>>>>>>>> We've discussed safe methods for reopening image files (e.g. useful for
>>>>>>>>> changing the hostcache parameter).  The problem is that closing the file
>>>>>>>>> first and then opening it again exposes us to the error case where the
>>>>>>>>> open fails.  At that point we cannot get to the file anymore and our
>>>>>>>>> options are to terminate QEMU, pause the VM, or offline the block device.
>>>>>>>>>
>>>>>>>>> This window of vulnerability can be eliminated by keeping the file
>>>>>>>>> descriptor around and falling back to it should the open fail.
>>>>>>>>>
>>>>>>>>> The challenge for the file descriptor approach is that image formats,
>>>>>>>>> like VMDK, can span multiple files.  Therefore the solution is not as
>>>>>>>>> simple as stashing a single file descriptor and reopening from it.
>>>>>>>>
>>>>>>>> So far I agree. The rest I believe is wrong because you can't assume
>>>>>>>> that every backend uses file descriptors. The qemu block layer is based
>>>>>>>> on BlockDriverStates, not fds. They are a concept that should be hidden
>>>>>>>> in raw-posix.
>>>>>>>>
>>>>>>>> I think something like this could do:
>>>>>>>>
>>>>>>>> struct BDRVReopenState {
>>>>>>>>    BlockDriverState *bs;
>>>>>>>>    /* can be extended by block drivers */
>>>>>>>> };
>>>>>>>>
>>>>>>>> .bdrv_reopen(BlockDriverState *bs, BDRVReopenState **reopen_state, int
>>>>>>>> flags);
>>>>>>>> .bdrv_reopen_commit(BDRVReopenState *reopen_state);
>>>>>>>> .bdrv_reopen_abort(BDRVReopenState *reopen_state);
>>>>>>>>
>>>>>>>> raw-posix would store the old file descriptor in its reopen_state. On
>>>>>>>> commit, it closes the old descriptors, on abort it reverts to the old
>>>>>>>> one and closes the newly opened one.
>>>>>>>>
>>>>>>>> Makes things a bit more complicated than the simple bdrv_reopen I had in
>>>>>>>> mind before, but it allows VMDK to get an all-or-nothing semantics.
>>>>>>>
>>>>>>> Can you show how bdrv_reopen() would use these new interfaces?  I'm
>>>>>>> not 100% clear on the idea.
>>>>>>
>>>>>> Well, you wouldn't only call bdrv_reopen, but also either
>>>>>> bdrv_reopen_commit/abort (for the top-level caller we can have a wrapper
>>>>>> function that does both, but that's syntactic sugar).
>>>>>>
>>>>>> For example we would have:
>>>>>>
>>>>>> int vmdk_reopen()
>>>>>
>>>>> .bdrv_reopen() is a confusing name for this operation because it does
>>>>> not reopen anything.  bdrv_prepare_reopen() might be clearer.
>>>>
>>>> Makes sense.
>>>>
>>>>>
>>>>>> {
>>>>>>    *((VMDKReopenState**) rs) = malloc();
>>>>>>
>>>>>>    foreach (extent in s->extents) {
>>>>>>        ret = bdrv_reopen(extent->file, &extent->reopen_state)
>>>>>>        if (ret < 0)
>>>>>>            goto fail;
>>>>>>    }
>>>>>>    return 0;
>>>>>>
>>>>>> fail:
>>>>>>    foreach (extent in rs->already_reopened) {
>>>>>>        bdrv_reopen_abort(extent->reopen_state);
>>>>>>    }
>>>>>>    return ret;
>>>>>> }
>>>>>
>>>>>> void vmdk_reopen_commit()
>>>>>> {
>>>>>>    foreach (extent in s->extents) {
>>>>>>        bdrv_reopen_commit(extent->reopen_state);
>>>>>>    }
>>>>>>    free(rs);
>>>>>> }
>>>>>>
>>>>>> void vmdk_reopen_abort()
>>>>>> {
>>>>>>    foreach (extent in s->extents) {
>>>>>>        bdrv_reopen_abort(extent->reopen_state);
>>>>>>    }
>>>>>>    free(rs);
>>>>>> }
>>>>>
>>>>> Does the caller invoke bdrv_close(bs) after bdrv_prepare_reopen(bs,
>>>>> &rs)?
>>>>
>>>> No. Closing the old backend would be part of bdrv_reopen_commit.
>>>>
>>>> Do you have a use case where it would be helpful if the caller invoked
>>>> bdrv_close?
>>>
>>> When the caller does bdrv_close() two BlockDriverStates are never open
>>> for the same image file.  I thought this was a property we wanted.
>>>
>>> Also, in the block_set_hostcache case we need to reopen without
>>> switching to a new BlockDriverState instance.  That means the reopen
>>> needs to be in-place with respect to the BlockDriverState *bs pointer.
>>> We cannot create a new instance.
>>
>> Yes, but where do you even get the second BlockDriverState from?
>>
>> My prototype only returns an int, not a new BlockDriverState. Until
>> bdrv_reopen_commit() it would refer to the old file descriptors etc. and
>> after bdrv_reopen_commit() the very same BlockDriverState would refer to
>> the new ones.
>
> It seems I don't understand the API.  I thought it was:
>
> do_block_set_hostcache()
> {
>    bdrv_prepare_reopen(bs, &rs);
>    ...open new file and check everything is okay...
>    if (ret == 0) {
>        bdrv_reopen_commit(rs);
>    } else {
>        bdrv_reopen_abort(rs);
>    }
>    return ret;
> }
>
> If the caller isn't opening the new file then what's the point of
> giving the caller control over prepare, commit, and abort?


After sending the last email I realized what I was missing:

You need the prepare, commit, and abort API in order to handle
multi-file block drivers like VMDK.
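
A minimal sketch of why the three steps matter for a multi-file image,
assuming plain POSIX file descriptors underneath.  Extent, the
extent_reopen_*() helpers and image_reopen() are hypothetical names made
up for illustration; they are not the proposed block driver callbacks or
any existing QEMU code.  Prepare opens the new descriptor while the old
one stays in use, and nothing is committed until every extent has
prepared successfully:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for one file of a multi-file image. */
typedef struct Extent {
    const char *path;   /* file backing this extent */
    int fd;             /* descriptor currently in use */
    int new_fd;         /* descriptor opened by prepare, or -1 */
} Extent;

/* Prepare: open with the new flags, keep using the old descriptor. */
int extent_reopen_prepare(Extent *e, int flags)
{
    e->new_fd = open(e->path, flags);
    return e->new_fd < 0 ? -1 : 0;
}

/* Commit: switch to the new descriptor and close the old one. */
void extent_reopen_commit(Extent *e)
{
    close(e->fd);
    e->fd = e->new_fd;
    e->new_fd = -1;
}

/* Abort: drop the new descriptor; the old one was never touched. */
void extent_reopen_abort(Extent *e)
{
    if (e->new_fd >= 0) {
        close(e->new_fd);
    }
    e->new_fd = -1;
}

/* All-or-nothing reopen of every extent of the image. */
int image_reopen(Extent *extents, size_t n, int flags)
{
    size_t i;

    assert(extents != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (extent_reopen_prepare(&extents[i], flags) < 0) {
            /* Roll back only the extents that already prepared. */
            while (i-- > 0) {
                extent_reopen_abort(&extents[i]);
            }
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        extent_reopen_commit(&extents[i]);
    }
    return 0;
}
```

If the second extent fails to open, the first one is aborted and both
keep their original descriptors, so the image is never left half
reopened.  A single reopen call per file could not give that guarantee.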

Stefan
