Re: [Qemu-devel] [PATCH RFC] block: Tolerate existing writers on read only BdrvChild

Fam Zheng Wed, 01 Mar 2017 04:41:07 -0800

On Wed, 03/01 10:49, Kevin Wolf wrote:
> Am 01.03.2017 um 09:15 hat Fam Zheng geschrieben:
> > In an ideal world, read-write access to an image is inherently
> > exclusive, because we cannot guarantee other readers and writers a
> > consistent view of the whole image at all points. That's what the
> > current permission system does, and it is okay as long as it is entirely
> > internal. But that would change with the coming image locking. In
> > practice, both end users and our test cases use tools like qemu-img and
> > qemu-io to peek at images while the guest is running.
> > 
> > Relax a bit and accept other writers in this case.
> > 
> > Signed-off-by: Fam Zheng <f...@redhat.com>
> 
> Hm. On the one hand I think you're right that people are using things
> like this, on the other hand it's also true that the result might not be
> consistent and therefore image locking is right about forbidding these
> actions.
> 
> I think your patch is too permissive, it allows even launching a second
> long-running VM on the image, which will definitely see corrupted data
> sooner or later.
> 
> Maybe what we can do is allow shared writers for read-only images if
> CONSISTENT_READ isn't requested. It's still not 100% correct because we
> can get inconsistent metadata and cause unexpected failure, but this is
> probably tolerable. We would then have to change the allowed actions to
> not request this permission.
> 
> In qemu-img, we have these read-only users:
> 
> * qemu-img check (without -r): Let's keep this blocked, it will only
>   report lots of leaks and lead to invalid bug reports. I've had my
>   share of them.
> 
> * qemu-img compare: Not sure if this makes sense with an image that is
>   in active use?
> 
> * qemu-img convert source: Similarly to qemu-img compare, this doesn't
>   make a whole lot of sense, with one exception: People are using -s to
>   extract internal snapshots while the VM is still running, and this
>   usually works because the snapshot doesn't change. However, I don't
>   want to know what happens if you delete the snapshot from the qemu
>   process while qemu-img convert is running... Doing 'qemu-img convert
>   -s ...' with a running VM is more tolerated abuse than a supported
>   feature.
> 
> * qemu-img info: This one makes perfect sense even with a running VM
> 
> * qemu-img map: Results are potentially meaningless with concurrent I/O,
>   but there may be cases where it makes sense.
> 
> * qemu-img snapshot -l: Somewhat similar to qemu-img convert -s, except
>   that it's very quick and doesn't access actual data (could even be
>   BDRV_O_NOIO, I think). Allowing this makes sense, I guess.
> 
> * qemu-img rebase, safe mode, backing files: If we allowed concurrent
>   writes, it wouldn't be safe.
> 
> * qemu-img bench: Well... No. You don't need this on an image with an
>   active VM.
> 
> * qemu-img dd: Same as convert, except that there is no -s.
> 
> So the list of subcommands where we want to support it is rather short.
> We can change blk_new_open() to clear CONSISTENT_READ for BDRV_O_NOIO,
> which could cover 'info' and 'snapshot -l'.
> 
> That leaves us with qemu-io, 'convert -s' and 'map', all of which can be
> imagined to be useful even with a running VM, but all of which can also
> easily produce wrong results in this case. An explicit command line
> option to disable CONSISTENT_READ might be the right tool here.


I'm not sure about this, because: 1) it is intrusive from a user's point of
view, since many scripts and upper-layer tools will stop working; 2)
CONSISTENT_READ is enforced by qcow2 in its .bdrv_child_perm implementation even
if blk_new_open() doesn't ask for it, so such an option would have to affect the
whole graph; 3) this isn't only about requesting the "consistent read"
permission, but more about granting "write" in shared_perm, so it feels messy.
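
To illustrate point 3, here is a minimal standalone sketch of how a perm /
shared_perm conflict check behaves. It is a toy model, not QEMU code: the
ToyUser struct, the toy_users_compatible() helper and the bit values are all
made up for illustration, and only the general idea (each user's requested
permissions must fit inside what the other user shares) is taken from the
discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for block layer permission bits (values are illustrative). */
enum {
    PERM_CONSISTENT_READ = 1 << 0,
    PERM_WRITE           = 1 << 1,
};

/* One user of an image: what it needs, and what it tolerates from others. */
typedef struct {
    uint64_t perm;         /* permissions this user requests */
    uint64_t shared_perm;  /* permissions this user lets others hold */
} ToyUser;

/*
 * Two users can coexist only if everything one of them requests is
 * contained in what the other one is willing to share, in both directions.
 */
static bool toy_users_compatible(ToyUser a, ToyUser b)
{
    return (a.perm & ~b.shared_perm) == 0 &&
           (b.perm & ~a.shared_perm) == 0;
}
```

In this model, a running VM holding { CONSISTENT_READ | WRITE } conflicts with a
reader that does not share WRITE, whether or not that reader requests
CONSISTENT_READ. The conflict only goes away once the reader puts WRITE into
its shared_perm.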

A perhaps more contained way is to add a BDRV_O_RELAXED_LOCK flag and use it in
those subcommands; the image locking code then skips locking the "no other
writer" byte when the flag is set. This has simpler semantics and a more
manageable scope.
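
Roughly, the idea could be sketched like this. This is a standalone toy model
under stated assumptions, not the actual image locking code: the TOY_* names
are hypothetical, TOY_O_RELAXED_LOCK stands in for the proposed
BDRV_O_RELAXED_LOCK, and the two-byte lock layout (an "I am a writer" byte next
to the "no other writer" byte) is assumed here only to make the example
complete.

```c
#include <stdbool.h>

/* Hypothetical open flags; TOY_O_RELAXED_LOCK models BDRV_O_RELAXED_LOCK. */
#define TOY_O_RDWR         (1u << 0)
#define TOY_O_RELAXED_LOCK (1u << 1)

/* Assumed lock bytes on the image file, modelled as bits. */
enum {
    TOY_LOCK_WRITER          = 1u << 0, /* "I am a writer" */
    TOY_LOCK_NO_OTHER_WRITER = 1u << 1, /* "no other writer" */
};

/* Which lock bytes an opener takes for the given flags. */
static unsigned toy_lock_bytes(unsigned flags)
{
    unsigned bytes = 0;

    if (flags & TOY_O_RDWR) {
        bytes |= TOY_LOCK_WRITER;
    }
    /* The relaxed flag skips the "no other writer" byte, nothing else. */
    if (!(flags & TOY_O_RELAXED_LOCK)) {
        bytes |= TOY_LOCK_NO_OTHER_WRITER;
    }
    return bytes;
}

/* Can an image be opened with 'flags' while others hold 'held' bytes? */
static bool toy_can_open(unsigned held, unsigned flags)
{
    unsigned want = toy_lock_bytes(flags);

    /* We insist on no other writer, but somebody is writing. */
    if ((want & TOY_LOCK_NO_OTHER_WRITER) && (held & TOY_LOCK_WRITER)) {
        return false;
    }
    /* We want to write, but somebody insists on no other writer. */
    if ((want & TOY_LOCK_WRITER) && (held & TOY_LOCK_NO_OTHER_WRITER)) {
        return false;
    }
    return true;
}
```

With a VM holding toy_lock_bytes(TOY_O_RDWR), a plain read-only open fails, a
read-only open with TOY_O_RELAXED_LOCK succeeds, and a second read-write open
still fails, so launching a second VM on the image stays forbidden.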

Fam
