On 04/15/16 05:27, Fam Zheng wrote:
> Block drivers can implement this new operation .bdrv_lockf to actually lock the
> image in the protocol specific way.
>
> Signed-off-by: Fam Zheng <f...@redhat.com>
> ---
>  block.c                   | 42 ++++++++++++++++++++++++++++++++++++++++++
>  include/block/block_int.h | 12 ++++++++++++
>  2 files changed, 54 insertions(+)
>
> diff --git a/block.c b/block.c
> index 1c575e4..7971a25 100644
> --- a/block.c
> +++ b/block.c
> @@ -846,6 +846,34 @@ out:
>      g_free(gen_node_name);
>  }
>
> +static int bdrv_lock_unlock_image_do(BlockDriverState *bs, bool lock_image)
> +{
> +    int cmd = BDRV_LOCKF_UNLOCK;
> +
> +    if (bs->image_locked == lock_image) {
> +        return 0;
> +    } else if (!bs->drv) {
> +        return -ENOMEDIUM;
> +    } else if (!bs->drv->bdrv_lockf) {
> +        return 0;
> +    }
> +    if (lock_image) {
> +        cmd = bs->open_flags & BDRV_O_RDWR ? BDRV_LOCKF_RWLOCK :
> +                                             BDRV_LOCKF_ROLOCK;
> +    }
> +    return bs->drv->bdrv_lockf(bs, cmd);
> +}
> +
> +static int bdrv_lock_image(BlockDriverState *bs)
> +{
> +    return bdrv_lock_unlock_image_do(bs, true);
> +}
> +
> +static int bdrv_unlock_image(BlockDriverState *bs)
> +{
> +    return bdrv_lock_unlock_image_do(bs, false);
> +}
> +
>  static QemuOptsList bdrv_runtime_opts = {
>      .name = "bdrv_common",
>      .head = QTAILQ_HEAD_INITIALIZER(bdrv_runtime_opts.head),
> @@ -995,6 +1023,14 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>          goto free_and_fail;
>      }
>
> +    if (!(open_flags & (BDRV_O_NO_LOCK | BDRV_O_INACTIVE))) {
> +        ret = bdrv_lock_image(bs);
> +        if (ret) {
> +            error_setg(errp, "Failed to lock image");
> +            goto free_and_fail;
> +        }
> +    }
> +
>      ret = refresh_total_sectors(bs, bs->total_sectors);
>      if (ret < 0) {
>          error_setg_errno(errp, -ret, "Could not refresh total sector count");
> @@ -2144,6 +2180,7 @@ static void bdrv_close(BlockDriverState *bs)
>      if (bs->drv) {
>          BdrvChild *child, *next;
>
> +        bdrv_unlock_image(bs);
>          bs->drv->bdrv_close(bs);
>          bs->drv = NULL;
>
> @@ -3230,6 +3267,9 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
>          error_setg_errno(errp, -ret, "Could not refresh total sector count");
>          return;
>      }
> +    if (!(bs->open_flags & BDRV_O_NO_LOCK)) {
> +        bdrv_lock_image(bs);
> +    }
>  }
>
>  void bdrv_invalidate_cache_all(Error **errp)
> @@ -3262,6 +3302,7 @@ static int bdrv_inactivate(BlockDriverState *bs)
>      }
>
>      bs->open_flags |= BDRV_O_INACTIVE;
> +    ret = bdrv_unlock_image(bs);
>      return 0;
>  }
>
> @@ -3981,3 +4022,4 @@ void bdrv_refresh_filename(BlockDriverState *bs)
>          QDECREF(json);
>      }
>  }
> +
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 10d8759..ffa30b0 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -85,6 +85,12 @@ typedef struct BdrvTrackedRequest {
>      struct BdrvTrackedRequest *waiting_for;
>  } BdrvTrackedRequest;
>
> +typedef enum {
> +    BDRV_LOCKF_RWLOCK,
> +    BDRV_LOCKF_ROLOCK,
> +    BDRV_LOCKF_UNLOCK,
> +} BdrvLockfCmd;
> +
>  struct BlockDriver {
>      const char *format_name;
>      int instance_size;
> @@ -317,6 +323,11 @@ struct BlockDriver {
>       */
>      void (*bdrv_drain)(BlockDriverState *bs);
>
> +    /**
> +     * Lock/unlock the image.
> +     */
> +    int (*bdrv_lockf)(BlockDriverState *bs, BdrvLockfCmd cmd);
> +
>      QLIST_ENTRY(BlockDriver) list;
>  };
>
> @@ -485,6 +496,7 @@ struct BlockDriverState {
>      NotifierWithReturn write_threshold_notifier;
>
>      int quiesce_counter;
> +    bool image_locked;
>  };
>
>  struct BlockBackendRootState {
>
I'd like to raise one point that I think may not have been raised yet (after briefly skimming the v1 / v2 comments). Sorry if this has been discussed already.

IIUC, the idea is that "protocols" (in the block layer sense) implement the lockf method, and then bdrv_open_common() automatically locks image files, if the lockf method is available, and if various settings (cmdline options etc) don't request otherwise.

I tried to see if this series modifies -- for example -- raw_reopen_commit() and raw_reopen_abort(), in "block/raw-posix.c". Or, if it modifies bdrv_reopen_multiple(), in "block.c". It doesn't seem to.

Those functions are relevant for the following reason. Given the following chain of references:

  file descriptor --> file description --> file

an fcntl() lock is associated with the file. However, the fcntl() lock held by the process on the file is dropped if the process closes *any* file descriptor that points (through the same or another file description) to the file.

From <http://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html>:

  All locks associated with a file for a given process shall be removed when a file descriptor for that file is closed by that process [...]

From <http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html>:

  All outstanding record locks owned by the process on the file associated with the file descriptor shall be removed (that is, unlocked).

From <http://man7.org/linux/man-pages/man2/fcntl.2.html>:

  If a process closes any file descriptor referring to a file, then all of the process's locks on that file are released, regardless of the file descriptor(s) on which the locks were obtained.

The bdrv_reopen_multiple() function reopens a bunch of image files. Backed by the raw-posix protocol driver, this seems to boil down to a series of

  (i) fcntl(F_DUPFD_CLOEXEC), and/or
  (ii) dup(), and/or
  (iii) qemu_open()

calls, in raw_reopen_prepare(). The result is stored in "raw_s->fd" every time.
(In the first two cases, "s->fd" and "raw_s->fd" share the file description; in the third case, they share only the file.)

Assume that one of the raw_reopen_prepare() calls fails. Then bdrv_reopen_multiple() will roll back the work done thus far, calling raw_reopen_abort() on the initial subset of image files. This results in "raw_s->fd" being passed to close(), which is when the lock (conceptually held for "s->fd") is dropped for good.

If all of the raw_reopen_prepare() calls succeed, then a series of raw_reopen_commit() calls will occur. That has the same effect: "s->fd" is passed to close(), which drops the lock for "raw_s->fd" too (which is supposed to be used for accessing the file, going forward).

Sorry if this is already handled in the series; I couldn't find it.

Thanks
Laszlo