* Markus Armbruster <arm...@redhat.com> [2010-11-10 06:48]: > One real question, and a couple of nits. > > Ryan Harper <ry...@us.ibm.com> writes: > > > Block hot unplug is racy since the guest is required to acknowlege the ACPI > > unplug event; this may not happen synchronously with the device removal > > command > > Well, I wouldn't call unplug "racy". It just takes an unpredictable > length of time, possibly forever. To make a race, you need to throw in > a client assuming (incorrectly) that unplug is instantaneous, as > described in your next paragraph. > > Moreover, all PCI unplug is that way, not just block. > > > This series aims to close a gap where by mgmt applications that assume the > > block resource has been removed without confirming that the guest has > > acknowledged the removal may re-assign the underlying device to a second > > guest > > leading to data leakage. > > Yes, the incorrect assumption is a problem. But with that fixed (in the > management application), we run right into the next problem: there is no > way for the management application to reliably disconnect the guest from > a block device. And that's the problem you're fixing.
Yeah, that's the right way to word it; providing a method to forcibly disconnect the guest from the host device. > > > This series introduces a new montor command to decouple asynchornous device > > Typos "montor" and "asynchornous". You might want to use a spell > checker :) > > Lines are a bit long. Recommend wrap at column 70. > > > removal from restricting guest access to a block device. We do this by > > creating > > a new monitor command drive_del which maps to a bdrv_unplug() command which > > does a qemu_aio_flush; bdrv_flush() and bdrv_close(). Once complete, > > subsequent > > IO is rejected from the device and the guest will get IO errors but > > continue to > > function. In addition to preventing further IO, we clean up state pointers > > between host (BlockDriverState) and guest (DeviceInfo). > > > > A subsequent device removal command can be issued to remove the device, to > > which > > the guest may or maynot respond, but as long as the unplugged bit is set, > > no IO > > "maynot" is not a word. > > > will be sumbitted. > > This suggests to drive_del before device_del, which makes the device > goes through a "broken device" state on its way to unplug. If the guest > accesses the device in that state, it gets I/O errors. Not nice. > > Instead, I'd recommend device_del, wait for the device to go away, > drive_del on time out. If the guest reacts to the ACPI unplug promptly, > it's never exposed to the "broken device" state. Note: if the drive_del > fails because the device doesn't exist, we lost the race with the > automatic destruction, which is harmless. Ignore that error. Honestly, other than describing what happens if you sever the connection when the guest isn't aware of it; I don't want to try to capture how the mgmt layer implements the removal. One may want to force the disconnect before attempting to remove the device; or the other way around; that's really the mgmt layer's call. > > > Signed-off-by: Ryan Harper <ry...@us.ibm.com> > > --- > > block.c | 7 +++++++ > > block.h | 1 + > > blockdev.c | 36 ++++++++++++++++++++++++++++++++++++ > > blockdev.h | 1 + > > hmp-commands.hx | 18 ++++++++++++++++++ > > 5 files changed, 63 insertions(+), 0 deletions(-) > > > > diff --git a/block.c b/block.c > > index 6b505fb..c76a796 100644 > > --- a/block.c > > +++ b/block.c > > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int > > removable) > > } > > } > > > > +void bdrv_unplug(BlockDriverState *bs) > > +{ > > + qemu_aio_flush(); > > + bdrv_flush(bs); > > + bdrv_close(bs); > > +} > > + > > Unless we expect more users, I'd inline this into its only caller. > Matter of taste. Works for me. > > > int bdrv_is_removable(BlockDriverState *bs) > > { > > return bs->removable; > > diff --git a/block.h b/block.h > > index 78ecfac..581414c 100644 > > --- a/block.h > > +++ b/block.h > > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, > > BlockErrorAction on_read_error, > > BlockErrorAction on_write_error); > > BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read); > > void bdrv_set_removable(BlockDriverState *bs, int removable); > > +void bdrv_unplug(BlockDriverState *bs); > > int bdrv_is_removable(BlockDriverState *bs); > > int bdrv_is_read_only(BlockDriverState *bs); > > int bdrv_is_sg(BlockDriverState *bs); > > diff --git a/blockdev.c b/blockdev.c > > index 6cb179a..ee8c2ec 100644 > > --- a/blockdev.c > > +++ b/blockdev.c > > @@ -14,6 +14,8 @@ > > #include "qemu-option.h" > > #include "qemu-config.h" > > #include "sysemu.h" > > +#include "hw/qdev.h" > > +#include "block_int.h" > > > > static QTAILQ_HEAD(drivelist, DriveInfo) drives = > > QTAILQ_HEAD_INITIALIZER(drives); > > > > @@ -597,3 +599,37 @@ int do_change_block(Monitor *mon, const char *device, > > } > > return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); > > } > > + > > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) > > +{ > > + const char *id = qdict_get_str(qdict, "id"); > > + BlockDriverState *bs; > > + Property *prop; > > + > > + bs = bdrv_find(id); > > + if (!bs) { > > + qerror_report(QERR_DEVICE_NOT_FOUND, id); > > + return -1; > > + } > > + > > + /* quiesce block driver; prevent further io */ > > + bdrv_unplug(bs); > > + > > + /* clean up guest state from pointing to host resource by > > + * finding and removing DeviceState "drive" property */ > > + for (prop = bs->peer->info->props; prop && prop->name; prop++) { > > + if ((prop->info->type == PROP_TYPE_DRIVE) && > > + (*(BlockDriverState **)qdev_get_prop_ptr(bs->peer, prop) == > > bs)) { > > + if (prop->info->free) { > > + prop->info->free(bs->peer, prop); > > + } > > Does this null the drive property? I doubt it. Quick check in the > debugger? > > The free callbacks generally don't zap the properties, because they run > from qdev_free(). To be honest; I didn't see anything that looked like "remove this property" in the qdev api. Any pointers? should I be calling qdev_free() on the dev? I don't quite understand the distinction between the info list of properties and the device itself, nor specifically what we need to remove in the drive_del() operation versus the device_del() portion. > > > + } > > + } > > + > > + /* clean up host state pointing to guest resource by removing > > + * pointers to guest device in the BlockDriverState */ > > + bdrv_delete(bs); > > + > > + return 0; > > +} > > + > > diff --git a/blockdev.h b/blockdev.h > > index 653affc..2a0559e 100644 > > --- a/blockdev.h > > +++ b/blockdev.h > > @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject > > **ret_data); > > int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject > > **ret_data); > > int do_change_block(Monitor *mon, const char *device, > > const char *filename, const char *fmt); > > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); > > > > #endif > > diff --git a/hmp-commands.hx b/hmp-commands.hx > > index e5585ba..d6dc18c 100644 > > --- a/hmp-commands.hx > > +++ b/hmp-commands.hx > > @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). > > ETEXI > > > > { > > + .name = "drive_del", > > + .args_type = "id:s", > > + .params = "device", > > + .help = "remove host block device", > > + .user_print = monitor_user_noop, > > + .mhandler.cmd_new = do_drive_del, > > + }, > > + > > +STEXI > > +...@item delete @var{device} > > +...@findex delete > > +Remove host block device. The result is that guest generated IO is no > > longer > > +submitted against the host device underlying the disk. Once a drive has > > +been deleted, the QEMU Block layer returns -EIO which results in IO > > +errors in the guest for applications that are reading/writing to the > > device. > > +ETEXI > > + > > + { > > .name = "change", > > .args_type = "device:B,target:F,arg:s?", > > .params = "device filename [format]", -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com