v1 was: "add watermark reporting for block devices", but "watermark" was used incorrectly. Hence the change in subject.
Sorry for the long pause since v1 [0]; only recently was I able to sort out
all the missing details. The context for this RFC/patch is repeated below as
a reminder.

Why RFC?
--------
See "Known issues" below.

Changes since v1:
-----------------
Addressed reviewers' comments. Highlights:
- fixed terminology: "watermark" -> "usage threshold"
- the threshold is expressed in bytes
- the event triggers only once, when the threshold is crossed
- the configured threshold is visible in 'query-block' output
- fixed bugs

Known Issues
------------
The threshold we (oVirt) care about is actually the one on the underlying
device hosting the image (bs->file), not on the device itself. See
"Rationale and context from v1" below for details. The use case is qcow2 on
a thin-provisioned block device.

So the notification handler must be set on the inner BlockDriverState, like
this (pseudocode, no error checking, see the patch for details):

    /* ... */
    BlockDriverState *bs = bdrv_find("drive-virtio-disk0");
    /* ... */
    bdrv_set_usage_threshold(bs, threshold_bytes);
    /* ... */

    void bdrv_set_usage_threshold(BlockDriverState *bs, int64_t threshold_bytes)
    {
        BlockDriverState *target_bs = bs;

        if (bs->file) {
            /* we care about the highest allocation of this! */
            target_bs = bs->file;
        }

        /* storing threshold_bytes on target_bs is omitted in this pseudocode */
        target_bs->wr_usage_threshold_notifier.notify =
            usage_threshold_before_write_notify;
        notifier_with_return_list_add(&target_bs->before_write_notifiers,
                                      &target_bs->wr_usage_threshold_notifier);
    }

Problem: when the notifier triggers, I don't know how to get the device
name, because a simple

    bdrv_get_device_name(bs);

yields just "". Despite my efforts and some digging in the code, I don't
know the clean and correct way to get the name of the parent device
(e.g. "drive-virtio-disk0").
Rationale and context from v1
-----------------------------
I'm one of the oVirt developers (http://www.ovirt.org); oVirt is a
virtualization management application built around QEMU/KVM, so it is nice
to get in touch :)

We have begun a big scalability improvement effort, aiming to support
hundreds of VMs per host without problems, with plans to support thousands
in the not-so-distant future. In doing so, we are reviewing our usage flows.

One of them is thin-provisioned storage, which is used quite extensively,
with block devices (iSCSI, for example) and COW images. When using thin
provisioning, oVirt tries hard to hide this fact from the guest OS. To do
so, it closely watches the usage of the device and resizes it when its usage
exceeds a configured threshold (the "high water mark"), so that the guest OS
does not get paused because the space is exhausted.

To do the watching, we poll the devices using libvirt [1], which in turn
uses query-blockstats. This is suboptimal with just one VM, but with
hundreds of them, let alone thousands, it doesn't scale and is quite a
resource hog.

It would be great to have this watermark concept supported in QEMU, with a
new event raised when the limit is crossed. To track this RFE I filed
https://bugs.launchpad.net/qemu/+bug/1338957

Moreover, I had the chance to take a look at the QEMU sources and came up
with this tentative patch, which I'd also like to submit.
Notes
-----
[0]: https://lists.gnu.org/archive/html/qemu-devel/2014-07/msg01348.html
[1]: http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=ebb0c19c48690f0598de954f8e0e9d4d29d48b85

Francesco Romani (1):
  block: add event when disk usage exceeds threshold

 block/Makefile.objs             |   1 +
 block/qapi.c                    |   3 +
 block/usage-threshold.c         | 124 ++++++++++++++++++++++++++++++++++++++++
 include/block/block_int.h       |   4 ++
 include/block/usage-threshold.h |  39 +++++++++++++
 qapi/block-core.json            |  46 ++++++++++++++-
 qmp-commands.hx                 |  26 +++++++++
 7 files changed, 242 insertions(+), 1 deletion(-)
 create mode 100644 block/usage-threshold.c
 create mode 100644 include/block/usage-threshold.h

--
1.9.3