----- Original Message ----- > From: "Stefan Hajnoczi" <stefa...@gmail.com> > To: "Francesco Romani" <from...@redhat.com> > Cc: kw...@redhat.com, lcapitul...@redhat.com, qemu-devel@nongnu.org, > stefa...@redhat.com, mdr...@linux.vnet.ibm.com > Sent: Monday, November 17, 2014 5:49:36 PM > Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold > reporting for block devices > > On Fri, Nov 07, 2014 at 02:12:13PM +0100, Francesco Romani wrote: > > Sorry for the long review delay. Looks pretty good, just one real issue > to think about at the bottom.
Hi Stefan, thanks for the review and no problem for the delay :) > > > +static void usage_threshold_disable(BlockDriverState *bs) > > +{ > > It would be safest to make this idempotent: > > if (!usage_threshold_is_set(bs)) { > return; > } > > That way it can be called multiple times. Will change. > > > + notifier_with_return_remove(&bs->wr_usage_threshold_notifier); > > + bs->wr_offset_threshold = 0; > > +} > > + > > +static int usage_threshold_is_set(const BlockDriverState *bs) > > +{ > > + return !!(bs->wr_offset_threshold > 0); > > +} > > Please use the bool type instead of an int return value. Sure, will fix. > > +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier, > > + void *opaque) > > +{ > > + BdrvTrackedRequest *req = opaque; > > + BlockDriverState *bs = req->bs; > > + int64_t amount = 0; > > + > > + assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0); > > + assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0); > > Does the code still make these assumptions or are the asserts left over > from previous versions of the patch? It's a leftover. I understood they don't hurt and add a bit of safety, but if they are confusing I'll remove them. > > +void bdrv_set_usage_threshold(BlockDriverState *bs, int64_t > > threshold_bytes) > > +{ > > + BlockDriverState *target_bs = bs; > > + if (bs->file) { > > + target_bs = bs->file; > > + } > > Hmm...I think now I understand why you are trying to use bs->file. This > is an attempt to make image formats work with the threshold. > > Unfortunately the BlockDriverState topology can be more complicated than > just 1 level. I thought so but I can't reproduce yet more complex topologies. Disclosure: I'm testing against the topology I know to be used on oVirt, lacking immediate availability of others: suggestions welcome. At risk of being redundant, in oVirt the usecase is QCOW2 over lvm block device, and we'd like to be notified about the allocation of the lvm block device, which (IIUC) is the last bs->file. This is a simple topology (unless I'm missing something), and that's the reason why I go down just one level. Of course I want a general solution for this change, so... > If we hardcode a strategy to traverse bs->file then it will work in most > cases: > > while (bs->file) { > bs = bs->file; > } > > But there are cases like VMDK extent files where a BlockDriverState > actually has multiple children. > > One way to solve this is to require that the management tool tells QEMU > which exact BlockDriverState node the threshold applies to. Then QEMU > doesn't need any hardcoded policy. But I'm not sure how realistic that > it at the moment (whether management tools are uses node names for each > node yet), so it may be best to hardcode the bs->file traversal that > I've suggested. oVirt relies on libvirt[1], and uses device node (e.g. 'vda'). BTW, I haven't found a way to inspect from the outside (e.g. monitor command) the BlockDriverState topology, there is a way I'm missing? Another issue I don't yet have a proper solution is related to this one. The management app will have deal with VM with more than one block device disk, so we need a way to map the notification with the corresponding device. If we descend the bs->file chain, AFAIU the easiest mapping, being the device name, is easily lost because only the outermost BlockDriverState has a device name attached, so when the notification trigger bdrv_get_device_name() will return NULL I believe we don't necessarily need a device name in the notification, for example something like an opaque cookie set together with the threshold and returned back with the notification will suffice. Of course the device name is nicer :) +++ [1] if that can help further to understand the usecase, these are the libvirt APIs being used by oVirt: http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockStatsFlags http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetBlockInfo both relies on the output[2] of 'query-blockstats' monitor command. [2] AFAIU -but this is just my guesswork!- it also assumes a quite simple topology like I did Thanks and bests, -- Francesco Romani RedHat Engineering Virtualization R & D Phone: 8261328 IRC: fromani