Am 15.05.2014 11:54, schrieb Kevin Wolf: > Am 15.05.2014 um 07:16 hat Peter Lieven geschrieben: >> Am 14.05.2014 13:41, schrieb Kevin Wolf: >>> Am 08.05.2014 um 18:22 hat Peter Lieven geschrieben: >>>> this patch tries to optimize zero write requests >>>> by automatically using bdrv_write_zeroes if it is >>>> supported by the format. >>>> >>>> This significantly speeds up file system initialization and >>>> should speed zero write test used to test backend storage >>>> performance. >>>> >>>> I ran the following 2 tests on my internal SSD with a >>>> 50G QCOW2 container and on an attached iSCSI storage. >>>> >>>> a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX >>>> >>>> QCOW2 [off] [on] [unmap] >>>> ----- >>>> runtime: 14secs 1.1secs 1.1secs >>>> filesize: 937M 18M 18M >>>> >>>> iSCSI [off] [on] [unmap] >>>> ---- >>>> runtime: 9.3s 0.9s 0.9s >>>> >>>> b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct >>>> >>>> QCOW2 [off] [on] [unmap] >>>> ----- >>>> runtime: 246secs 18secs 18secs >>>> filesize: 51G 192K 192K >>>> throughput: 203M/s 2.3G/s 2.3G/s >>>> >>>> iSCSI* [off] [on] [unmap] >>>> ---- >>>> runtime: 8mins 45secs 33secs >>>> throughput: 106M/s 1.2G/s 1.6G/s >>>> allocated: 100% 100% 0% >>>> >>>> * The storage was connected via an 1Gbit interface. >>>> It seems to internally handle writing zeroes >>>> via WRITESAME16 very fast. >>>> >>>> Signed-off-by: Peter Lieven <p...@kamp.de> >>>> --- >>>> v3->v4: - use QAPI generated enum and lookup table [Kevin] >>>> - added more details about the options in the comments >>>> of the qapi-schema [Eric] >>>> - changed the type of detect_zeroes from str to >>>> BlockdevDetectZeroesOptions. I left the name >>>> as is because it is consistent with e.g. >>>> BlockdevDiscardOptions or BlockdevAioOptions [Eric] >>>> - changed the parse function in blockdev_init to >>>> be generic usable for other enum parameters >>>> >>>> v2->v3: - moved parameter parsing to blockdev_init >>>> - added per device detect_zeroes status to >>>> hmp (info block -v) and qmp (query-block) [Eric] >>>> - added support to enable detect-zeroes also >>>> for hot added devices [Eric] >>>> - added missing entry to qemu_common_drive_opts >>>> - fixed description of qemu_iovec_is_zero [Fam] >>>> >>>> v1->v2: - added tests to commit message (Markus) >>>> RFCv2->v1: - fixed paramter parsing strncmp -> strcmp (Eric) >>>> - fixed typo (choosen->chosen) (Eric) >>>> - added second example to commit msg >>>> >>>> RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline >>>> parameter >>>> - call zero detection only for format (bs->file != NULL) >>>> >>>> block.c | 11 ++++++++++ >>>> block/qapi.c | 1 + >>>> blockdev.c | 34 +++++++++++++++++++++++++++++ >>>> hmp.c | 5 +++++ >>>> include/block/block_int.h | 1 + >>>> include/qemu-common.h | 1 + >>>> qapi-schema.json | 52 >>>> ++++++++++++++++++++++++++++++++------------- >>>> qemu-options.hx | 6 ++++++ >>>> qmp-commands.hx | 3 +++ >>>> util/iov.c | 21 ++++++++++++++++++ >>>> 10 files changed, 120 insertions(+), 15 deletions(-) >>>> >>>> diff --git a/block.c b/block.c >>>> index b749d31..aea4c77 100644 >>>> --- a/block.c >>>> +++ b/block.c >>>> @@ -3244,6 +3244,17 @@ static int coroutine_fn >>>> bdrv_aligned_pwritev(BlockDriverState *bs, >>>> >>>> ret = notifier_with_return_list_notify(&bs->before_write_notifiers, >>>> req); >>>> >>>> + if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF && >>>> + !(flags & BDRV_REQ_ZERO_WRITE) && bs->file && >>>> + drv->bdrv_co_write_zeroes && qemu_iovec_is_zero(qiov)) { >>> Pretty long condition. :-) >>> >>> Looks like most is obviously needed, but I wonder what the bs->file part >>> is good for? That looks rather arbitrary. >> What I wanted to achieve is that this condition is only true if we handle >> the format (e.g. raw, qcow2, vmdk etc.). If e.g. qcow2 then sends a >> zero write this should always end on the disk and should not be optimizable. > But why? This means setting an arbitrary policy for no good reason. You > already have an option, and it already defaults to off, so unless > someone specifically enables it for bs->file, we don't do the > optimisation. But if someone wants to have it on bs->file, what reason > is there to ignore that request?
I was erroneously thinking that the setting would be inherited to bs->file. I will remove the bs->file from the condition. > >>>> + flags |= BDRV_REQ_ZERO_WRITE; >>>> + /* if the device was not opened with discard=on the below flag >>>> + * is immediately cleared again in bdrv_co_do_write_zeroes */ >>> Is it? I only see it being cleared in bdrv_co_write_zeroes(), but that's >>> not a function that seems to be called from here. >> You are right. Question, do we want to support detect_zeroes = unmap >> if discard = ignore? If yes, I have to update the docs. Otherwise >> I have to check for BDRV_O_DISCARD before setting BDRV_REQ_MAY_UNMAP. > I think it would be reasonable enough to just error out when you try to > open an image with detect_zeroes=unmap,discard=ignore. > > Can these flags be changed during runtime? If so, we need to check there > as well. No, but you can specify them during hot add. Same as with discard. Peter > >>>> + if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) { >>>> + flags |= BDRV_REQ_MAY_UNMAP; >>>> + } >>>> + } >>>> + >>>> if (ret < 0) { >>>> /* Do nothing, write notifier decided to fail this request */ >>>> } else if (flags & BDRV_REQ_ZERO_WRITE) { > Kevin