On Fri, Apr 14, 2017 at 2:38 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Fri, Apr 14, 2017 at 7:00 AM, Fam Zheng <f...@redhat.com> wrote:
>> On Thu, 04/13 10:34, jemmy858...@gmail.com wrote:
>>> From: Lidong Chen <lidongc...@tencent.com>
>>>
>>> BLOCK_SIZE is (1 << 20), while the qcow2 cluster size is 65536 by
>>> default; this may cause the qcow2 file size to grow after migration.
>>> This patch checks each cluster, using blk_pwrite_zeroes() for each
>>> zero cluster.
>>>
>>> Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
>>> Signed-off-by: Lidong Chen <lidongc...@tencent.com>
>>> ---
>>> v6 changelog:
>>> Fix up some grammar in the comment.
>>> ---
>>>  migration/block.c | 35 +++++++++++++++++++++++++++++++++--
>>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/migration/block.c b/migration/block.c
>>> index 7734ff7..41c7a55 100644
>>> --- a/migration/block.c
>>> +++ b/migration/block.c
>>> @@ -885,6 +885,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>      int64_t total_sectors = 0;
>>>      int nr_sectors;
>>>      int ret;
>>> +    BlockDriverInfo bdi;
>>> +    int cluster_size;
>>>
>>>      do {
>>>          addr = qemu_get_be64(f);
>>> @@ -919,6 +921,15 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>                  error_report_err(local_err);
>>>                  return -EINVAL;
>>>              }
>>> +
>>> +            ret = bdrv_get_info(blk_bs(blk), &bdi);
>>> +            if (ret == 0 && bdi.cluster_size > 0 &&
>>> +                bdi.cluster_size <= BLOCK_SIZE &&
>>> +                BLOCK_SIZE % bdi.cluster_size == 0) {
>>> +                cluster_size = bdi.cluster_size;
>>> +            } else {
>>> +                cluster_size = BLOCK_SIZE;
>>> +            }
>>>          }
>>>
>>>          if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>> @@ -932,10 +943,30 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>                                 nr_sectors * BDRV_SECTOR_SIZE,
>>>                                 BDRV_REQ_MAY_UNMAP);
>>>          } else {
>>> +            int i;
>>> +            int64_t cur_addr;
>>> +            uint8_t *cur_buf;
>>> +
>>>              buf = g_malloc(BLOCK_SIZE);
>>>              qemu_get_buffer(f, buf, BLOCK_SIZE);
>>> -            ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
>>> -                             nr_sectors * BDRV_SECTOR_SIZE, 0);
>>> +            for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
>>> +                cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
>>> +                cur_buf = buf + i * cluster_size;
>>> +
>>> +                if ((!block_mig_state.zero_blocks ||
>>> +                     cluster_size < BLOCK_SIZE) &&
>>> +                    buffer_is_zero(cur_buf, cluster_size)) {
>>> +                    ret = blk_pwrite_zeroes(blk, cur_addr,
>>> +                                            cluster_size,
>>> +                                            BDRV_REQ_MAY_UNMAP);
>>> +                } else {
>>> +                    ret = blk_pwrite(blk, cur_addr, cur_buf,
>>> +                                     cluster_size, 0);
>>> +                }
>>> +                if (ret < 0) {
>>> +                    break;
>>> +                }
>>> +            }
>>>              g_free(buf);
>>>          }
>>
>> Sorry for asking this question so late, but, before it gets too late: did you
>> evaluate the performance impact of this change under a real-world workload?
>>
>> Effectively, if no cluster is zero, this patch still splits a big write into
>> small ones, which is the opposite of the usual performance optimization
>> (i.e. trying to coalesce requests).
>
> Good point!
>
> Another patch can modify the loop to perform the largest writes
> possible. In other words, do not perform the write immediately and
> keep a cluster counter instead. When the zero/non-zero state changes,
> perform the write for the accumulated cluster count.
If the zero/non-zero state changes very frequently, that will not help. I also
considered this approach before I submitted this patch, but I found the
performance is almost the same for qcow2 with a cluster_size of 65536. I worry
about other formats which have a very small cluster_size, for example 512, but
I have not found one; please tell me if you know of any, and I will test it.

Do you think it is necessary? If the cluster_size is too small, we could use
cluster_size * N instead.

Thanks.

>
> Stefan