Re: [Qemu-devel] [PATCH v2] migration/block: use blk_pwrite_zeroes for each zero cluster

858585 jemmy Sat, 08 Apr 2017 06:29:59 -0700

On Sat, Apr 8, 2017 at 12:52 PM, 858585 jemmy <jemmy858...@gmail.com> wrote:
> On Fri, Apr 7, 2017 at 6:10 PM, Fam Zheng <f...@redhat.com> wrote:
>> On Fri, 04/07 16:44, jemmy858...@gmail.com wrote:
>>> From: Lidong Chen <lidongc...@tencent.com>
>>>
>>> BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default,
>>> this maybe cause the qcow2 file size is bigger after migration.
>>> This patch check each cluster, use blk_pwrite_zeroes for each
>>> zero cluster.
>>>
>>> Signed-off-by: Lidong Chen <lidongc...@tencent.com>
>>> ---
>>>  migration/block.c | 37 +++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 35 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/migration/block.c b/migration/block.c
>>> index 7734ff7..c32e046 100644
>>> --- a/migration/block.c
>>> +++ b/migration/block.c
>>> @@ -885,6 +885,11 @@ static int block_load(QEMUFile *f, void *opaque, int 
>>> version_id)
>>>      int64_t total_sectors = 0;
>>>      int nr_sectors;
>>>      int ret;
>>> +    int i;
>>> +    int64_t addr_offset;
>>> +    uint8_t *buf_offset;
>>
>> Poor variable names, they are not offset, maybe "cur_addr" and "cur_buf"? And
>> they can be moved to the loop block below.
> ok, i will change.
>
>>
>>> +    BlockDriverInfo bdi;
>>> +    int cluster_size;
>>>
>>>      do {
>>>          addr = qemu_get_be64(f);
>>> @@ -934,8 +939,36 @@ static int block_load(QEMUFile *f, void *opaque, int 
>>> version_id)
>>>              } else {
>>>                  buf = g_malloc(BLOCK_SIZE);
>>>                  qemu_get_buffer(f, buf, BLOCK_SIZE);
>>> -                ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
>>> -                                 nr_sectors * BDRV_SECTOR_SIZE, 0);
>>> +
>>> +                ret = bdrv_get_info(blk_bs(blk), &bdi);
>>> +                cluster_size = bdi.cluster_size;
>>> +
>>> +                if (ret == 0 && cluster_size > 0 &&
>>> +                    cluster_size < BLOCK_SIZE &&
>>
>> I think cluster_size == BLOCK_SIZE should work too.
> This case the (flags & BLK_MIG_FLAG_ZERO_BLOCK) should be true,
> and will invoke blk_pwrite_zeroes before apply this patch.
> but maybe the source qemu maybe not enabled zero flag.
> so i think cluster_size <= BLOCK_SIZE is ok.
>
>>
>>> +                    BLOCK_SIZE % cluster_size == 0) {
>>> +                    for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
>>> +                        addr_offset = addr * BDRV_SECTOR_SIZE
>>> +                                        + i * cluster_size;
>>> +                        buf_offset = buf + i * cluster_size;
>>> +
>>> +                        if (buffer_is_zero(buf_offset, cluster_size)) {
>>> +                            ret = blk_pwrite_zeroes(blk, addr_offset,
>>> +                                                    cluster_size,
>>> +                                                    BDRV_REQ_MAY_UNMAP);
>>> +                        } else {
>>> +                             ret = blk_pwrite(blk, addr_offset, buf_offset,
>>> +                                              cluster_size, 0);
>>> +                        }
>>> +
>>> +                        if (ret < 0) {
>>> +                            g_free(buf);
>>> +                            return ret;
>>> +                        }
>>> +                    }
>>> +                } else {
>>> +                    ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
>>> +                                     nr_sectors * BDRV_SECTOR_SIZE, 0);
>>> +                }
>>>                  g_free(buf);
>>>              }
>>>
>>> --
>>> 1.8.3.1
>>>
>>
>> Is it possible use (source) cluster size as the transfer chunk size, instead 
>> of
>> BDRV_SECTORS_PER_DIRTY_CHUNK? Then the existing BLK_MIG_FLAG_ZERO_BLOCK logic
>> can help and you don't need to send zero bytes on the wire. This may still 
>> not
>> be optimal if dest has larger cluster, but it should cover the common use 
>> case
>> well.
>
> yes, i also think BDRV_SECTORS_PER_DIRTY_CHUNK is too large.
> This have two disadvantage:
> 1. it will cause the dest qcow2 file size is bigger after migration.
> 2. it will cause transfer not necessary data, and maybe cause the
> migration can't be successful.
>
> in my production environment, some vm only write 2MB/s, the dirty
> block migrate speed is 70MB/s.
> but it still migration timeout.
>
> but if we change the size of BDRV_SECTORS_PER_DIRTY_CHUNK, it will
> break the protocol.
> the old version qemu will not be able to migrate to new version qemu.
> there are not information about the length about the migration buffer.
>
> so i think we should add new flags to indicate that there are an
> additional byte about the length
> of migration buffer. i will send another patch later, and test the result.


Hi Fam:
Do we need consider the circumstances than migrate from new qemu version
to old qemu version?

>
> this patch is also valuable, there are many old version qemu in my
> production environment.
> and will be benefit with this patch.
>
>>
>> Fam

Re: [Qemu-devel] [PATCH v2] migration/block: use blk_pwrite_zeroes for each zero cluster

Reply via email to