On Fri, Apr 14, 2017 at 2:38 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Fri, Apr 14, 2017 at 7:00 AM, Fam Zheng <f...@redhat.com> wrote:
>> On Thu, 04/13 10:34, jemmy858...@gmail.com wrote:
>>> From: Lidong Chen <lidongc...@tencent.com>
>>>
>>> BLOCK_SIZE is (1 << 20), while the qcow2 cluster size is 65536 by
>>> default; this may cause the qcow2 file size to grow after migration.
>>> This patch checks each cluster, using blk_pwrite_zeroes() for each
>>> zero cluster.
>>>
>>> Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>
>>> Signed-off-by: Lidong Chen <lidongc...@tencent.com>
>>> ---
>>> v6 changelog:
>>> Fix up some grammar in the comment.
>>> ---
>>>  migration/block.c | 35 +++++++++++++++++++++++++++++++++--
>>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/migration/block.c b/migration/block.c
>>> index 7734ff7..41c7a55 100644
>>> --- a/migration/block.c
>>> +++ b/migration/block.c
>>> @@ -885,6 +885,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>      int64_t total_sectors = 0;
>>>      int nr_sectors;
>>>      int ret;
>>> +    BlockDriverInfo bdi;
>>> +    int cluster_size;
>>>
>>>      do {
>>>          addr = qemu_get_be64(f);
>>> @@ -919,6 +921,15 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>                  error_report_err(local_err);
>>>                  return -EINVAL;
>>>              }
>>> +
>>> +            ret = bdrv_get_info(blk_bs(blk), &bdi);
>>> +            if (ret == 0 && bdi.cluster_size > 0 &&
>>> +                bdi.cluster_size <= BLOCK_SIZE &&
>>> +                BLOCK_SIZE % bdi.cluster_size == 0) {
>>> +                cluster_size = bdi.cluster_size;
>>> +            } else {
>>> +                cluster_size = BLOCK_SIZE;
>>> +            }
>>>          }
>>>
>>>          if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>> @@ -932,10 +943,30 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>                                 nr_sectors * BDRV_SECTOR_SIZE,
>>>                                 BDRV_REQ_MAY_UNMAP);
>>>          } else {
>>> +            int i;
>>> +            int64_t cur_addr;
>>> +            uint8_t *cur_buf;
>>> +
>>>              buf = g_malloc(BLOCK_SIZE);
>>>              qemu_get_buffer(f, buf, BLOCK_SIZE);
>>> -            ret = blk_pwrite(blk, addr * BDRV_SECTOR_SIZE, buf,
>>> -                             nr_sectors * BDRV_SECTOR_SIZE, 0);
>>> +            for (i = 0; i < BLOCK_SIZE / cluster_size; i++) {
>>> +                cur_addr = addr * BDRV_SECTOR_SIZE + i * cluster_size;
>>> +                cur_buf = buf + i * cluster_size;
>>> +
>>> +                if ((!block_mig_state.zero_blocks ||
>>> +                     cluster_size < BLOCK_SIZE) &&
>>> +                    buffer_is_zero(cur_buf, cluster_size)) {
>>> +                    ret = blk_pwrite_zeroes(blk, cur_addr,
>>> +                                            cluster_size,
>>> +                                            BDRV_REQ_MAY_UNMAP);
>>> +                } else {
>>> +                    ret = blk_pwrite(blk, cur_addr, cur_buf,
>>> +                                     cluster_size, 0);
>>> +                }
>>> +                if (ret < 0) {
>>> +                    break;
>>> +                }
>>> +            }
>>>              g_free(buf);
>>>          }
>>
>> Sorry for asking this question so late, but, before it gets too late: did you
>> evaluate the performance impact of this change under a real-world workload?
>>
>> Effectively, if no cluster is zero, this patch still splits a big write into
>> small ones, which is the opposite of the usual performance optimization
>> (i.e. trying to coalesce requests).
>
> Good point!
>
> Another patch can modify the loop to perform the largest writes
> possible. In other words, do not perform the write immediately and
> keep a cluster counter instead. When the zero/non-zero state changes,
> perform the write for the accumulated cluster count.
If the zero/non-zero state changes very frequently, that will not help. I also
considered this approach before I submitted this patch, but I found the
performance is almost the same for qcow2 with a cluster_size of 65536. I worry
about other formats which have a very small cluster_size, for example 512, but
I have not found one; please tell me if you know of any, and I will test it.

Do you think it is necessary? If the cluster_size is too small, we could use
cluster_size * N instead.

Thanks.

>
> Stefan