13.08.2019 17:23, Max Reitz wrote:
> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>> 12.08.2019 19:11, Max Reitz wrote:
>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>
>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>> one-cluster-sized buffer to share for all iterations. We can't also
>>>>>>> allocate buffer of full-request length it may be too large, so
>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>
>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>>>>>>> ---
>>>>>>>  block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>  1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>> --- a/block/backup.c
>>>>>>> +++ b/block/backup.c
>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>  #include "qemu/error-report.h"
>>>>>>>
>>>>>>>  #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>
>>>>>>>  typedef struct CowRequest {
>>>>>>>      int64_t start_byte;
>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>      qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>  }
>>>>>>>
>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>> - * error occurred, return a negative error number */
>>>>>>> +/*
>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>> + * error occurred, return a negative error number
>>>>>>> + *
>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>
>>>>>> s/to/to be/
>>>>>>
>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>> + */
>>>>>>>  static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>                                                        int64_t start,
>>>>>>>                                                        int64_t end,
>>>>>>>                                                        bool is_write_notifier,
>>>>>>>                                                        bool *error_is_read,
>>>>>>> -                                                      void **bounce_buffer)
>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>  {
>>>>>>>      int ret;
>>>>>>>      BlockBackend *blk = job->common.blk;
>>>>>>> -    int nbytes;
>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>      int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>
>>>>>>>      assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>> -    if (!*bounce_buffer) {
>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>> -    }
>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>>
>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>> -    if (ret < 0) {
>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>> -        if (error_is_read) {
>>>>>>> -            *error_is_read = true;
>>>>>>> +
>>>>>>> +    remaining_bytes = nbytes;
>>>>>>> +    while (remaining_bytes) {
>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>> +
>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>> +        if (ret < 0) {
>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>> +            if (error_is_read) {
>>>>>>> +                *error_is_read = true;
>>>>>>> +            }
>>>>>>> +            goto fail;
>>>>>>>          }
>>>>>>> -        goto fail;
>>>>>>> -    }
>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>> -                        job->write_flags);
>>>>>>> -    if (ret < 0) {
>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>> -        if (error_is_read) {
>>>>>>> -            *error_is_read = false;
>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>> +                            job->write_flags);
>>>>>>> +        if (ret < 0) {
>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>> +            if (error_is_read) {
>>>>>>> +                *error_is_read = false;
>>>>>>> +            }
>>>>>>> +            goto fail;
>>>>>>>          }
>>>>>>> -        goto fail;
>>>>>>> +
>>>>>>> +        start += chunk;
>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>      }
>>>>>>>
>>>>>>>      return nbytes;
>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>          }
>>>>>>>      }
>>>>>>>
>>>>>>>      if (!job->use_copy_range) {
>>>>>>> +        if (!bounce_buffer) {
>>>>>>> +            size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>> +                             MAX(dirty_end - start, end - dirty_end));
>>>>>>> +            bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>> +        }
>>>>>>
>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>
>>>>> Oops, you are right, of course.
>>>>>
>>>>>>
>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob. (And maybe add
>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>
>>>>>
>>>>> Once per job will not work, as we may have several guest writes in
>>>>> parallel and therefore several parallel copy-before-write operations.
>>>>
>>>> Hm. I’m not quite happy with that because if the guest just issues many
>>>> large discards in parallel, this means that qemu will allocate a large
>>>> amount of memory.
>>>>
>>>> It would be nice if there was a simple way to keep track of the total
>>>> memory usage and let requests yield if they would exceed it.
>>>
>>> Agree, it should be fixed anyway.
>>>
>>
>> But still..
>>
>> Synchronous mirror allocates full-request buffers on guest write. Is it
>> correct?
>>
>> If we assume that it is correct to double memory usage of guest
>> operations, than for backup the problem is only in write_zero and
>> discard where guest-assumed memory usage should be zero.
>
> Well, but that is the problem. I didn’t say anything in v2, because I
> only thought of normal writes and I found it fine to double the memory
> usage there (a guest won’t issue huge write requests in parallel). But
> discard/write-zeroes are a different matter.
>
>> And if we should distinguish writes from write_zeroes and discard, it's
>> better to postpone this improvement to be after backup-top filter
>> merged.
>
> But do you need to distinguish it? Why not just keep track of memory
> usage and put the current I/O coroutine to sleep in a CoQueue or
> something, and wake that up at the end of backup_do_cow()?
>
1. Because if we _can_ allow doubling of memory, it's more effective not to
restrict allocations on guest writes at all; that just seems to be the more
efficient technique.

2. Anyway, I'd allow some always-available size to allocate - let it be one
cluster, which corresponds to the current behavior and prevents guest I/O
from hanging in the worst case.

So what I mean is: if we have enough memory, allow an individual CBW
operation to allocate the whole buffer, and if we have no extra memory,
allow it to allocate one cluster.

-- 
Best regards,
Vladimir