13.08.2019 17:57, Max Reitz wrote:
> On 13.08.19 16:39, Vladimir Sementsov-Ogievskiy wrote:
>> 13.08.2019 17:23, Max Reitz wrote:
>>> On 13.08.19 16:14, Vladimir Sementsov-Ogievskiy wrote:
>>>> 12.08.2019 19:37, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 19:11, Max Reitz wrote:
>>>>>> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> 12.08.2019 18:10, Max Reitz wrote:
>>>>>>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>> backup_cow_with_offload can transfer more than one cluster. Let
>>>>>>>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>>>>>>>> of IO requests, since there is no need to copy cluster by cluster.
>>>>>>>>>
>>>>>>>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>>>>>>>> one-cluster-sized buffer to share for all iterations. We can't also
>>>>>>>>> allocate buffer of full-request length it may be too large, so
>>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>>>>>>>> is to allocate a buffer sufficient to handle all remaining iterations
>>>>>>>>> at the point where we need the buffer for the first time.
>>>>>>>>>
>>>>>>>>> Bonus: get rid of pointer-to-pointer.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>>>>>>>>> ---
>>>>>>>>>  block/backup.c | 65 +++++++++++++++++++++++++++++++-------------------
>>>>>>>>>  1 file changed, 41 insertions(+), 24 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>>>> index d482d93458..65f7212c85 100644
>>>>>>>>> --- a/block/backup.c
>>>>>>>>> +++ b/block/backup.c
>>>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>>>  #include "qemu/error-report.h"
>>>>>>>>>  #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>>>>>>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>>>>>>>  typedef struct CowRequest {
>>>>>>>>>      int64_t start_byte;
>>>>>>>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>>>>>>>      qemu_co_queue_restart_all(&req->wait_queue);
>>>>>>>>>  }
>>>>>>>>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>> - * error occurred, return a negative error number */
>>>>>>>>> +/*
>>>>>>>>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>>>>>>>>> + * error occurred, return a negative error number
>>>>>>>>> + *
>>>>>>>>> + * @bounce_buffer is assumed to enough to store
>>>>>>>>
>>>>>>>> s/to/to be/
>>>>>>>>
>>>>>>>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>>>>>>>> + */
>>>>>>>>>  static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>>>>>>>>                                                        int64_t start,
>>>>>>>>>                                                        int64_t end,
>>>>>>>>>                                                        bool is_write_notifier,
>>>>>>>>>                                                        bool *error_is_read,
>>>>>>>>> -                                                      void **bounce_buffer)
>>>>>>>>> +                                                      void *bounce_buffer)
>>>>>>>>>  {
>>>>>>>>>      int ret;
>>>>>>>>>      BlockBackend *blk = job->common.blk;
>>>>>>>>> -    int nbytes;
>>>>>>>>> +    int nbytes, remaining_bytes;
>>>>>>>>>      int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>>>>>>>      assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>>>>>>>> -    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>>>>>>>> -    nbytes = MIN(job->cluster_size, job->len - start);
>>>>>>>>> -    if (!*bounce_buffer) {
>>>>>>>>> -        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>>>>>>>> -    }
>>>>>>>>> +    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>>>>>>>> +    nbytes = MIN(end - start, job->len - start);
>>>>>>>>> -    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>>>>>>>> -    if (ret < 0) {
>>>>>>>>> -        trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>> -        if (error_is_read) {
>>>>>>>>> -            *error_is_read = true;
>>>>>>>>> +
>>>>>>>>> +    remaining_bytes = nbytes;
>>>>>>>>> +    while (remaining_bytes) {
>>>>>>>>> +        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>>>>>>>> +
>>>>>>>>> +        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>>>>>>>> +        if (ret < 0) {
>>>>>>>>> +            trace_backup_do_cow_read_fail(job, start, ret);
>>>>>>>>> +            if (error_is_read) {
>>>>>>>>> +                *error_is_read = true;
>>>>>>>>> +            }
>>>>>>>>> +            goto fail;
>>>>>>>>>          }
>>>>>>>>> -        goto fail;
>>>>>>>>> -    }
>>>>>>>>> -    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>>>>>>>> -                        job->write_flags);
>>>>>>>>> -    if (ret < 0) {
>>>>>>>>> -        trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>> -        if (error_is_read) {
>>>>>>>>> -            *error_is_read = false;
>>>>>>>>> +        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>>>>>>>> +                            job->write_flags);
>>>>>>>>> +        if (ret < 0) {
>>>>>>>>> +            trace_backup_do_cow_write_fail(job, start, ret);
>>>>>>>>> +            if (error_is_read) {
>>>>>>>>> +                *error_is_read = false;
>>>>>>>>> +            }
>>>>>>>>> +            goto fail;
>>>>>>>>>          }
>>>>>>>>> -        goto fail;
>>>>>>>>> +
>>>>>>>>> +        start += chunk;
>>>>>>>>> +        remaining_bytes -= chunk;
>>>>>>>>>      }
>>>>>>>>>      return nbytes;
>>>>>>>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
>>>>>>>>>          }
>>>>>>>>>      }
>>>>>>>>>      if (!job->use_copy_range) {
>>>>>>>>> +        if (!bounce_buffer) {
>>>>>>>>> +            size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>>>>>>>>> +                             MAX(dirty_end - start, end - dirty_end));
>>>>>>>>> +            bounce_buffer = blk_try_blockalign(job->common.blk, len);
>>>>>>>>> +        }
>>>>>>>>
>>>>>>>> If you use _try_, you should probably also check whether it succeeded.
>>>>>>>
>>>>>>> Oops, you are right, of course.
>>>>>>>
>>>>>>>>
>>>>>>>> Anyway, I wonder whether it’d be better to just allocate this buffer
>>>>>>>> once per job (the first time we get here, probably) to be of size
>>>>>>>> BACKUP_MAX_BOUNCE_BUFFER and put it into BackupBlockJob. (And maybe add
>>>>>>>> a buf-size parameter similar to what the mirror jobs have.)
>>>>>>>>
>>>>>>>
>>>>>>> Once per job will not work, as we may have several guest writes in
>>>>>>> parallel and therefore several parallel copy-before-write operations.
>>>>>>
>>>>>> Hm. I’m not quite happy with that because if the guest just issues many
>>>>>> large discards in parallel, this means that qemu will allocate a large
>>>>>> amount of memory.
>>>>>>
>>>>>> It would be nice if there was a simple way to keep track of the total
>>>>>> memory usage and let requests yield if they would exceed it.
>>>>>
>>>>> Agree, it should be fixed anyway.
>>>>>
>>>>
>>>>
>>>> But still..
>>>>
>>>> Synchronous mirror allocates full-request buffers on guest write. Is it
>>>> correct?
>>>>
>>>> If we assume that it is correct to double memory usage of guest
>>>> operations, than for backup the problem is only in write_zero and
>>>> discard where guest-assumed memory usage should be zero.
>>>
>>> Well, but that is the problem. I didn’t say anything in v2, because I
>>> only thought of normal writes and I found it fine to double the memory
>>> usage there (a guest won’t issue huge write requests in parallel). But
>>> discard/write-zeroes are a different matter.
>>>
>>>> And if we should distinguish writes from write_zeroes and discard, it's
>>>> better to postpone this improvement to be after backup-top filter merged.
>>>
>>> But do you need to distinguish it? Why not just keep track of memory
>>> usage and put the current I/O coroutine to sleep in a CoQueue or
>>> something, and wake that up at the end of backup_do_cow()?
>>>
>>
>> 1. Because if we _can_ allow doubling of memory, it's more effective to not
>> restrict allocations on guest writes. It's just seems to be more effective
>> technique.
>
> But the problem with backup and zero writes/discards is that the memory
> is not doubled. The request doesn’t need any memory, but the CBW
> operation does, and maybe lots of it.
>
> So the guest may issue many zero writes/discards in parallel and thus
> exhaust memory on the host.

So this is the reason to separate writes from write-zeroes/discards. Then at
least writes will be happy. And I think that writes are a much more frequent
request than write-zeroes/discards.

>
>> 2. Anyway, I'd allow some always-available size to allocate - let it be one
>> cluster, which will correspond to current behavior and prevent guest io
>> hang in the worst case.
>
> The guest would only hang if it we have to copy more than e.g. 64 MB at
> a time. At which point I think it’s not unreasonable to sequentialize
> requests.
>

-- 
Best regards,
Vladimir
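
For illustration, below is a rough, untested sketch of the kind of accounting
Max suggests above: a global byte budget for copy-before-write bounce buffers,
where a request that would exceed the budget sleeps in a CoQueue and is woken
when another request releases its buffer at the end of backup_do_cow(). The
BounceBudget type, the helper names and the 128 MB cap are invented for the
example and are not part of the posted patch.

/* Hypothetical sketch, not part of the patch. */
#include "qemu/osdep.h"
#include "qemu/coroutine.h"

#define BACKUP_TOTAL_BOUNCE_BUDGET (128 * 1024 * 1024) /* arbitrary cap */

typedef struct BounceBudget {
    size_t in_use;      /* bytes currently held by in-flight CBW requests */
    CoQueue wait_queue; /* requests waiting for budget to become available */
} BounceBudget;

static void bounce_budget_init(BounceBudget *b)
{
    b->in_use = 0;
    qemu_co_queue_init(&b->wait_queue);
}

/* Call from a coroutine before allocating @bytes of bounce buffer. */
static void coroutine_fn bounce_budget_acquire(BounceBudget *b, size_t bytes)
{
    /*
     * Let at least one request proceed even if it alone exceeds the cap,
     * so a single large copy cannot wait forever.
     */
    while (b->in_use > 0 && b->in_use + bytes > BACKUP_TOTAL_BOUNCE_BUDGET) {
        qemu_co_queue_wait(&b->wait_queue, NULL);
    }
    b->in_use += bytes;
}

/* Call at the end of backup_do_cow() once the bounce buffer is freed. */
static void coroutine_fn bounce_budget_release(BounceBudget *b, size_t bytes)
{
    assert(b->in_use >= bytes);
    b->in_use -= bytes;
    qemu_co_queue_restart_all(&b->wait_queue);
}

With something like this, the copy-before-write path triggered by guest
write-zeroes/discards would serialize once the budget is used up instead of
allocating an unbounded amount of host memory, while plain guest writes could
still be allowed to proceed without throttling if we accept doubling their
memory usage.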