Recently we received reports of stuck full backups: a running backup
job made no progress for more than an hour. This reproduced only on
big images, with at least 30TB of allocated space.
We are using snapshot filter + cbw filter + blockdev-backup. The
problem we found lies in the access bitmap of the cbw filter.
To build the bcs bitmap for backup, the job repeatedly asks for block
status with a range (X, virtual_size). As we are accessing the disk
via the snapshot filter, we end up in cbw_co_snapshot_block_status()->
cbw_snapshot_read_lock(). Here we check that the access bitmap does
not have any zeroes in this range. After that, we ask the disk.
First, we always look for a zero in the access bitmap over the whole
range. This is rather slow: the bitmap is full, so we need to scan
every bit at the lowest level. Second, the following block status
call to the disk may return a fairly small number of contiguous
clusters, for example 512 (empirical value; guessing that for a
'full' image it is 4K/clu_addr_size for the qcow2 driver).
This way we are doomed to re-scan the access bitmap on the next block
status request (X + 512, virtual_size).
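The re-scan cost described above can be illustrated with a toy model.
This is only a sketch, not QEMU code: words_scanned() is a hypothetical
helper, and it assumes a flat, fully set bitmap that is scanned one
64-bit word at a time, with each request advancing by a fixed step.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy cost model (hypothetical, not taken from QEMU): count how many
 * 64-bit words of a flat, fully set bitmap are visited when every
 * block status request scans [start, total_bits) for a zero bit and
 * the next request starts `step` bits further.
 */
static uint64_t words_scanned(uint64_t total_bits, uint64_t step)
{
    uint64_t visited = 0;

    assert(step > 0);
    for (uint64_t start = 0; start < total_bits; start += step) {
        uint64_t first_word = start / 64;
        uint64_t last_word = (total_bits - 1) / 64;

        /* No zero bit exists, so the scan runs to the end every time. */
        visited += last_word - first_word + 1;
    }
    return visited;
}
```

In this model the total work grows roughly with the square of the
bitmap size: words_scanned(1024, 512) visits 24 words, while
words_scanned(2048, 512) visits 80. For large bitmaps, a 10% larger
bitmap costs about 21% more scanning.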

perf tracing example:
  96.67%          9581  qemu-kvm         qemu-kvm                    [.] hbitmap_next_zero
   0.33%            32  qemu-kvm         qemu-kvm                    [.] qcow2_cache_do_get.lto_priv.0
   0.10%            10  qemu-kvm         [kernel.vmlinux]            [k] perf_adjust_freq_unthr_context
   0.08%             8  qemu-kvm         qemu-kvm                    [.] qcow2_get_host_offset

This can be clearly observed on an image with a small cluster size
(clu_size=65536) and preallocated metadata.
size                   10T   11T
blockdev-backup        52s   57s
cbw + snap             325s  413s
cbw + snap + patches   55s   61s

The growth is also non-linear in this case: +10% size results in
more than +20% time (325s -> 413s).

This patchset changes the access bitmap into a 'deny' bitmap by
reversing the bits within and making the code look for set bits.
Looking for set bits is much faster due to hbitmap levels.
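A minimal sketch of why the search direction matters in a layered
bitmap. ToyBitmap and the toy_* helpers are hypothetical and much
simpler than util/hbitmap.c; the only assumption carried over is that
an upper-level bit is set when the word below it has at least one set
bit.

```c
#include <assert.h>
#include <stdint.h>

#define LEAF_WORDS 64            /* toy size: 64 * 64 = 4096 bits */

/* Bit i of `summary` is set iff leaf[i] has at least one set bit. */
typedef struct {
    uint64_t summary;
    uint64_t leaf[LEAF_WORDS];
} ToyBitmap;

static void toy_set(ToyBitmap *b, unsigned bit)
{
    assert(bit < LEAF_WORDS * 64);
    b->leaf[bit / 64] |= UINT64_C(1) << (bit % 64);
    b->summary |= UINT64_C(1) << (bit / 64);
}

/* First set bit, or -1 if none. *leaf_reads counts leaf words read. */
static int toy_next_set(const ToyBitmap *b, unsigned *leaf_reads)
{
    *leaf_reads = 0;
    if (b->summary == 0) {
        return -1;               /* one summary word proves "all clear" */
    }
    for (unsigned w = 0; w < LEAF_WORDS; w++) {
        if (b->summary & (UINT64_C(1) << w)) {
            (*leaf_reads)++;
            for (unsigned i = 0; i < 64; i++) {
                if (b->leaf[w] & (UINT64_C(1) << i)) {
                    return (int)(w * 64 + i);
                }
            }
        }
    }
    return -1;
}

/* First zero bit, or -1 if none. The summary only says "some bit is
 * set", never "all bits are set", so every leaf word must be read. */
static int toy_next_zero(const ToyBitmap *b, unsigned *leaf_reads)
{
    *leaf_reads = 0;
    for (unsigned w = 0; w < LEAF_WORDS; w++) {
        (*leaf_reads)++;
        if (b->leaf[w] != UINT64_MAX) {
            for (unsigned i = 0; i < 64; i++) {
                if (!(b->leaf[w] & (UINT64_C(1) << i))) {
                    return (int)(w * 64 + i);
                }
            }
        }
    }
    return -1;
}
```

In this toy, looking for a zero in a full bitmap reads all 64 leaf
words, while looking for a set bit in an empty bitmap reads none.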

Also:
 - update iotest 257: the access bitmap on the cbw filter is now
empty, so count is 0 instead of 67108864 (which is the image size)
 - remove meta bitmap leftovers
 - report block_status() up to the end of the accessible section to
the snapshot filter, instead of returning EINVAL on big requests

Andrey Zhadchenko (4):
  hbitmap: drop meta bitmap leftovers
  hbitmap: introduce hbitmap_reverse()
  block/copy-before-write: reverse access bitmap
  block/copy-before-write: report partial block status to snapshot

 block/copy-before-write.c    | 33 +++++++++++++++++++++------------
 block/dirty-bitmap.c         |  9 +++++++++
 include/block/block_int-io.h |  1 +
 include/qemu/hbitmap.h       |  8 ++++++++
 tests/qemu-iotests/257.out   | 28 ++++++++++++++--------------
 util/hbitmap.c               | 32 +++++++++++++++++---------------
 6 files changed, 70 insertions(+), 41 deletions(-)

-- 
2.43.0

