Re: Avoid copying unallocated clusters during full backup

Vladimir Sementsov-Ogievskiy Mon, 20 Apr 2020 08:05:26 -0700

20.04.2020 17:31, Bryan S Rosenburg wrote:

Vladimir, thank you for outlining the current state of affairs regarding 
efficient backup. I'd like to describe what we know about the image-expansion 
problem we're having using the current (qemu 4.2.0) code, just to be sure that 
your work is addressing it.


In our use case, the image-expansion problem occurs only when the source disk
file and the target backup file are on different file systems. Both files are
qcow2 files, and as long as they both reside on the same file system, the
target file winds up roughly the same size as the source. But if the target is
on another file system (we've tried ext4 on a second hard disk, tmpfs, and
FUSE-based file systems such as s3fs), the target ends up with a size
comparable to the nominal size of the source disk.

I think the expansion is related to this comment in qemu/include/block/block.h:

/**
* bdrv_co_copy_range:
. . . .
* Note: block layer doesn't emulate or fallback to a bounce buffer approach
* because usually the caller shouldn't attempt offloaded copy any more (e.g.
* calling copy_file_range(2)) after the first error, thus it should fall back
* to a read+write path in the caller level.



The bdrv_co_copy_range() service does the right thing with respect to skipping
unallocated ranges in the source disk and not writing zeroes to the target. But
qemu gives up on using this service the first time an underlying
copy_file_range() system call fails, and copy_file_range() always fails with
EXDEV when the source and destination files are on different file systems. In
this specific case (at least), I think that falling back to a bounce buffer
approach would make sense so that we don't lose the rest of the logic in
bdrv_co_copy_range(). As it is, qemu falls back to a very high-level loop that
reads from the source and writes to the target. At this high level, reading an
unallocated range from the source simply returns a buffer full of zeroes, with
no indication that the range was unallocated. The zeroes are then written to
the target as if they were real data.

As a quick experiment, I tried a very localized fallback when copy_file_range 
returns EXDEV in handle_aiocb_copy_range() in qemu/block/file-posix.c. It's not 
a great fix because it has to allocate and free a buffer on the spot and it 
does not head off future calls to copy_file_range that will also fail, but it 
does fix the image-expansion problem when crossing file systems. I can provide 
the patch if anyone wants to see it.
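For illustration only (Bryan's actual patch is not shown here), a localized EXDEV fallback of the kind described above might look roughly like this at the plain file-descriptor level. The helper names `bounce_copy()` and `copy_range_with_fallback()` are hypothetical, not qemu functions; the sketch assumes Linux with glibc 2.27 or later (for the `copy_file_range()` wrapper) and that a 64 KiB stack buffer is acceptable in place of a per-call allocation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy [off, off + len) through a bounce buffer
 * with pread(2)/pwrite(2), using the same offset in both files. */
static int bounce_copy(int in_fd, int out_fd, off_t off, size_t len)
{
    char buf[64 * 1024]; /* fixed-size buffer, no per-call malloc/free */

    while (len > 0) {
        size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t n = pread(in_fd, buf, chunk, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (n == 0)
            return -EIO; /* source ended before the requested range */

        for (ssize_t done = 0; done < n;) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done), off + done);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;
            }
            done += w;
        }
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: offload with copy_file_range(2); if the kernel
 * refuses (EXDEV across file systems, or no support at all), finish the
 * remaining range with bounce_copy() instead of failing. */
static int copy_range_with_fallback(int in_fd, int out_fd, off_t off, size_t len)
{
    while (len > 0) {
        loff_t in_off = off, out_off = off;
        ssize_t n = copy_file_range(in_fd, &in_off, out_fd, &out_off, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EXDEV || errno == ENOSYS || errno == EOPNOTSUPP)
                return bounce_copy(in_fd, out_fd, off, len);
            return -errno;
        }
        if (n == 0)
            return -EIO; /* source ended before the requested range */
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Since bdrv_co_copy_range() already skips unallocated source ranges, a fallback at this level would keep that logic. The sketch does not address the other drawback mentioned above: later calls still try copy_file_range() first, so remembering the failure per file would be a natural addition.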

I just wanted to get this aspect of the problem onto the table, to make sure it 
gets addressed in the current rework. Maybe it's a non-issue already.


Yes, the problem is that the copy_range subsystem handles block status, whereas
the generic backup copying loop doesn't. I'm not sure that adding a fallback
into copy_range is the correct thing to do; at the very least it should be
optional, enabled by a flag. But you don't need it for your problem,
as it is already fixed upstream:

You need to backport my commit 2d57511a88 "block/block-copy: use block_status",
together with the three preparatory patches before it, or the whole series
(which also includes some refactoring after commit 2d57511).
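To make the distinction concrete: a block-status-aware copy loop asks where the data is before copying and leaves unallocated ranges untouched, instead of reading them back as zeroes and writing those out. The sketch below is not the code from commit 2d57511a88; it is a hypothetical illustration on a plain sparse file, with `lseek(2)` `SEEK_DATA`/`SEEK_HOLE` standing in for qemu's block_status and invented helper names (`copy_data()`, `sparse_copy()`). For qcow2 images the allocation question is answered by the block layer, not by the host file system.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy [off, off + len) with pread(2)/pwrite(2). */
static int copy_data(int in_fd, int out_fd, off_t off, off_t len)
{
    char buf[64 * 1024];

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = pread(in_fd, buf, chunk, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return n < 0 ? -errno : -EIO;
        for (ssize_t done = 0; done < n;) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done), off + done);
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                return -errno;
            done += w;
        }
        off += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical block-status-aware copy: ask where the data is before
 * copying, and never read holes back as zeroes. SEEK_DATA/SEEK_HOLE
 * stand in for qemu's block_status here. */
static int sparse_copy(int in_fd, int out_fd)
{
    off_t size = lseek(in_fd, 0, SEEK_END);
    if (size < 0)
        return -errno;
    /* Give the target its full nominal size; ranges never written stay holes. */
    if (ftruncate(out_fd, size) < 0)
        return -errno;

    for (off_t off = 0; off < size;) {
        off_t data = lseek(in_fd, off, SEEK_DATA);
        if (data < 0 && errno == ENXIO)
            break; /* nothing but a hole from off to EOF */
        if (data < 0)
            return -errno;
        off_t hole = lseek(in_fd, data, SEEK_HOLE); /* end of this data extent */
        if (hole < 0)
            return -errno;
        int r = copy_data(in_fd, out_fd, data, hole - data);
        if (r < 0)
            return r;
        off = hole;
    }
    return 0;
}
```

On Linux file systems that don't track holes, `SEEK_DATA` reports the whole file as data, so the sketch degrades to a plain full copy rather than failing.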

Hope it helps :)

--
Best regards,
Vladimir

