On Tue, Apr 15, 2025 at 03:37:39PM +0300, Vladimir Sementsov-Ogievskiy wrote: > On 14.04.25 21:12, Eric Blake wrote: > > On Mon, Apr 14, 2025 at 08:05:21PM +0300, Vladimir Sementsov-Ogievskiy > > wrote: > > > On 11.04.25 04:04, Eric Blake wrote: > > > > The 'want_zero' parameter to raw_co_block_status() was added so that > > > > we can avoid potentially time-consuming lseek(SEEK_DATA) calls > > > > throughout the file (working around poor filesystems that have O(n) > > > > rather than O(1) extent probing). But when it comes to learning if a > > > > file is completely sparse (for example, it was just created), always > > > > claiming that a file is all data without even checking offset 0 breaks > > > > what would otherwise be attempts at useful optimizations for a > > > > known-zero mirror destination. > > > > > > > > Note that this allows file-posix to report a file as completely zero > > > > if it was externally created (such as via 'truncate --size=$n file') > > > > as entirely sparse; however, it does NOT work for files created > > > > internally by blockdev-create. That's because blockdev-create > > > > intentionally does a sequence of truncate(0), truncate(size), > > > > allocate_first_block(), in order to make it possible for gluster on > > > > XFS to probe the sector size for direct I/O (which doesn't work if the > > > > first block is sparse). That will be addressed in a later patch. > > > > > > > > Signed-off-by: Eric Blake<ebl...@redhat.com> > > > > --- > > > > block/file-posix.c | 9 ++++++++- > > > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > > > index 56d1972d156..67e83528cf5 100644 > > > > --- a/block/file-posix.c > > > > +++ b/block/file-posix.c > > > > @@ -3217,7 +3217,14 @@ static int coroutine_fn > > > > raw_co_block_status(BlockDriverState *bs, > > > > return ret; > > > > } > > > > > > > > - if (!want_zero) { > > > > + /* > > > > + * If want_zero is clear, then the caller wants speed over > > > > + * accuracy, and the only place where SEEK_DATA should be > > > > + * attempted is at the start of the file to learn if the file has > > > > + * any data at all (anywhere else, just blindly claim the entire > > > > + * file is data). > > > > + */ > > > > + if (!want_zero && offset) { > > > > *pnum = bytes; > > > > *map = offset; > > > > *file = bs; > > > Looks like a hack. So we have bdrv_co_is_zero_fast() which do pass > > > want_zero=false to block-status. But in case of mirror, which want to > > > check the whole disk, we actually want want_zero=true, and detect it by > > > offset=0.. > > > > > > Isn't it better to add a kind of bdrv_is_zero_middle_speed() (which > > > means, don't try to read the data to check, but be free to use suboptimal > > > lseek call or something like this), which will pass want_zero=true, and > > > use it from mirror? Mirror case differs from usage in qcow2 exactly by > > > the fact that we call it only once. > > Which is exactly why I wrote patch 4/6 turning the want_zero bool into > > an enum so that we are being more explicit in WHY block status is > > being requested. > > Hmm, and this makes more strange that this hack for file-posix is kept after > it. Don't we have other block drivers, where we should behave similarly in > block_status for offset=0? Or I mean, could we just use different > block-status modes in qcow2 and mirror, when call bdrv_co_is_zero_fast(), and > don't handle offset==0 specially in file-posix?
I'll need to ajust this in v2 of my series. The problem I'm trying to resolve: libvirt migration with --migrate-disks-detect-zeroes is causing the destination to become fully allocated if the destination does not use "discard":"unmap", whereas with QEMU 9.0 it did not. The change was commit d05ae948c, where we fixed a problem with punching holes into the mirror destination even when the user had requested the file to not shrink; but it means that a file that is not allowed to shrink is now fully allocated rather than left sparse even when leaving it sparse would still be an accurate mirror result with less I/O. And it turns out that when libvirt is driving the mirror into a raw destination, the mirroring is done over NBD, rather than directly to a file-posix destination. Coupled with the oddity that 'qemu-img create -f raw' ends up pre-allocating up to 64k of the image with all zero contents to make alignment probing easier, it is no longer possible to quickly see that the entire NBD image reads as zero, just as it was not possible to quickly see that for file-posix. So, in v2 of my patch series, I think I need to hoist the logic that I added to file-posix.c that reads an initial small sector in order to determine if the entire image is zero to instead live in io.c to be used across all protocols, NBD included. So if you don't think offset=0 should be given any special treatment in file-posix, but instead have generic code in io.c that checks if an entire image has block_status of sparse and/or just a small initial non-sparse section that can be manually checked for zero contents, then that should still fix the issue of mirroring to a sparse destination with "discard":"ignore" but keeping the destination sparse. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org