On 21/07/16 12:43, Dave Chinner wrote: > On Wed, Jul 20, 2016 at 03:35:17PM +0200, Niels de Vos wrote: >> On Wed, Jul 20, 2016 at 10:30:25PM +1000, Dave Chinner wrote: >>> On Wed, Jul 20, 2016 at 05:19:37AM -0400, Paolo Bonzini wrote: >>>> Adding ext4 and XFS guys (Lukas and Dave respectively). As a quick recap, >>>> the >>>> issue here is the semantics of FIEMAP and SEEK_HOLE/SEEK_DATA, which we >>>> use in >>>> "qemu-img map". This command prints metadata about a virtual disk >>>> image---which >>>> in the case of a raw image amounts to detecting holes and unwritten >>>> extents. >>>> >>>> First, it seems like SEEK_HOLE and SEEK_DATA really should be >>>> "SEEK_NONZERO" and >>>> "SEEK_ZERO", on both ext4 and XFS. You can see that unwritten extents >>>> are >>>> reported by "qemu-img map" as holes: >>> >>> Correctly so. seek hole/data knows nothing about the underlying >>> filesystem storage implementation. A "hole" is defined as a region >>> of zeros, and the filesystem is free call anything it kows for >>> certain contains zeros as a hole. >>> >>> FYI, SEEK_HOLE/DATA were named after the the pre-existing Solaris >>> seek API that uses these definitions for hole and data.... >>> >>>> $ dd if=/dev/urandom of=test.img bs=1M count=100 >>>> $ fallocate -z -o 10M -l 10M test.img >>>> $ du -h test.img >>>> $ qemu-img map --output=json test.img >>>> [{ "start": 0, "length": 10485760, "depth": 0, "zero": false, "data": >>>> true, "offset": 0}, >>>> { "start": 10485760, "length": 10485760, "depth": 0, "zero": true, >>>> "data": false, "offset": 10485760}, >>>> { "start": 20971520, "length": 83886080, "depth": 0, "zero": false, >>>> "data": true, "offset": 20971520}] >>> >>>> >>>> On the second line, zero=true data=false identifies a hole. The right >>>> output >>>> would either have zero=true data=true (unwritten extent) or just >>>> >>>> [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data": >>>> true, "offset": 0}, >>>> >>>> since the zero flag is advisory (it doesn't try to detect zeroes beyond >>>> what the >>>> filesystem says). >>> >>> Not from SEEK_HOLE/SEEK_DATA. All it conveys is "this is the start >>> of a range of zeros" and "this is the start of a range of data". And >>> for filesystems that don't specifically implement these seek >>> operations, zeros are considered data. i.e. SEEK_HOLE will take you >>> to the end of file, SEEK_DATA returns the current position.... >>> >>> i.e. unwritten extents contain no data, so they are semantically >>> identical to holes for the purposes of seeking and hence SEEK_DATA >>> can skip over them. >>> >>>> The reason why we disabled FIEMAP was a combination of a corruption and >>>> performance >>>> issue. The data corruption bug was at >>>> https://bugs.launchpad.net/qemu/+bug/1368815 >>>> and it was reported on Ubuntu Trusty (kernel 3.13 based on the release >>>> notes at >>>> https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes). We corrected that by >>>> using >>>> FIEMAP_FLAG_SYNC, based on a similar patch to coreutils. This turned out >>>> to be too >>>> slow, so we dropped FIEMAP altogether. >>> >>> Yes, because FIEMAP output is only useful for diagnostic purposes as >>> it can be stale even before the syscall returns to userspace. i.e. >>> it can't be used in userspace for optimising file copies.... >> >> Oh... And I was surprised to learn that "cp" does use FIEMAP and not >> SEEK_HOLE/SEEK_DATA. You give a strong suggestion to fix that. >> http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/extent-scan.c#n87 > > My understanding was that FIEMAP was disabled almost immediately > after the data corruption problems were understood, and it hasn't > been turned back on since. I don't see FIEMAP calls in strace when I > copy sparse files, even when I use --sparse=auto which is supposed > to trigger the sparse source file checks using FIEMAP. > > So, yeah, while there's FIEMAP code present, that doesn't mean it's > actually used. And, yes, there are comments in coreutils commits in > 2011 saying that the FIEMAP workarounds are temporary until > SEK_HOLE/DATA are supported, which were added to the kernel in 2011 > soon after the 'cp-corrupts-data-with-fiemap' debacle came to light. > Five years later, and the fiemap code is stil there?
Yes it's still there, and hasn't caused issues. It's used only for sparse files: $ truncate -s1G t.fiemap $ strace -e ioctl cp t.fiemap t2.fiemap ioctl(3, FS_IOC_FIEMAP, 0x7ffff007c910) = 0 It's a pity fiemap is racy (on some file systems?) wrt reporting of data not yet written out, and is thus fairly useless IMHO without the FIEMAP_FLAG_SYNC workaround. We _are_ planning to use SEEK_HOLE in future. It would be nice though to have a way to distinguish zeros from holes to efficiently/consistently maintain allocations to avoid future ENOSPC or inefficient future allocations. thanks, Pádraig