On Thu, Jul 21, 2016 at 10:23:48AM -0400, Paolo Bonzini wrote:
> > > 1) avoid copying zero data, to keep the copy process efficient.  For this,
> > > SEEK_HOLE/SEEK_DATA are enough.
> > >
> > > 2) copy file contents while preserving the allocation state of the file's
> > > extents.
> >
> > Which is /very difficult/ to do safely and reliably.
> > i.e. the use of fiemap to duplicate the exact layout of a file
> > from userspace is only possible if you can /guarantee/ the source
> > file has not changed in any way during the copy operation at the
> > point in time you finalise the destination data copy.
>
> We don't do exactly that, exactly because it's messy when you have
> concurrent accesses (which shouldn't be done but you never know).

Which means you *cannot make the assumption it won't happen*.
FIEMAP is not guaranteed to tell you exactly where the data in the
file that you need to copy is, and nothing you can do from
userspace changes that. I can't say it any clearer than that.

> When
> doing a copy, we use(d to use) FIEMAP the same way as you'd use lseek,
> querying one extent at a time.  If you proceed this way, all of these
> can cause the same races:
>
> - pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
> not copied
>
> - pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
> copied
>
> - lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
> copied
>
> - lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
> copied
>
> - ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
> the 10MB..20MB area is not copied

No, FIEMAP is not guaranteed to behave like this. What is returned
is filesystem dependent. Filesystems that don't support holes will
return data extents. Filesystems that support compression might
return a compressed data extent rather than a hole. Encrypted files
might not expose holes at all, so people can't easily find known
plain text regions in the encrypted data. Filesystems could report
holes as deduplicated data, etc.  What do you do when FIEMAP returns
"OFFLINE" to indicate that the data is located elsewhere and will
need to be retrieved by the HSM operating on top of the filesystem
before layout can be determined?

All of the above are *valid* and *correct*, because the filesystem
defines what FIEMAP returns for a given file offset. Just because
ext4 and XFS have mostly the same behaviour, it doesn't mean that
every other filesystem behaves the same way.

The assumptions being made about FIEMAP behaviour will only lead to
user data corruption, as they already have several times in the
past.

Cheers,

Dave.
-- 
Dave Chinner
dchin...@redhat.com
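
For goal 1) in the quoted text (avoid copying zero data), a copy loop
built on lseek(SEEK_DATA)/lseek(SEEK_HOLE) might look roughly like the
sketch below. It is illustrative only: data_ranges() and sparse_copy()
are hypothetical names, Python is used purely for brevity, and it
assumes a platform that exposes os.SEEK_DATA and os.SEEK_HOLE (e.g.
Linux). It makes no attempt to preserve allocation state, and it is
still exposed to the races listed above if the source file is modified
while the copy is running.

```python
import errno
import os


def data_ranges(fd, size):
    """Yield (start, end) byte ranges that lseek() reports as data."""
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data at or after offset
                return
            raise
        # End of file counts as an implicit hole, so this always returns.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if min(end, size) > start:
            yield start, min(end, size)
        offset = max(end, offset + 1)  # always make forward progress


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst, skipping every range reported as a hole."""
    with open(src_path, "rb", buffering=0) as src, \
            open(dst_path, "wb", buffering=0) as dst:
        sfd, dfd = src.fileno(), dst.fileno()
        size = os.fstat(sfd).st_size
        for start, end in data_ranges(sfd, size):
            pos = start
            while pos < end:
                buf = os.pread(sfd, min(chunk, end - pos), pos)
                if not buf:  # source shrank underneath us; give up on range
                    break
                pos += os.pwrite(dfd, buf, pos)
        # Keep the original length even when the file ends in a hole.
        dst.truncate(size)
```

On a filesystem that does not report holes, the whole file is returned
as a single data range and this degrades to a plain copy.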