Jeremy Allison wrote:
> It turns out that: lseek(3, 0, SEEK_HOLE) returns end-of-file for a
> sparse file copied from a Linux squashfs mounted drive. This breaks
> the --sparse=auto heuristic that detects a sparse file.
The reason is that Squashfs supports sparse files, but it has never
implemented SEEK_HOLE/SEEK_DATA, which forces applications to do their
own hole discovery. This was done for the following reason.
Squashfs supports holes at the granularity of a block, but the Squashfs
block size is 128 Kbytes by default (and can be up to 1 Mbyte). In
contrast, most Linux filesystems use 4K block sizes.
This means any Squashfs SEEK_HOLE/SEEK_DATA implementation would not
behave like other Linux filesystems, because it could not report
sparseness at the 4K granularity that most people and programs expect.
As a result, a program may miss holes that exist in the file.
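To make the granularity point concrete, here is a minimal C sketch that
counts all-zero blocks in a buffer at a chosen block size, roughly the
kind of scan an application does when it discovers holes itself. The
helpers all_zero() and count_zero_blocks() are hypothetical and are not
taken from Squashfs or from any copy tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if every byte in buf[0..len) is zero. */
static bool all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/*
 * Count how many blocks of size blksz in buf are entirely zero, i.e. how
 * many holes could be reported at that granularity.
 * Assumes blksz > 0 and that len is a multiple of blksz.
 */
static size_t count_zero_blocks(const unsigned char *buf, size_t len,
                                size_t blksz)
{
    assert(blksz > 0 && len % blksz == 0);
    size_t holes = 0;
    for (size_t off = 0; off < len; off += blksz)
        if (all_zero(buf + off, blksz))
            holes++;
    return holes;
}
```

For a 128K buffer that is zero except for its last byte, scanning at 4K
finds 31 zero blocks, while scanning at 128K finds none: the single
non-zero byte makes the whole large block count as data.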
I have always considered it better not to support something than to
support it in a way that does not behave as people expect it to, i.e.
the principle of least surprise.
> lseek(3, 0, SEEK_DATA) = 0
> fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
> lseek(3, 0, SEEK_HOLE) = 417792
This is the behaviour of the default llseek() implementation in the
Linux kernel VFS: for SEEK_HOLE it seeks to a virtual hole at the end
of the file.
See
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/read_write.c#n102
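The generic fallback referenced above can be modelled in user space
roughly as follows. This is a simplified sketch, not the kernel code:
generic_seek_model() and enum seek_kind are hypothetical names, and
locking, the other whence values and maximum-offset checks are left
out. The whole file is treated as data, with one virtual hole at EOF.

```c
#include <errno.h>
#include <stdint.h>

/* Which hole-related whence value is being modelled. */
enum seek_kind { MODEL_SEEK_DATA, MODEL_SEEK_HOLE };

/*
 * Model of the generic SEEK_DATA/SEEK_HOLE fallback.
 * Returns the resulting offset, or -ENXIO if offset is at or past EOF.
 * The unsigned comparison means a negative offset also yields -ENXIO.
 */
static int64_t generic_seek_model(int64_t offset, int64_t eof,
                                  enum seek_kind kind)
{
    if ((uint64_t)offset >= (uint64_t)eof)
        return -ENXIO;
    /* Everything before EOF is data; the only hole starts at EOF. */
    return kind == MODEL_SEEK_DATA ? offset : eof;
}
```

With the 417792-byte file from the trace, SEEK_DATA from offset 0
returns 0 and SEEK_HOLE from offset 0 returns 417792, which matches the
strace output quoted above.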
I am not subscribed to this mailing list, so please CC me on replies.
Thanks
Phillip
---
Squashfs author and maintainer