Tim Kientzle <t...@kientzle.com> wrote:

> What is the most efficient (preferably portable) way for an archiving program
> (such as tar) to determine whether it should archive a particular file as
> sparse or non-sparse?

IIRC, an lseek() call takes approx. 2 microseconds. I did some research in
2005 when I implemented support for SEEK_HOLE.

IIRC, SEEK_HOLE was implemented by Sun in spring 2005 after I discussed
methods for a useful and performant interface with Jeff Bonwick from the
Sun ZFS team. At that time, Sun told us that we should first use
fpathconf(f, _PC_MIN_HOLE_SIZE) to find out whether a specific filesystem
supports SEEK_HOLE, and I believed that another syscall would be bad for
performance. It turned out that this is not the case, and I finally started
to use fpathconf(f, _PC_MIN_HOLE_SIZE) in 2006.

The current method I use is to call:

	lseek(f, (off_t)0, SEEK_HOLE);

If this fails with EINVAL, the OS does not support SEEK_HOLE (I use private
#defines, SEEK_HOLE == 4 and SEEK_DATA == 3, to check this). If it fails
with ENOTSUP, the specific filesystem does not support SEEK_HOLE. If the
return value is >= st_size, the file is not sparse, as there is only the
virtual hole past the end of the file.

> Historically, we've compared st_nblocks to st_size to quickly determine if a
> file is sparse in order to avoid the SEEK_HOLE scan in most cases. Some
> programs even consider st_nblocks == 0 as an indication that a file is
> entirely sparse. Based on the claims I've read here, it sounds like
> st_nblocks can no longer be trusted for these purposes.
>
> So is there some other way to quickly identify sparse files so we can avoid
> the SEEK_HOLE scan for non-sparse files?

Star only uses this method if SEEK_HOLE is not supported. In addition, in
October 2013 I changed my algorithm regarding st_blocks == 0 and the
assumption that such a file consists only of a single hole, after I
discovered that NetApp stores files of up to 64 bytes in the inode.

Otherwise the fallback algorithm for sparse files on a dumb OS is:

	st_size > (st_blocks * DEV_BSIZE) + DEV_BSIZE

Jörg

-- 
 EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
       joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
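
For illustration, the probe order described above (try SEEK_HOLE first, then
fall back to the st_blocks comparison) could be sketched in C roughly as
follows. This is not star's actual code: the names probe_sparse(),
blocks_suggest_sparse() and enum sparse_state are hypothetical, the private
SEEK_HOLE/SEEK_DATA values are the ones quoted in the mail, DEV_BSIZE is
assumed to be 512 where the system does not define it, and treating every
lseek() failure (not only EINVAL and ENOTSUP) as "use the fallback" is a
simplification.

```c
#define _GNU_SOURCE            /* lets glibc expose SEEK_HOLE / SEEK_DATA */
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Private fallbacks, using the values quoted in the mail. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif
#ifndef DEV_BSIZE
#define DEV_BSIZE 512          /* assumption: st_blocks counts 512-byte units */
#endif

/* Hypothetical result type for this sketch. */
enum sparse_state { SPARSE_UNKNOWN = -1, SPARSE_NO = 0, SPARSE_YES = 1 };

/*
 * Fallback heuristic from the mail:
 *     st_size > (st_blocks * DEV_BSIZE) + DEV_BSIZE
 * The extra DEV_BSIZE of slack means a tiny file with st_blocks == 0
 * (for example one stored inside the inode) is not reported as sparse.
 */
int blocks_suggest_sparse(off_t size, off_t blocks)
{
    assert(size >= 0 && blocks >= 0);
    return size > blocks * DEV_BSIZE + DEV_BSIZE;
}

/* Decide whether the regular file open on fd looks sparse. */
int probe_sparse(int fd)
{
    struct stat st;
    off_t saved, hole;

    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode))
        return SPARSE_UNKNOWN;
    if (st.st_size == 0)
        return SPARSE_NO;      /* an empty file has nothing to skip */

    /* SEEK_HOLE moves the file offset, so remember where we were. */
    saved = lseek(fd, (off_t)0, SEEK_CUR);
    if (saved == (off_t)-1)
        return SPARSE_UNKNOWN;

    hole = lseek(fd, (off_t)0, SEEK_HOLE);
    if (hole != (off_t)-1) {
        (void)lseek(fd, saved, SEEK_SET);
        /* Only the virtual hole at end of file: not sparse. */
        return hole >= st.st_size ? SPARSE_NO : SPARSE_YES;
    }

    /* SEEK_HOLE unavailable (e.g. EINVAL or ENOTSUP): use the heuristic. */
    return blocks_suggest_sparse(st.st_size, (off_t)st.st_blocks)
        ? SPARSE_YES : SPARSE_NO;
}
```

With these definitions, a 64-byte file reporting st_blocks == 0 is not flagged
by the fallback, while a 1 MiB file reporting st_blocks == 0 is. The fallback
is only as reliable as the st_blocks value the filesystem reports, which is
the concern raised in the quoted question.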