> On Jan 17, 2018, at 1:09 PM, Andreas Dilger <adil...@dilger.ca> wrote:
>
>> So is there some other way to quickly identify sparse files so we can avoid
>> the SEEK_HOLE scan for non-sparse files?
>
> Given that calling SEEK_HOLE is also going to have some cost, my suggestion
> would be to ignore st_blocks completely for small files (size < 64KB) and
> just read the file in this case, since the overhead of reading will probably
> be about the same as checking if any blocks are allocated. If no blocks are
> allocated, then the read will not do any disk IO, and if there are blocks
> allocated they would have needed to be read from disk anyway and SEEK_HOLE
> would be pure overhead.

If I understand, you’re basically suggesting not bothering with the
sparse-file storage for small files (size < 64k).

This makes a lot of sense to me: sparse file storage is most important for
applications (e.g., databases) that use large files as randomly-accessible
swap space and need to preserve sparseness (to not blow out disk) and/or
non-sparseness (so that overwrites don’t require allocations).

So skipping the SEEK_HOLE check for common small files seems like a good
optimization.

Cheers,

Tim
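
P.S. To make sure we are talking about the same thing, here is a minimal
sketch of the heuristic as I read it. The function name, the
SMALL_FILE_LIMIT constant, and the 512-byte st_blocks unit are my
assumptions for illustration (the unit is what Linux uses, but it is not
guaranteed everywhere), and the st_blocks comparison for larger files is
only my guess at what the existing check looks like, not the actual code.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical threshold, taken from the 64KB figure in this thread. */
#define SMALL_FILE_LIMIT ((off_t)64 * 1024)

/* Assumed st_blocks unit; 512 bytes on Linux, not guaranteed elsewhere. */
#define ASSUMED_BLOCK_UNIT ((off_t)512)

/*
 * Return 1 if a SEEK_HOLE scan looks worthwhile, 0 if the file should
 * simply be read. `size` and `blocks` are st_size and st_blocks from stat().
 */
static int
should_scan_for_holes(off_t size, blkcnt_t blocks)
{
	/* Small files: ignore st_blocks entirely and just read the data. */
	if (size < SMALL_FILE_LIMIT)
		return 0;

	/* Allocated space covers the whole file: assume it is not sparse. */
	if ((off_t)blocks * ASSUMED_BLOCK_UNIT >= size)
		return 0;

	/* Large file with fewer blocks than its size implies: worth scanning. */
	return 1;
}
```

A caller would stat() the file first and pass st.st_size and st.st_blocks;
only when this returns 1 would it go on to lseek() with SEEK_HOLE.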