> On Jan 17, 2018, at 1:09 PM, Andreas Dilger <adil...@dilger.ca> wrote:
> 
>> So is there some other way to quickly identify sparse files so we can avoid 
>> the SEEK_HOLE scan for non-sparse files?
> 
> Given that calling SEEK_HOLE is also going to have some cost, my suggestion
> would be to ignore st_blocks completely for small files (size < 64KB) and
> just read the file in this case, since the overhead of reading will probably
> be about the same as checking if any blocks are allocated.  If no blocks are
> allocated, then the read will not do any disk IO, and if there are blocks
> allocated they would have needed to be read from disk anyway and SEEK_HOLE
> would be pure overhead.

If I understand correctly, you're suggesting that we not bother
with sparse-file storage for small files (size < 64KB).

This makes a lot of sense to me:  sparse file storage
matters most for applications (e.g., databases) that
use large files as randomly-accessible swap space and need
to preserve sparseness (so they don't exhaust disk space)
and/or non-sparseness (so that overwrites don't require
new allocations).

So skipping the SEEK_HOLE check for common
small files seems like a good optimization.
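
Here is a rough sketch of how I picture that check. This is
only an illustration, not existing code: the helper names
(should_scan_for_holes, has_hole) and the SMALL_FILE_LIMIT
constant are made up for this example, and the 64KB cutoff is
just the figure suggested above.

```c
#define _GNU_SOURCE             /* exposes SEEK_HOLE on glibc */
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical cutoff: the 64KB figure suggested in this thread. */
#define SMALL_FILE_LIMIT (64 * 1024)

/*
 * Hypothetical helper: decide whether a SEEK_HOLE scan is worthwhile.
 * Small files are simply read, so st_blocks is never consulted.
 */
static int
should_scan_for_holes(const struct stat *st)
{
	if (!S_ISREG(st->st_mode))
		return 0;       /* only regular files can be sparse */
	if (st->st_size < SMALL_FILE_LIMIT)
		return 0;       /* small file: just read it */
	return 1;
}

/*
 * Hypothetical helper: return 1 if fd has a hole before 'size'.
 * When a file has no holes, SEEK_HOLE reports the implicit hole
 * at end-of-file, so only an offset below 'size' counts.
 */
static int
has_hole(int fd, off_t size)
{
#ifdef SEEK_HOLE
	off_t hole = lseek(fd, 0, SEEK_HOLE);

	if (hole < 0)
		return 0;       /* unsupported or error: treat as non-sparse */
	(void)lseek(fd, 0, SEEK_SET);   /* rewind for the reader */
	return hole < size;
#else
	(void)fd;
	(void)size;
	return 0;               /* no SEEK_HOLE on this platform */
#endif
}
```

A caller would fstat() the file, call has_hole() only when
should_scan_for_holes() says so, and otherwise read the file
as an ordinary non-sparse file.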

Cheers,

Tim


