On May 11, 2025, at 6:50 AM, Klaus Kusche <klaus.kus...@computerix.info> wrote:
>
> I regularly backup hundreds of thousands of very small files with tar.
> Currently, this results in many very small sequential read requests.
Are these small read requests occurring because the files are small? Or is
tar deliberately making small read requests? A few possible experiments:

* Use a larger request size when reading file data. 64k or 128k, perhaps?
* Try mmap-ing the input files, either relying on the kernel’s read-ahead
  logic or having a background thread that reads a single byte every 4k or
  so to prompt page-ins ahead of the main thread.
* Compare how `star` performs; it uses a very different buffering
  architecture which may uncover other possibilities.

> (my tar reads and archives up to 2 GB/s when the input files
> are GB-sized, including on-the-fly compression).

This suggests the real issue may be opening the files rather than reading
them. That is, you may be seeing small read requests from the filesystem
code (reading directory pages and stat-ing files) rather than from reading
the file contents. That’s a very different problem.

Cheers,
Tim