Hi,

Would it be possible to read files in parallel when creating
an archive (and perhaps write files in parallel when extracting
one)? At least in my case, this would speed up tar
by an order of magnitude or more.

Rationale:

I regularly back up hundreds of thousands of very small files with tar.
Currently, this results in an equally large number of very small
read requests, issued strictly one after another.

However, small reads issued one at a time are incredibly slow
even on very modern SSDs: at queue depth 1, 4k read throughput
is only 40-80 MB/s. With 64 threads each issuing 4k reads to the
same SSD, total read throughput would be 1000-3500 MB/s.
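As a back-of-the-envelope illustration (the latency figure is an
assumption, just to get round numbers): if one 4k read takes roughly
100 microseconds and the next read is only issued after the previous
one has returned, the ceiling is about 4 KiB / 100 us = 40 MB/s,
no matter how fast the SSD is internally. With 64 reads in flight,
that per-request latency no longer limits total throughput.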

Even for a mechanical disk drive, issuing as many reads as possible
in parallel makes reading much faster, because the drive internally 
sorts the pending requests into cylinder and sector order, 
minimizing seek time and rotational delay. Parallel requests
are even more important for RAID arrays to keep all drives busy.

That 40-80 MB/s single-request read throughput is the only bottleneck:
tar itself could do several GB/s, the parallel compression program 
could also keep up with 1 GB/s using 8 cores, and the SSD onto which 
the created tar file is written also easily writes 1 GB/s
(my tar reads and archives up to 2 GB/s when the input files 
are GB-sized, including on-the-fly compression).

Today, we have lots of RAM, and I could easily spare a few GB for tar.
So would it be possible to allocate many file-sized buffers
(at least for files up to a given size limit),
fill them in parallel with several read threads or async read calls,
and write them to the archive sequentially, in the original order,
as soon as each file read has completed?
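
To make the idea concrete, here is a rough, hypothetical sketch
(my own toy code, not tar's, and not meant as a patch): a handful
of pthread readers slurp whole files into buffers, and the main
thread writes the buffers out in the original order, with stdout
standing in for the archive stream. NTHREADS, the missing size
limit, and the error handling are all just placeholders.

/* parread.c - toy sketch of parallel reads with ordered output.
 * Build: cc -O2 -pthread parread.c -o parread
 * Usage: ./parread file1 file2 ... > out
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define NTHREADS 8                 /* reader threads (assumption) */

struct slot {
    char   *data;                  /* whole file, malloc'd by a reader */
    size_t  size;
    int     done;                  /* set when the read has finished */
};

static struct slot *slots;
static char       **names;
static int          nfiles;
static int          next_file;     /* next index a reader should claim */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Read one whole file into a freshly allocated buffer. */
static void read_file(int i)
{
    struct stat st;
    int fd = open(names[i], O_RDONLY);
    if (fd < 0 || fstat(fd, &st) < 0) { perror(names[i]); exit(1); }
    char *buf = malloc(st.st_size ? st.st_size : 1);
    ssize_t got = 0, n;
    while (got < st.st_size &&
           (n = read(fd, buf + got, st.st_size - got)) > 0)
        got += n;
    close(fd);

    pthread_mutex_lock(&lock);
    slots[i].data = buf;
    slots[i].size = got;
    slots[i].done = 1;
    pthread_cond_broadcast(&cond); /* wake the writer */
    pthread_mutex_unlock(&lock);
}

/* Each reader thread keeps claiming the next unread file. */
static void *reader(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int i = next_file < nfiles ? next_file++ : -1;
        pthread_mutex_unlock(&lock);
        if (i < 0)
            return NULL;
        read_file(i);
    }
}

int main(int argc, char **argv)
{
    names  = argv + 1;
    nfiles = argc - 1;
    slots  = calloc(nfiles, sizeof *slots);

    pthread_t tid[NTHREADS];
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, reader, NULL);

    /* Consume the buffers strictly in command-line order, so the
     * output stays sequential even though reads finish out of order. */
    for (int i = 0; i < nfiles; i++) {
        pthread_mutex_lock(&lock);
        while (!slots[i].done)
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
        fwrite(slots[i].data, 1, slots[i].size, stdout);
        free(slots[i].data);
    }

    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    return 0;
}

Because the writer keeps the original order, the output should be
byte-identical to a purely sequential run; only the reads overlap.
Real tar would of course also need the size limit mentioned above,
so that a few huge files cannot exhaust the buffer budget.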

Regards,
Klaus.