Hello,
On 13/05/2025 18:32, Antonio Diaz Diaz wrote:
> Hello Klaus,
>
> Perhaps you may try some experiments with 'tarlz -c --no-solid', which opens
> and reads the files in parallel[1].
>
> [1] http://www.nongnu.org/lzip/tarlz.html

I did several experiments:

1.) Around 2 million files in 400000 directories; the size on disk with 4k
filesystem blocks is 11 GB, the size of the uncompressed archive is 5.2 GB
(so the average file and directory size is only half a disk block). Caches
and buffers were cleared before each test run.

  time tarlz -c --no-solid -0 -n ___ . > /dev/null

Wallclock / user / system time (seconds):

  n=1:  342 / 214 / 28
  n=4:   95 / 222 / 29
  n=6:   70 / 236 / 31
  n=8:   63 / 264 / 35
  n=10:  67 / 304 / 49

So we have an almost linear speedup from 1 to 4 parallel readers, and a
smaller but still significant speedup up to 8 parallel readers. The n=1 case
is about two thirds disk bound and one third CPU bound (due to compression;
tarlz without compression isn't parallel at all); all other cases are 100 %
disk bound (the disk is 100 % busy all the time, all cores are less than
50 % busy). Starting at n=10, the CPU runs hot and is massively throttled.

2.) The files and directories used above are actually 16 identical copies of
the same directory tree. To avoid any influence of compression overhead, I
tried plain tar: I started 16 separate tars in parallel, each reading one of
the 16 subtrees.

  time ( for i in $(seq 16) ; do tar cf /dev/null test$i & done ; wait )

Times are 3.3 sec wallclock, 78.6 usr, 11.4 sys. Doing the same 16 tars
sequentially takes 38.2 / 1.5 / 4.7 seconds. Both the sequential and the
parallel case are completely disk-bound. So for small reads (mostly 4K reads)
we get a disk read throughput improvement by a factor of 11.6 (38.2 / 3.3)
with 16 parallel readers compared to sequential reading.

3.) With a smaller test set (10 instead of 16 identical subtrees), I tested
tarlz with --bsolid:

  time tarlz -c --bsolid -0 -n ___ . > /dev/null

The results are similar to experiment 1, but the CPU gets hot sooner:

  n=1: 151.2 / 74.5 / 11.4
  n=2:  75.2 / 77.0 / 11.1
  n=4:  38.2 / 78.6 / 11.4
  n=6:  33.3 / 82.7 / 12.7
  n=8:  35.1 / 90.7 / 16.2

So again there is a nearly perfect read throughput speedup up to 4 readers,
and a little more speedup with 6 readers.

Conclusion: tar clearly needs parallel directory and file reading on SSDs for
small files, and perhaps parallel reading of parts of the same file for
really large files?

Klaus.
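P.S. The effect is easy to approximate outside of tarlz as well. A rough
sketch of the idea (not one of the measurements above; the -P value and the
batch size are arbitrary): pre-read the tree with several concurrent readers
to warm the page cache, then run a normal sequential tar, which should then
be mostly metadata/CPU bound instead of disk bound, assuming the data fits
in RAM:

  # warm the page cache with 8 parallel readers, 64 files per cat
  find . -type f -print0 | xargs -0 -n 64 -P 8 cat > /dev/null
  # the sequential tar now reads the file data from the cache
  tar cf /dev/null .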