> On May 12, 2025, at 11:29 AM, Klaus Kusche <klaus.kus...@computerix.info>
> wrote:
> File opening might be part of the problem,
> but is most likely not the main problem.
> The small files are structured into small directories,
> 3 to 8 files per directory.
> So theoretically I have one 4K read for each file for data,
> one 4K read per about four files for the directory,
> and one 4K read per 16 files for getting the inodes.
>
> So, for a perfect solution, directory reading and file opening
> should also be done in parallel.
Parallel reading _might_ be a big speed-up, but it might not.
The bottleneck could well be the per-request overhead at
the drive itself. If that’s the case, then parallel reads will
still issue the same collection of small reads and the drive
will still need to process those reads, which will still be a lot
slower than processing fewer, larger operations.
But as I said, I don’t know for sure and would dearly love to
hear the results of any experiments you might do. If it
is a big speed-up, then that would be very illuminating indeed.
For example, create a bunch of small files and a small
test program that just has a hard-coded list of file names
and reads the contents of each file into memory with no
other processing.
See how that behaves if you have a separate thread for
each file versus reading them one at a time from a single
thread. (And please let us know what you see!)
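The experiment described above could be sketched roughly as follows (in Python rather than a compiled language, purely for brevity; all file names, counts, and sizes here are illustrative assumptions, not from the original mail). It creates a batch of small files, then times reading them one at a time versus reading them through a thread pool:

```python
# Sketch of the proposed experiment: many small files, read sequentially
# vs. in parallel threads. Numbers below are arbitrary placeholders.
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def make_files(directory, count, size=4096):
    """Create `count` files of `size` bytes each; return their paths."""
    paths = []
    for i in range(count):
        p = os.path.join(directory, f"f{i:04d}")
        with open(p, "wb") as f:
            f.write(os.urandom(size))
        paths.append(p)
    return paths

def read_file(path):
    """Read one file fully into memory, no other processing."""
    with open(path, "rb") as f:
        return f.read()

def read_sequential(paths):
    # One read after another from a single thread.
    return [read_file(p) for p in paths]

def read_parallel(paths, workers=32):
    # Same reads, but issued concurrently from a pool of threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_file, paths))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = make_files(d, 200)
        t0 = time.perf_counter()
        read_sequential(paths)
        t1 = time.perf_counter()
        read_parallel(paths)
        t2 = time.perf_counter()
        print(f"sequential: {t1 - t0:.4f}s  parallel: {t2 - t1:.4f}s")
```

Note that freshly created files will likely be served from the page cache, so a fair run of this experiment would drop caches (or reboot) between the two timings; the sketch only shows the shape of the comparison.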
Cheers,
Tim