André Warnier wrote:
> Neil Gunton wrote:
> [...]
> Hi.
> I am not really an expert on large websites, caches and so on, but in
> our applications we manage a large number of files.
> One of the things we have learned over the years is that even on modern
> operating systems, having large numbers of entries in each directory is
> an absolute performance killer.
> This may or may not be relevant to your particular problem, but what is
> the average number of entries you have *per directory*?
I'm not sure what the average number of files per directory is
currently. Is there a Linux tool which gives that kind of statistic?
Looking at one random bucket, there were only 2 files in there.
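Lacking a dedicated tool, a short pipeline can compute that average. This is just a rough sketch, not a definitive answer to the question above; the cache path is a placeholder, and it assumes standard find, ls, and awk are on the box:

```shell
#!/bin/sh
# Average number of directory entries per directory under a cache tree.
# CACHE_ROOT is a placeholder; point it at the real cache directory.
CACHE_ROOT="${1:-/var/www/cache}"

find "$CACHE_ROOT" -type d | while read -r d; do
    # Entry count for each directory (hidden entries included).
    ls -A "$d" | wc -l
done | awk '{ sum += $1; n++ }
            END { if (n) printf "%.1f entries/dir across %d dirs\n", sum / n, n }'
```

Note that this walks the whole tree itself, so on a huge cache it will be as slow as du; it is only meant to answer the per-directory question once.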
I think the issue here is the sheer size of the directory tree itself -
simply traversing it seems to be a problem. I started a du on that tree
this morning at around 9am, and now, after 12 midday, the command is
still not done. Meanwhile, my iowait has doubled on the server as a
result. Clearly it's a lot of work just traversing this tree, since du
is not even doing any pruning, just walking the directory structure. It
makes me wonder if there's something wrong with my system, though it
seems fine in all other respects. More likely this is just a
not-very-efficient data structure, at least with respect to this
filesystem - hence my original question about reiserfs. I think I need
either a filesystem better suited to traversing large directory trees,
or else a different tool that keeps track of the cache in a different
manner.
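In the meantime, the traversal can at least be kept from competing with the web server for disk time. A minimal sketch, assuming a Linux box where ionice is available (the cache path is again a placeholder):

```shell
#!/bin/sh
# Run du at idle I/O priority (-c 3) and minimum CPU priority, so the
# walk only touches the disk when nothing else wants it.
# CACHE_ROOT is a placeholder; substitute the real cache directory.
CACHE_ROOT="${1:-/var/www/cache}"

ionice -c 3 nice -n 19 du -sh "$CACHE_ROOT"
```

This doesn't make du finish any faster, but it should stop the iowait on the rest of the server from doubling while it runs.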
Neil