André Warnier wrote:
Neil Gunton wrote:
[...]
Hi.
I am not really an expert on large websites, caches and so on, but in our applications we manage a large number of files. One of the things we have learned over the years is that even on modern operating systems, having a large number of entries in each directory is an absolute performance killer. This may or may not be relevant to your particular problem, but what is the average number of entries you have *per directory*?

I'm not sure what the average number of files per directory is currently. Is there a Linux tool which gives that kind of statistic?
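Failing a ready-made tool, I suppose a quick script could compute it by walking the tree once. A minimal Python sketch (the cache root is just a command-line argument, substitute whatever your actual cache directory is):

import os
import sys

# Walk the tree once, counting directories and the entries in each.
root = sys.argv[1] if len(sys.argv) > 1 else "."

dirs = 0
entries = 0
for dirpath, dirnames, filenames in os.walk(root):
    dirs += 1
    entries += len(dirnames) + len(filenames)

print("%d directories, %d entries, %.1f entries/dir on average"
      % (dirs, entries, entries / max(dirs, 1)))

Of course, on a tree this size the walk itself would take as long as the du does, so it is only practical on a sample subtree.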

Looking at one random bucket, there were only 2 files in there.

I think the issue here is the sheer size of the directory tree itself; simply traversing it seems to be the problem. I started a du on that tree this morning at around 9am, and now, after 12 midday, the command still hasn't finished. Meanwhile, the iowait on my server has doubled as a result. Clearly it's a lot of work just traversing this tree, since du isn't doing any pruning, just walking the directory structure.

It makes me wonder if there's something wrong with my system, though it seems fine in all other respects. More likely this is just a not-very-efficient data structure, at least for this filesystem, hence my original question about reiserfs. I think I need either a filesystem better suited to traversing large directory trees, or else a different tool that keeps track of the cache in some other manner.
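For illustration, here is a hypothetical sketch of the kind of layout I have in mind: hashing keys into a fixed, shallow fan-out bounds both the per-directory entry count and the total number of directories, so a full traversal touches a predictable number of inodes. The level and width parameters below are made up, not any particular cache tool's actual scheme:

import hashlib

LEVELS = 2            # hypothetical depth
CHARS_PER_LEVEL = 2   # 256 buckets per level -> 65536 leaf directories

def bucket_path(key):
    # Hash the cache key so entries spread evenly across the buckets.
    h = hashlib.sha1(key.encode()).hexdigest()
    parts = [h[i * CHARS_PER_LEVEL:(i + 1) * CHARS_PER_LEVEL]
             for i in range(LEVELS)]
    return "/".join(parts) + "/" + h

print(bucket_path("http://example.com/some/page"))
# -> something like "0a/1f/0a1f..."; with N cached files, each leaf
#    directory holds roughly N / 65536 entries.

With a layout like that, the tree stays shallow and uniform no matter how the cached URLs are distributed, which is presumably what a better cache-management tool would do for me.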

Neil
