Min Yuan wrote:
>>>> We have a directory on Redhat 6.2 with 500,000
>>>> files. In our code we open and read the directory,
>>>> and for each entry we use lstat() to check for
>>>> some information. The whole scan takes more than
>>>> eight hours, which is terribly long.
>>>>
>>>> Is there any way we could reduce this length of
>>>> time? If the answer is NO, are there any official
>>>> documents about it, and where can we find them?
>>>
>>> Yes. Stop putting so many files into a single
>>> directory.
>>
>> Besides this, is there no other way? That solution
>> won't work for us: we are developing an application
>> that needs to handle large directories on client
>> sites, and these 500,000 files have to be put in a
>> single directory.
>
>Directory search time becomes linear in the number
>of entries once the size exceeds the directory name
>cache capacity. Repeated directory searches then
>become quadratic in the number of entries, and
>500,000^2 isn't a small number ...
Min,
Julie's right.
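To put a number on Julie's point: each lstat() has to look the
name up in that directory again, and on ext2 each lookup is a
linear scan once the name cache can't hold all the entries, so
a full pass costs on the order of 500,000^2 = 2.5 * 10^11
comparisons. I'm guessing your scan looks roughly like this (a
sketch of the pattern you described, not your actual code; the
field checks are placeholders):

#include <stdio.h>
#include <dirent.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    const char *dirpath = argc > 1 ? argv[1] : ".";
    char path[4096];
    struct dirent *ent;
    struct stat st;
    DIR *dir = opendir(dirpath);

    if (!dir) {
        perror("opendir");
        return 1;
    }
    while ((ent = readdir(dir)) != NULL) {
        snprintf(path, sizeof(path), "%s/%s", dirpath, ent->d_name);
        /* Each lstat() forces a name lookup in the huge directory;
           that lookup is what goes linear, and hence the whole
           pass quadratic. */
        if (lstat(path, &st) == -1)
            continue;
        /* ... inspect st.st_size, st.st_mtime, etc. ... */
    }
    closedir(dir);
    return 0;
}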
But it should be easy to change your program to put files
whose names start with 'a' in a directory 'a', files starting
with 'b' in a directory 'b', and so on. That would help a lot,
and it'll help even more if you can do two levels. It's really
the smartest way to deal with the problem.
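If it helps, here is a minimal sketch of the one-level scheme
(bucket_file and the example file name are made up for
illustration; error handling is minimal):

#include <stdio.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Move a file into a subdirectory named after its first
   character, e.g. "foo.dat" -> "f/foo.dat". */
static int bucket_file(const char *name)
{
    char sub[2] = { name[0], '\0' };
    char newpath[4096];

    if (mkdir(sub, 0755) == -1 && errno != EEXIST)
        return -1;                  /* couldn't create the bucket */
    snprintf(newpath, sizeof(newpath), "%s/%s", sub, name);
    return rename(name, newpath);   /* same filesystem, so cheap */
}

int main(void)
{
    return bucket_file("foo.dat") == 0 ? 0 : 1;
}

With 500,000 files spread over, say, 36 one-character buckets
you are down to roughly 14,000 entries per directory; two levels
(36^2 = 1296 buckets) gets you to a few hundred, which the name
cache handles easily.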
If you *really* can't do this, see
http://marc.theaimsgroup.com/?l=linux-kernel&m=98681307416575&w=2
People there are working on a patch that may interest you.
- Dan