Re: [CentOS] Question about optimal filesystem with many small files.

Les Mikesell Fri, 10 Jul 2009 11:53:31 -0700

oooooooooooo ooooooooooooo wrote:
> Hi, After talking with the customer, I finally managed to convince him to 
> use the first characters of the hash as directory names.
> 
> Now I'm in doubt about the following options:
> 
> a) Using 4 directory levels /c/2/a/4/ (200 files per directory) and mysql 
> with a hash->filename table, so I can get the file name from the hash and 
> then I can directly access it (I first query mysql for the hash of the file, 
> and then I read the file).
> 
> b) Using 5 levels without mysql, and making a dir listing (due to technical 
> issues, I can only know an approximate file name, so I can't make a direct 
> access here), match the file name and then read it. The issue here is that I 
> would have 16^5 leaf directories (more than a million).
> 
> I could also make more combinations of mysql/not mysql and number of levels.
> 
> What do you think would give the best performance on ext3?


I don't think you've explained the constraint that would make you use 
mysql or not.  I'd avoid it if everything involved can compute the hash 
or is passed the whole path, since a database lookup is bound to be slower 
than doing the math.  Just on general principles, I'd use a tree like 
00/AA/FF/filename (three levels of 2 hex characters) as a first cut, 
although squid uses just two levels, with a default of 16 first-level and 
256 second-level directories, and probably has some good reason for it.
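
To make "doing the math" concrete, here is a rough Python sketch of how a 
path like 00/AA/FF/filename could be derived from a hash that is already 
known, plus the leaf-directory counts for the layouts mentioned in this 
thread.  The function names, the /data root and the sample hash are only 
illustrative assumptions, not anything from the original setup.

```python
from pathlib import PurePosixPath


def shard_path(root, hex_hash, filename, levels=3, width=2):
    """Return root/xx/yy/zz/filename, taking the directory names from the
    leading hex characters of hex_hash (hypothetical helper)."""
    needed = levels * width
    if len(hex_hash) < needed:
        raise ValueError("hash needs at least %d hex characters" % needed)
    # Normalise case so the same hash always maps to the same directory.
    prefix = hex_hash[:needed].lower()
    parts = [prefix[i:i + width] for i in range(0, needed, width)]
    return PurePosixPath(root, *parts, filename)


def leaf_directories(levels, width):
    """Number of leaf directories when each level uses `width` hex chars."""
    return 16 ** (levels * width)


if __name__ == "__main__":
    # Three levels of two hex characters, as suggested above.
    print(shard_path("/data", "C2A4FF00", "example.dat"))
    # Leaf-directory counts: option (b), the 3x2 tree, squid's default.
    print(leaf_directories(5, 1), leaf_directories(3, 2), 16 * 256)
```

No lookup table is involved: anything that has the hash can rebuild the 
path on its own.  Note that the 3x2 tree has far more leaf directories than 
the 5-level single-character layout, so with a small file count most of 
them would only ever hold a handful of files.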

-- 
   Les Mikesell
    lesmikes...@gmail.com

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
