On Thu, 2005-04-21 at 11:09 +0200, Denys Duchier wrote:
> Tomas Mraz <[EMAIL PROTECTED]> writes:
>
> > If we suppose the maximum number of stored blobs is on the order of
> > millions, probably the optimal indexing would be 1-level [0:2]
> > indexing or 2-level [0:1] [2:3] indexing. However, it would be
> > necessary to do some benchmarking first before setting this in stone.
>
> As I have suggested in a previous message, it is trivial to implement
> adaptive indexing: there is no need to hardwire a specific indexing
> scheme. Furthermore, I suspect that the optimal size of subkeys may
> well depend on the filesystem. My experiments seem to indicate that
> subkeys of length 2 achieve an excellent compromise between
> discriminatory power and disk footprint on ext2.
>
> Btw, if, as you indicate above, you do believe that a 1-level indexing
> should use [0:2], then it doesn't make much sense to me to also suggest
> that a 2-level indexing should use [0:1] as its primary subkey :-)
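To make the schemes under discussion concrete, the subkey notation above maps an object name to a nested path. The following is a rough sketch only; the `blob_path` helper and the example hex name are hypothetical, not code from either proposal:

```python
import os

def blob_path(key, subkey_lengths):
    """Split a hex object name into directory subkeys plus a leaf file.

    subkey_lengths is hypothetical notation: (2,) means 1-level [0:2]
    indexing, (2, 2) means 2-level [0:2][2:4] indexing, and so on --
    an adaptive scheme could pick the tuple at runtime.
    """
    parts = []
    pos = 0
    for n in subkey_lengths:
        parts.append(key[pos:pos + n])  # one directory level per subkey
        pos += n
    parts.append(key)  # store the full name as the leaf file name
    return os.path.join(*parts)

# 1-level indexing, subkey length 2:
blob_path("d670460b4b4aece5915caf5c68d12f560a9fe3e4", (2,))
# -> 'd6/d670460b4b4aece5915caf5c68d12f560a9fe3e4'

# 2-level indexing, subkey length 2 at each level:
blob_path("d670460b4b4aece5915caf5c68d12f560a9fe3e4", (2, 2))
# -> 'd6/70/d670460b4b4aece5915caf5c68d12f560a9fe3e4'
```

Nothing is hardwired here: changing the tuple changes the layout, which is the adaptive-indexing point.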
Why do you think so? IMHO we should always target a similar number of
files/subdirectories in each directory of the blob archive. So if I
suppose that the archive will contain at most 16 million files, the
possible indexing schemes are either 1 level with key length 3 (each
directory would contain ~4096 files) or 2 levels with key length 2 (each
directory would contain ~256 files). Which one is better may of course
be filesystem- and hardware-dependent.

It might indeed be best to allow adaptive indexing, but I think some
benchmarking should be done first; it's possible that some fixed scheme
could be chosen as optimal.

--
Tomas Mraz <[EMAIL PROTECTED]>

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
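The fan-out arithmetic in the reply can be checked with a quick sketch. This assumes hex object names (so a subkey of length k selects one of 16**k directories) and rounds "at most 16 million" to 2**24; the helper name is hypothetical:

```python
# Back-of-the-envelope check: files per directory = total / 16**(sum of
# subkey lengths), assuming hex object names spread uniformly.
TOTAL_FILES = 16 * 1024 * 1024  # "at most 16 million" blobs, taken as 2**24

def files_per_dir(total, subkey_lengths):
    dirs = 16 ** sum(subkey_lengths)  # number of leaf directories
    return total // dirs

print(files_per_dir(TOTAL_FILES, (3,)))    # 1 level, key length 3 -> 4096
print(files_per_dir(TOTAL_FILES, (2, 2)))  # 2 levels, key length 2 -> 256
```

Both schemes index 16 million files; they only trade directory count against directory size (4096 directories of ~4096 entries versus 65536 directories of ~256 entries), which is why the answer plausibly depends on how the filesystem handles large directories.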