oooooooooooo ooooooooooooo wrote:
>> I don't think you've explained the constraint that would make you use
>> mysql or not.
> 
> My original idea was to use just the hash as the filename; that way I could 
> have direct access. But the customer rejected this and requested that part of 
> the long file name (from 11 to 1023 characters) be kept. As Linux only allows 
> 255 characters in a filename and I could get duplicates within the first 255 
> characters, I trim the real filename to around 200 characters and append the 
> hash at the end (plus a couple of small metadata fields). 
> 
> Yes, these requirements don't make much sense, but I've tried to convince 
> the customer to use just the hash with no luck (it seems he doesn't really 
> understand what a hash is, although I've tried to explain it several 
> times).

You mentioned that the data can be retrieved from somewhere else.  Is 
some part of this filename a unique key?  Do you have to track this 
relationship anyway - or age/expire content?  I'd try to arrange things 
so the most likely scenario would take the fewest operations.  Perhaps a 
mix of hash+filename would give direct access 99+% of the time and you 
could move all copies of collisions to a different area.  Then you could 
keep the database mapping the full name to the hashed path but you'd 
only have to consult it when the open() attempt fails.
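
Just to make that concrete, here's a rough sketch in Python (the storage 
root, the two-level fanout and the db_lookup() callback are made-up 
placeholders, not your actual layout):

    import hashlib
    import os

    STORE = "/var/data/store"            # placeholder storage root

    def path_for(name):
        # Deterministic location: two hash levels, then ~200 chars of the
        # real name with the hash appended, as you described.
        h = hashlib.sha1(name.encode("utf-8")).hexdigest()
        fname = name[:200] + "." + h
        return os.path.join(STORE, h[:2], h[2:4], fname)

    def fetch(name, db_lookup):
        # Try the direct path first; only hit the database (db_lookup maps
        # the full name to the stored path) when the open() fails.
        try:
            return open(path_for(name), "rb")
        except FileNotFoundError:
            return open(db_lookup(name), "rb")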

> That's why I need to either a) use mysql or b) do a directory listing.
> 
>> 00/AA/FF/filename
> That would make up to 256^3 directory leaves, which is more than 16 million; 
> since I have around 15M files, I think that is an excessive number of 
> directories.

I guess that's why squid only uses 16 x 256...
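
For scale: 16 x 256 gives 4096 leaf directories, so ~15M files works out to 
roughly 3700 entries per directory. Not squid's actual algorithm, just the 
same shape, as an illustration:

    import hashlib
    import os

    def two_level_path(root, key, l1=16, l2=256):
        # 16 x 256 = 4096 leaf directories; ~15M files is about 3700
        # entries per directory, which any modern filesystem handles.
        h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        return os.path.join(root,
                            "%02X" % (h % l1),
                            "%02X" % ((h // l1) % l2),
                            key)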

-- 
   Les Mikesell
     lesmikes...@gmail.com
