Since this has been a topic for a while, I will just throw out an idea to see 
how fast you guys can shoot it down.

A cache object is stored as a series of fragments. If we subdivided each 
fragment into "chunks", we could have 64 chunks / fragment and represent them 
with a bitmap in a single uint64_t. A set bit would indicate valid data in that 
chunk. Partial content would be written only in chunk units and only for chunks 
that are complete in the data. For the default size of 1M fragments, each chunk 
would be 16K which seems a reasonable value. The bitmaps would be stored along 
with the fragment offset table in the alternate info header. This would keep it 
out of the directory while making it available when serving because the 
alternate data is loaded before that point. Range validity checks could also be 
done without additional disk I/O: whether a range is valid for an object can't 
be determined until the alternate is selected anyway, and by then the bitmaps 
are already in memory. We could either serve from cache only when the requested 
range is completely covered, or generate a synthetic range request to fetch the 
parts that are not already in the cache.
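
To make the bookkeeping concrete, here is a rough sketch of the bitmap 
arithmetic, assuming the default 1M fragment and 64 chunks per fragment. All 
names (chunk_mask, range_is_cached, mark_complete_chunks) are hypothetical and 
not existing cache code; byte offsets are relative to the start of a single 
fragment and ranges are inclusive.

```cpp
#include <cstdint>

// Hypothetical constants matching the defaults discussed above.
constexpr uint64_t FRAGMENT_SIZE       = 1ULL << 20;                          // 1M fragment
constexpr uint64_t CHUNKS_PER_FRAGMENT = 64;                                  // one bit each
constexpr uint64_t CHUNK_SIZE          = FRAGMENT_SIZE / CHUNKS_PER_FRAGMENT; // 16K

// Mask with bits [first, last] set, where 0 <= first <= last <= 63.
constexpr uint64_t chunk_mask(uint64_t first, uint64_t last)
{
  // Shifting a 64-bit value by 64 is undefined, so handle last == 63 separately.
  uint64_t upper = (last == 63) ? ~0ULL : ((1ULL << (last + 1)) - 1);
  uint64_t lower = (1ULL << first) - 1;
  return upper & ~lower;
}

// True if every chunk touched by bytes [start, end] is marked valid.
// Offsets are within one fragment, so end < FRAGMENT_SIZE.
constexpr bool range_is_cached(uint64_t bitmap, uint64_t start, uint64_t end)
{
  uint64_t mask = chunk_mask(start / CHUNK_SIZE, end / CHUNK_SIZE);
  return (bitmap & mask) == mask;
}

// Set bits only for chunks that lie entirely inside the received bytes
// [start, end]; partially covered chunks at either edge are left unset.
constexpr uint64_t mark_complete_chunks(uint64_t bitmap, uint64_t start, uint64_t end)
{
  uint64_t first = (start + CHUNK_SIZE - 1) / CHUNK_SIZE; // first chunk starting at or after start
  uint64_t past  = (end + 1) / CHUNK_SIZE;                // one past the last fully covered chunk
  if (first >= past) {
    return bitmap; // no whole chunk in this range
  }
  return bitmap | chunk_mask(first, past - 1);
}
```

With this, a range check is one mask compare per fragment touched by the 
request, and a request spanning several fragments would just repeat the check 
against each fragment's bitmap from the alternate header.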

This would mean that files less than one fragment would not have partial 
content cached but I think that's acceptable as the advantages of partial 
caching are only for larger objects.
