On Aug 27, 2014, at 9:42 AM, Rasim Saltuk Alakuş <rala...@turksat.com.tr> wrote:

> Hi All,
> 
> ATS uses a URL hash for cache storage, and the CacheUrl plugin adds some more 
> flexibility in the URL hashing strategy.
> 
> We are thinking of creating a hash based on packet content and using it as the 
> key when storing to and retrieving from the cache. This looks like a better 
> solution, since URI changes won't hurt the caching system. One immediate 
> benefit: if you cache YouTube, each request for the same video can have a 
> different URL, and the CacheUrl plugin does not always provide a good 
> solution. Also, maintaining site-based hash filters does not look like an 
> elegant solution.
> 
> Is there any previous or active work on implementing content-based hashing? 
> What kinds of problems and constraints do you foresee? Is there any volunteer 
> to implement this feature together with us?



So what would the client lookup “hash” on? All ATS has at that point is the 
URL and various headers. Without further action (see next paragraph), it would 
not be able to retrieve an object from cache.

Now, what would work, which I think has been mentioned before somewhere, is 
that you fetch (from origin) a very small piece of the object on every client 
request. Say, 512 bytes (or something else that fits within one typical TCP 
packet), using e.g. "Range: bytes=0-511". Then you use that as your cache key. 
This could do what you are asking for, using the “data” as the cache key, but 
has the downside that you always have to ask origin for some data. At a 
minimum, I think such a plugin must also be a per-remap plugin, so you can 
decide when you want to take that hit or not.
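
A rough sketch of that flow, in Python rather than as a real ATS plugin. It 
only illustrates hashing the first 512 bytes into a cache key. Every name here 
(content_cache_key, fake_fetch_range, lookup_or_fill, the example URLs) is 
made up for illustration, SHA-256 is an arbitrary choice of hash, and the 
"origin" is just an in-memory dict:

```python
import hashlib
from typing import Callable, Dict

# Hypothetical names throughout; none of this is an ATS API.
PREFIX_RANGE = "bytes=0-511"  # the Range header value suggested above


def content_cache_key(url: str, fetch_range: Callable[[str, str], bytes]) -> str:
    """Ask origin for the first 512 bytes and hash them into a cache key."""
    prefix = fetch_range(url, PREFIX_RANGE)
    return hashlib.sha256(prefix).hexdigest()


# Stand-in origin (assumption): two different URLs serve the same bytes.
_VIDEO = b"\x00\x01fake video bytes" * 100
_ORIGIN: Dict[str, bytes] = {
    "http://cdn1.example.com/v?id=abc&sig=111": _VIDEO,
    "http://cdn2.example.com/v?id=abc&sig=222": _VIDEO,
    "http://cdn1.example.com/v?id=xyz&sig=333": b"another object" * 100,
}


def fake_fetch_range(url: str, range_header: str) -> bytes:
    """Serve a "bytes=start-end" range; the end offset is inclusive."""
    start, end = range_header.split("=", 1)[1].split("-")
    return _ORIGIN[url][int(start): int(end) + 1]


cache: Dict[str, bytes] = {}  # content-derived key -> full object


def lookup_or_fill(url: str) -> str:
    """Every lookup costs one small origin request, hit or miss."""
    key = content_cache_key(url, fake_fetch_range)
    if key in cache:
        return "HIT"
    cache[key] = _ORIGIN[url]  # a real proxy would fetch the full body here
    return "MISS"
```

With this toy origin, looking up the cdn1 URL for id=abc is a miss, and the 
cdn2 URL for the same bytes is then a hit even though the URLs differ. Note 
that two different objects sharing the same first 512 bytes would collide 
under this key.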

I don’t know of anyone working on such a plugin. But it sounds potentially very 
useful :).


Also, as a side note, there was, way back when, some early code to deal 
with cache dedup. I don’t know that it ever worked, but John might know. The 
idea was to hash the actual data (body) and dedup it, just like some modern 
file systems can do. This wouldn’t avoid origin requests per se; you’d still 
have to fetch the object, calculate the hash, and then decide that you don’t 
have to duplicate the storage (so, save some disk storage).
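
To make the dedup idea concrete, here is a minimal sketch in Python. It is 
not the old ATS code (which I have not seen); DedupStore and its methods are 
hypothetical names, and SHA-256 is an assumed choice of hash. The body is 
always fetched and hashed, but identical bodies are stored only once:

```python
import hashlib
from typing import Dict, Optional


class DedupStore:
    """Toy content-addressed store: URLs point at bodies stored by hash."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}  # body hash -> body, stored once
        self.index: Dict[str, str] = {}    # URL -> body hash

    def put(self, url: str, body: bytes) -> bool:
        """Store a fetched body; return True only if new bytes were written."""
        digest = hashlib.sha256(body).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = body
        self.index[url] = digest  # the URL is still the lookup key
        return is_new

    def get(self, url: str) -> Optional[bytes]:
        """Look up by URL, then follow the hash to the shared body."""
        digest = self.index.get(url)
        return None if digest is None else self.blobs[digest]
```

Storing the same body under two URLs leaves two index entries but a single 
blob, which is where the disk saving would come from.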

Cheers,

— leif
