On Aug 27, 2014, at 9:42 AM, Rasim Saltuk Alakuş <rala...@turksat.com.tr> wrote:
> Hi All,
>
> ATS uses URL hash for cache storage. And CacheUrl plugin adds some more
> flexibility in URL hashing strategy.
>
> We think of creating hash based on packet content and use it as the hash
> while storing and retrieving from cache. This looks a better solution, so that
> URI changes won't hurt caching system. One immediate benefit for example if
> you cache YouTube, each request for same video can have different URL and
> CacheUrl plugin does not always provide a good solution. Also maintaining
> site based hash filters looks not an elegant solution.
>
> Is there any previous or active work for implementing content based hashing?
> What kind of problems and constraints you may guess. Is there any volunteer to
> implement this feature together with us?

So what would the client lookup “hash” on? All ATS has at that point is the URL and various headers. Without further action (see the next paragraph), it would not be able to retrieve an object from cache.

Now, what would work, which I think has been mentioned before somewhere, is that you fetch (from origin) a very small piece of the object on every client request. Say, 512 bytes (or something else that fits within one typical TCP packet), using e.g. "Range: bytes=0-511". Then you use that as your cache key. This could do what you are asking for, using the “data” as the cache key, but has the downside that you always have to ask origin for some data. At a minimum, I think such a plugin must also be a per-remap plugin, so you can decide when you want to take that hit or not.

I don’t know of anyone working on such a plugin. But it sounds potentially very useful :).

Also, as a side note, there was, way back when, some beginning code to deal with cache dedup. I don’t know that it ever worked, but John might know. The idea was to hash the actual data (body) and dedup it, just like some modern file systems can do.
This wouldn’t avoid origin requests per se; you’d still have to fetch it, calculate the hash, and then decide that you don’t have to duplicate the storage (so, save some disk storage).

Cheers,

-- leif
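
To make the two ideas above a bit more concrete, here is a rough Python sketch, not ATS plugin code. It assumes the first 512 bytes have already been fetched from origin with "Range: bytes=0-511". The names prefix_cache_key, DedupStore, and the optional host scope are all made up for illustration; none of them exist in ATS. Note that two different objects sharing the same first 512 bytes would end up with the same key, so a real plugin would probably need something more than this.

```python
import hashlib

PREFIX_LEN = 512  # bytes requested from origin via "Range: bytes=0-511"


def prefix_cache_key(prefix, host=""):
    """Derive a cache key from the first bytes of the object body.

    `host` is a hypothetical scope, so that unrelated sites whose objects
    happen to share a prefix do not share a cache key.
    """
    h = hashlib.sha256()
    h.update(host.encode("utf-8"))
    h.update(b"\0")                  # separator between scope and data
    h.update(prefix[:PREFIX_LEN])    # only the prefix contributes to the key
    return h.hexdigest()


class DedupStore:
    """Toy content-addressed store: identical bodies are kept only once."""

    def __init__(self):
        self.blobs = {}   # body hash -> body bytes
        self.index = {}   # URL -> body hash

    def put(self, url, body):
        """Store body under url; return True if the body was already stored."""
        digest = hashlib.sha256(body).hexdigest()
        duplicate = digest in self.blobs
        if not duplicate:
            self.blobs[digest] = body
        self.index[url] = digest
        return duplicate

    def get(self, url):
        """Return the body stored for url, or None if the URL is unknown."""
        digest = self.index.get(url)
        return None if digest is None else self.blobs[digest]
```

In the first case the URL plays no part in the key, which is the point, but the origin is still contacted on every request. In the second case the whole body has to be fetched and hashed before the store can tell that it is a duplicate, so only disk space is saved.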