On Aug 27, 2014, at 1:51 PM, Nick Kew <n...@apache.org> wrote:

> On Wed, 27 Aug 2014 16:17:17 +0000
> Rasim Saltuk Alakuş <rala...@turksat.com.tr> wrote:
>
>>
>> Hi All,
>>
>> ATS uses URL hash for cache storage. And CacheUrl plugin adds some more
>> flexibility in URL hashing strategy.
>>
>> We think of creating hash based on packet content and use it as the hash
>> while storing and retrieving from cache This looks a better solution, so
>> that URI changes won't hurt caching system. One immediate benefit for
>> example if you cache YouTube , each request for same video can have
>> different URL and CacheUrl plugin does not always provide a good solution.
>> Also maintaining site based hash filters looks not an elegant solution.
>>
>> Is there any previous or active work for implementing content based hashing?
>> What kind of problems and constrains you may guess. Is there any volunteer
>> to implement this feature together with us?
>
>
> Indeed, the whole scheme is BAD (Broken As Designed).
> Using different URLs for common content breaks cacheing on
> the Web at large, and hacking one agent (such as Trafficserver)
> to work around it will gain you only a tiny fraction of what
> you've thrown away. Indeed, if every agent on the Web -
> from origin servers to desktop browsers - implemented this
> cacheing scheme, you'd still lose MOST of the benefits of
> cacheing, as the same content passes through different paths.

I thought some more about this during a boring meeting, and two more thoughts come to mind:

1) Cache poisoning. This could be a serious problem; at a minimum, some defenses would be required, such as including the Host: portion of the request in the cache key. But I’m guessing it would still be possible to abuse this to poison the HTTP caches (since the client request + origin response headers no longer dictate the cache lookup).

2) HTTP/2. Although it supports non-TLS, several browser vendors have indicated they will not support H2 over plain text. So, assuming we’re moving towards TLS across the board, this sort of interaction will get trickier. I personally think it’ll have to evolve in a way that has the content owners participating better with caches. It’s too early to say, but maybe such a proposal would encourage the YouTubes and Netflixes to behave better (in some way that they can still control content, ad impressions, click tracking, etc., yet allow ISPs to cache the actual content).

Just my $0.01,

-- Leif
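
To make the Host-scoping idea in point 1 a bit more concrete, here is a minimal Python sketch of the two keying schemes being discussed. Everything in it is an assumption for illustration: the function names `url_cache_key` and `content_cache_key` are hypothetical, SHA-256 is just a convenient digest, and none of this is meant to reflect how Traffic Server or the CacheUrl plugin actually computes keys.

```python
import hashlib


def url_cache_key(url: str) -> str:
    """Hypothetical URL-based key: two different URLs give two different keys,
    even when they return identical content."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def content_cache_key(host: str, body: bytes) -> str:
    """Hypothetical content-based key, scoped by the request's Host header.

    Mixing the host into the digest means identical bytes served under two
    different hosts get separate cache entries, so one site cannot directly
    overwrite another site's entry. It does not rule out other abuse.
    """
    digest = hashlib.sha256()
    digest.update(host.lower().encode("utf-8"))  # host names are case-insensitive
    digest.update(b"\0")                         # separator between host and body
    digest.update(body)
    return digest.hexdigest()


# Illustrative values only (assumed, not taken from any real site).
video = b"...same video bytes..."
key_a = content_cache_key("video.example.com", video)
key_b = content_cache_key("video.example.com", video)
key_c = content_cache_key("other.example.net", video)
```

In this sketch `key_a` and `key_b` are equal no matter which URL the bytes were fetched from, while `key_c` differs because the host differs. Note that the content key can only be computed once the body is in hand, so a cache would still need some way to find the key at request time; the sketch does not address that.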