Re: generating hash from packet content

Leif Hedstrom Thu, 28 Aug 2014 13:10:26 -0700

On Aug 28, 2014, at 12:19 PM, Bill Zeng <billzeng2...@gmail.com> wrote:

> 
> 
> 
> On Thu, Aug 28, 2014 at 10:41 AM, Leif Hedstrom <zw...@apache.org> wrote:
> 
> On Aug 28, 2014, at 11:35 AM, Bill Zeng <billzeng2...@gmail.com> wrote:
> 
> > Just to throw another idea your way. We can insert another level of 
> > indirection between URL's and objects. Every object has a unique hash. 
> > URL's point to the hashes instead of objects. The hashes are used to look 
> > up objects. Even if multiple URL's are duplicated and hence their hashes, 
> > they always point to the same object. It seems a non-easy project though. 
> > It requires major changes to ATS.
> 
> 
> I’m not sure I understand this, or how it helps this problem? However, isn’t 
> this sort of how the cache already works? There’s a hash from URL to the 
> “header” entry, which then has its own hash to the actual object. Alan?
> 
> Maybe I did not understand it correctly. Currently, ATS calculates a hash 
> from a URL and uses the hash to look up the actual object. That is "URL --> 
> actual object". My idea is to "URL --> hash of an object --> actual object". 
> We calculate the hash of a URL and use that to look up the hash of an actual 
> object and then use the hash of the actual object to look up the actual 
> object.

But what problem does that solve? You have URLs <A> and <B>, both of which point 
to the same object. How do you find that object based only on the client 
request (URL + headers)? How do you generate the “object hash” for the lookup, 
without going to origin first? That’s the problem here, afaik?
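
To make the question concrete, here is a minimal sketch of the proposed "URL --> hash of an object --> actual object" indirection. It is a toy in-memory model, not the ATS cache; the class name, method names, and example URLs are all hypothetical, and SHA-256 is just an assumed choice of object hash.

```python
import hashlib


class TwoLevelCache:
    """Toy model: URL -> object hash -> object body (not the ATS cache)."""

    def __init__(self):
        self.url_to_hash = {}   # first level: URL -> content hash
        self.hash_to_body = {}  # second level: content hash -> body

    def store(self, url, body):
        # The object hash can only be computed once the body is in hand,
        # i.e. after a fetch from origin.
        digest = hashlib.sha256(body).hexdigest()
        self.url_to_hash[url] = digest
        self.hash_to_body.setdefault(digest, body)  # one copy per distinct body
        return digest

    def lookup(self, url):
        digest = self.url_to_hash.get(url)
        if digest is None:
            # Never-seen URL: the request gives nothing to derive the hash from.
            return None
        return self.hash_to_body.get(digest)


cache = TwoLevelCache()
cache.store("http://example.com/A", b"same bytes")
hit_a = cache.lookup("http://example.com/A")         # served from cache
miss_b = cache.lookup("http://example.com/B")        # miss, although B has the same bytes
cache.store("http://example.com/B", b"same bytes")   # only possible after fetching B
copies = len(cache.hash_to_body)                     # still a single stored body
```

The indirection saves storage once both URLs have been fetched, but the first request for <B> still has to go to origin, which is the point of the question above.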

Or is your suggestion here to solve the cache deduping problem (which is not 
what the OP asked for)? If so, there were the beginnings of that in the cache 
code, storing the hash of objects in the cache as well (but maybe that’s gone 
now?). There is also a CRC (checksum) feature in the cache; maybe the intention 
back then was to generalize the cache dedup with these checksums. Only John 
Plevyak would know :).
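
As a rough illustration of what checksum-driven dedup could look like, here is a sketch that uses CRC32 as a cheap first-pass key and confirms with a full comparison. This is an assumption about how such a scheme might work in general, not a description of the ATS cache code; `DedupStore` and its methods are hypothetical names.

```python
import zlib


class DedupStore:
    """Toy store that keeps one copy of each distinct body."""

    def __init__(self):
        self.by_crc = {}  # CRC32 -> list of distinct bodies with that checksum

    def put(self, body):
        """Return the stored copy of body, adding it only if it is new."""
        bucket = self.by_crc.setdefault(zlib.crc32(body), [])
        for existing in bucket:
            if existing == body:  # CRC32 can collide, so confirm byte for byte
                return existing
        bucket.append(body)
        return body

    def count(self):
        # Number of distinct bodies actually stored.
        return sum(len(bucket) for bucket in self.by_crc.values())


store = DedupStore()
first = store.put(b"video segment")
second = store.put(b"video " + b"segment")  # same bytes, arriving a second time
third = store.put(b"other segment")
```

Note that this only dedups at store time, after the body has been fetched; it does not help with finding the object from the request alone.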

Fwiw, this problem is what Metalink is intended to solve for some use cases 
(e.g. site mirrors), but Metalink requires cooperation (additional Metalink 
headers) from the origin. It does not solve (or intend to solve) the issue 
where e.g. YouTube rotates the content URLs frequently.
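
For reference, the origin cooperation in Metalink/HTTP (RFC 6249) takes the form of a `Digest` header carrying a hash of the object plus `Link` headers with `rel=duplicate` listing mirrors. Below is a small sketch of generating and comparing such headers. The function names and mirror URLs are made up for illustration, and the base64 encoding of the raw SHA-256 digest is my reading of the instance-digest format, so treat the details as assumptions.

```python
import base64
import hashlib


def metalink_headers(body, mirrors):
    """Build the Digest and Link headers an origin might send for body."""
    digest = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    headers = [("Digest", "SHA-256=" + digest)]
    for url in mirrors:
        headers.append(("Link", "<%s>; rel=duplicate" % url))
    return headers


def same_content(headers_a, headers_b):
    """Two responses describe the same object if their Digest headers match."""
    digest_a = [value for name, value in headers_a if name == "Digest"]
    digest_b = [value for name, value in headers_b if name == "Digest"]
    return bool(digest_a) and digest_a == digest_b


# Two mirrors serving the same file under different URLs.
resp_1 = metalink_headers(b"iso image", ["http://mirror2.example.com/file.iso"])
resp_2 = metalink_headers(b"iso image", ["http://mirror1.example.com/file.iso"])
resp_3 = metalink_headers(b"different file", [])
match = same_content(resp_1, resp_2)
mismatch = same_content(resp_1, resp_3)
```

A cache can use a matching digest to recognize that a mirror URL refers to an object it already holds, but only because the origin volunteered the digest in its response.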

— Leif
