Re: Download mirrors, plugin, GSoC

Leif Hedstrom Sat, 12 May 2012 18:46:44 -0700

On 5/12/12 8:51 AM, Jack Bates wrote:

> Hi, I would like files that are distributed from multiple mirrors to work better with caching proxies, and I hope to write a Traffic Server plugin to help with this.

Cool. I'm on my way to a Mickey Mouse cruise (wish me luck ...), but wanted to put in a couple of $0.02's.

> * Remember lists of mirrors so future requests for any of these URLs use the same cache key. A problem is how to prevent a malicious domain from distributing false information about URLs it doesn't control. This could be addressed with a whitelist of domains.

You can use our cache to 'remember' this, writing your own "entries" for the mirror lists. It might be a bit wasteful on directory entries, since the objects will be tiny, but this is only an issue if you have a lot of alternative URLs.
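A whitelist check can be quite small. The C sketch below assumes the host name has already been extracted from the URL as a (pointer, length) pair, because marshal-buffer strings are not NUL-terminated; the `trusted_domains` entries and the `host_is_trusted()` name are hypothetical. A host matches only if it equals a listed domain or is a subdomain of one, so "evilexample.org" does not pass as "example.org".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical whitelist: mirror lists are trusted only when published by
 * these domains or their subdomains. The entries are made-up examples. */
static const char *const trusted_domains[] = {
  "example.org",
  "mirrors.example.net",
};

/* Return true if `host` (length `len`, not NUL-terminated) equals a trusted
 * domain or ends in "." followed by one. DNS names are case-insensitive,
 * so the comparison is too. */
static bool
host_is_trusted(const char *host, size_t len)
{
  for (size_t i = 0; i < sizeof(trusted_domains) / sizeof(trusted_domains[0]); i++) {
    const char *d = trusted_domains[i];
    size_t dlen = strlen(d);

    /* Exact match. */
    if (len == dlen && strncasecmp(host, d, dlen) == 0)
      return true;

    /* Subdomain match: require a '.' right before the suffix. */
    if (len > dlen && host[len - dlen - 1] == '.' &&
        strncasecmp(host + len - dlen, d, dlen) == 0)
      return true;
  }
  return false;
}
```

A plugin might run this on the host that served the `Link` headers before writing its mirror list into the cache, and simply ignore lists from anywhere else.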

> * Making decisions about the best mirror to choose, e.g. one that is more cost-efficient, faster, or more local
>   * Use content digest to detect or repair download errors
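One way to frame "the best mirror" is as a comparison over per-mirror attributes. The sketch below assumes each mirror has already been parsed into a hypothetical `struct mirror` whose `pri` and `geo` fields are loosely modeled on the parameters Metalink/HTTP (RFC 6249) attaches to `rel=duplicate` links. The selection policy (same country first, then lowest `pri`) is an assumption, not something the thread settled on.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical per-mirror record, loosely modeled on the "pri" and "geo"
 * parameters Metalink/HTTP (RFC 6249) attaches to rel=duplicate links. */
struct mirror {
  const char *url;
  unsigned pri;    /* lower is preferred; use a large value if absent */
  const char *geo; /* two-letter country code, or NULL if unknown */
};

/* Return the index of the preferred mirror, or -1 if n == 0.
 * Assumed policy: a mirror in the client's country beats a remote one;
 * ties are broken by the lowest pri. Cost or measured speed could be
 * added as further tie-breakers. */
static int
choose_mirror(const struct mirror *m, size_t n, const char *client_geo)
{
  int best = -1;
  int best_local = 0;

  for (size_t i = 0; i < n; i++) {
    int local = client_geo != NULL && m[i].geo != NULL &&
                strcasecmp(m[i].geo, client_geo) == 0;
    if (best < 0 || local > best_local ||
        (local == best_local && m[i].pri < m[best].pri)) {
      best = (int)i;
      best_local = local;
    }
  }
  return best;
}
```

Keeping the policy in one comparison makes it easy to add criteria such as transfer cost later without touching the parsing code.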

> A first attempt at a plugin is up on GitHub: https://github.com/jablko/dedup

Should we call it "metalink" or something instead of dedup? Dedup is something we might want to do later on the cache itself (e.g. deduping on segments etc.).

> I use TSmalloc() to allocate a struct to pass variables to TSCacheRead() callbacks. Leif mentioned in sample code that this is suboptimal and to use jemalloc in configure instead. I will do so.

The point here is that plugins using TSmalloc() extensively should consider compiling ATS with tcmalloc or jemalloc. You would still use TSmalloc() though. Alternatively, you can manage your own memory pools, but that's a future exercise IMO.
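For the "own memory pools" option, a free list of fixed-size per-request structs is about the simplest form. This is a minimal sketch under stated assumptions: `struct read_ctx` and its fields are hypothetical, `malloc()`/`free()` stand in for `TSmalloc()`/`TSfree()`, and the pool is not thread-safe, so a real plugin would need one pool per thread or a lock.

```c
#include <stdlib.h>

/* Hypothetical per-request state handed to cache-read callbacks. */
struct read_ctx {
  struct read_ctx *next_free; /* link used only while on the free list */
  const char *url;            /* example payload (borrowed, not owned) */
  size_t url_len;
};

/* Minimal free-list pool. NOT thread-safe: Traffic Server runs callbacks
 * on several threads, so use one pool per thread or add a lock. */
struct ctx_pool {
  struct read_ctx *free_list;
};

static struct read_ctx *
ctx_alloc(struct ctx_pool *p)
{
  struct read_ctx *c = p->free_list;
  if (c) {
    p->free_list = c->next_free; /* reuse a previously released object */
  } else {
    c = malloc(sizeof *c);       /* pool empty: fall back to the allocator */
    if (!c)
      return NULL;
  }
  c->next_free = NULL;
  c->url = NULL;
  c->url_len = 0;
  return c;
}

/* Push the object back onto the free list instead of freeing it. */
static void
ctx_release(struct ctx_pool *p, struct read_ctx *c)
{
  c->next_free = p->free_list;
  p->free_list = c;
}

/* Free everything still on the free list, e.g. at plugin shutdown. */
static void
ctx_pool_destroy(struct ctx_pool *p)
{
  while (p->free_list) {
    struct read_ctx *c = p->free_list;
    p->free_list = c->next_free;
    free(c);
  }
}
```

Whether this beats simply linking against jemalloc or tcmalloc is something to measure first; the allocator switch needs no plugin changes at all.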

> The parsing of "Link: <...>; rel=duplicate" is rough; I would most appreciate any feedback on this. I call TSUrlParse() from the second character of the field value to the first ">" character after the first character. I think that according to RFC 3986, a URI-reference can't contain a ">" character, so I think this logic is okay? I use memchr() to find the ">" character because "string values returned from marshal buffers are not null-terminated ... cannot be passed into the common str*() routines".

Assuming your plugin is in C++, look at some of the existing stuff in Boost. They have some pretty advanced tokenizers. If that's not enough, you might have to consider lex/flex.
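If the plugin stays in C, a length-bounded scanner is also feasible. One gap in the approach described above: RFC 5988 lets a single Link field value carry several comma-separated link-values, so stopping at the first ">" only finds the first mirror. The sketch below, with the hypothetical name `link_next_uri()`, walks every link-value using only (pointer, end) pairs and `memchr()`, and skips commas that appear inside quoted parameter values.

```c
#include <stddef.h>
#include <string.h>

/* Find the next URI-reference in a Link field value (RFC 5988), which may
 * hold several comma-separated link-values, e.g.
 *   <http://a.example/f>; rel=duplicate, <http://b.example/f>; rel=duplicate
 * `*pp` is the current position and `end` the end of the value; the data
 * need not be NUL-terminated, as with marshal-buffer strings. Returns 1 and
 * sets *uri / *uri_len when a link is found (brackets excluded), 0 when the
 * value is exhausted, -1 on a syntax error. Parameters are skipped here,
 * not interpreted. */
static int
link_next_uri(const char **pp, const char *end, const char **uri, size_t *uri_len)
{
  const char *p = *pp;

  /* Skip whitespace and the commas separating link-values. */
  while (p < end && (*p == ' ' || *p == '\t' || *p == ','))
    p++;
  if (p == end) {
    *pp = p;
    return 0;
  }
  if (*p != '<')
    return -1;

  /* A URI-reference cannot contain '>', so the first one ends it. */
  const char *gt = memchr(p + 1, '>', (size_t)(end - p - 1));
  if (!gt)
    return -1;
  *uri = p + 1;
  *uri_len = (size_t)(gt - p - 1);

  /* Skip parameters up to the next comma that is not inside a
   * quoted-string such as title="a, b". */
  int quoted = 0;
  for (p = gt + 1; p < end; p++) {
    if (quoted && *p == '\\' && p + 1 < end)
      p++; /* skip the escaped character of a quoted-pair */
    else if (*p == '"')
      quoted = !quoted;
    else if (*p == ',' && !quoted)
      break;
  }
  *pp = p;
  return 1;
}
```

Each (uri, uri_len) pair can then go to TSUrlParse() as the plugin already does, with `uri` as the start and `uri + uri_len` as the end.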

> I'm not sure how best to test if Link headers have a "rel=duplicate" parameter. Traffic Server has some private code, HttpCompat::lookup_param_in_semicolon_string(), to parse, e.g. "Content-Type: ...; charset=UTF-8", but nothing in the public API. I can probably cobble together something from scratch with memchr(), etc. but I'm nervous about getting it right, e.g. all the RFC rules about whitespace, and is conformance good enough or are there nonconformant implementations to consider? Finally, are there any libraries I should consider using?

If the internal methods work for you, use them by cheating (e.g. copy-paste or steal the header / class definitions). The issue, many times, is that our core is C++ whereas the APIs are C, so it's not straightforward to support them in our public APIs.
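If copying the internal helper doesn't work out, checking for `rel=duplicate` from scratch is manageable. This sketch, with the hypothetical name `link_params_have_rel()`, scans the parameter part of one link-value (the text after ">"). It accepts quoted or bare values, matches names and relation types case-insensitively, handles a quoted list like `rel="describedby duplicate"`, and honours only the first `rel` parameter as RFC 5988 requires. Tolerating whitespace around "=" is a deliberate leniency toward nonconformant senders, not something the grammar requires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Optional whitespace (OWS) between header tokens. */
static bool
is_ows(char c)
{
  return c == ' ' || c == '\t';
}

/* Scan the parameters of ONE link-value, e.g. `; rel=duplicate; pri=1`
 * (not NUL-terminated), and report whether its first "rel" parameter
 * lists the relation type `want`. */
static bool
link_params_have_rel(const char *p, size_t len, const char *want)
{
  const char *end = p + len;
  size_t want_len = strlen(want);

  while (p < end) {
    /* Every parameter starts after a ';'. */
    const char *semi = memchr(p, ';', (size_t)(end - p));
    if (!semi)
      return false;
    p = semi + 1;
    while (p < end && is_ows(*p))
      p++;

    /* Parameter name runs up to '=', ';' or whitespace. */
    const char *name = p;
    while (p < end && *p != '=' && *p != ';' && !is_ows(*p))
      p++;
    size_t name_len = (size_t)(p - name);

    /* Leniently allow whitespace around '='. */
    while (p < end && is_ows(*p))
      p++;
    if (p == end || *p != '=')
      continue; /* parameter without a value */
    p++;
    while (p < end && is_ows(*p))
      p++;

    /* Value is either a quoted-string or a bare token. */
    const char *val;
    size_t val_len;
    if (p < end && *p == '"') {
      val = ++p;
      while (p < end && *p != '"') {
        if (*p == '\\' && p + 1 < end)
          p++; /* skip the escaped character of a quoted-pair */
        p++;
      }
      val_len = (size_t)(p - val);
      if (p < end)
        p++; /* step over the closing quote */
    } else {
      val = p;
      while (p < end && *p != ';' && !is_ows(*p))
        p++;
      val_len = (size_t)(p - val);
    }

    if (name_len == 3 && strncasecmp(name, "rel", 3) == 0) {
      /* A quoted rel value may hold several space-separated types. */
      const char *q = val, *vend = val + val_len;
      while (q < vend) {
        while (q < vend && is_ows(*q))
          q++;
        const char *t = q;
        while (q < vend && !is_ows(*q))
          q++;
        if ((size_t)(q - t) == want_len && strncasecmp(t, want, want_len) == 0)
          return true;
      }
      return false; /* later rel parameters are ignored */
    }
  }
  return false;
}
```

Splitting the work this way (find each link-value first, then inspect its parameters) keeps the comma and semicolon rules apart, which is where hand-written header parsers usually go wrong.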


Cheers,

-- leif
