On 5/12/12 8:51 AM, Jack Bates wrote:
Hi, I would like files that are distributed from multiple mirrors to work better with caching proxies, and I hope to write a Traffic Server plugin to help with this

Cool. I'm omw to a Mickey Mouse cruise (wish me luck ...), but wanted to put in a couple of $0.02's.



* Remember lists of mirrors so future requests for any of these URLs use the same cache key. A problem is how to prevent a malicious domain from distributing false information about URLs it doesn't control. This could be addressed with a whitelist of domains

You can use our cache to 'remember' this, writing your own "entries" for the mirror lists. Might be a bit wasteful on the directory entries, since the objects will be tiny, but this is only an issue if you have a lot of alternative URLs.



* Making decisions about the best mirror to choose, e.g. one that is most cost efficient, faster, or more local

* Use content digest to detect or repair download errors

A first attempt at a plugin is up on GitHub: https://github.com/jablko/dedup

Should we call it "metalink" or something instead of dedup? Dedup is something we might want to do later on the cache itself (e.g. deduping on segments etc.).



I use TSmalloc() to allocate a struct to pass variables to TSCacheRead() callbacks. Leif mentioned in sample code that this is suboptimal and to use jemalloc in configure instead. I will do so

The point here is that if your plugin uses TSmalloc() extensively, you should consider compiling ATS with tcmalloc or jemalloc. You would still call TSmalloc() though. Alternatively, you can manage your own memory pools, but that's a future exercise IMO.


The parsing of "Link: <...>; rel=duplicate" is rough, I would most appreciate any feedback on this. I call TSUrlParse() from the second character of the field value to the first ">" character after the first character. I think that according to RFC 3986, a URI-reference can't contain a ">" character, so I think this logic is okay? I use memchr() to find the ">" character because "string values returned from marshall buffers are not null-terminated ... cannot be passed into the common str*() routines"

Assuming your plugin is in C++, look at some of the existing stuff in Boost. They have some pretty advanced tokenizers. If that's not enough, you might have to consider lex/flex maybe.
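
If Boost or flex turns out to be more than you need, the memchr() approach you describe can be sketched in plain C along these lines. This is only a rough sketch: link_extract_uri() is a made-up helper name, not anything in the Traffic Server API, and it assumes the field value holds a single link and may start with optional whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Traffic Server API.
 * Finds the URI-reference between "<" and ">" in a Link field value.
 * value/len describe a buffer that is NOT null-terminated.
 * On success returns 1 and points *uri / *uri_len at the span inside
 * the brackets (no copy is made); otherwise returns 0. */
static int
link_extract_uri(const char *value, size_t len, const char **uri, size_t *uri_len)
{
  assert(value != NULL && uri != NULL && uri_len != NULL);

  /* Skip optional leading whitespace (SP / HTAB). */
  size_t i = 0;
  while (i < len && (value[i] == ' ' || value[i] == '\t')) {
    i++;
  }

  /* The value must open with "<". */
  if (i >= len || value[i] != '<') {
    return 0;
  }
  i++;

  /* memchr() takes an explicit length, so it is safe on unterminated data. */
  const char *close = memchr(value + i, '>', len - i);
  if (close == NULL) {
    return 0;
  }

  *uri     = value + i;
  *uri_len = (size_t)(close - (value + i));
  return 1;
}
```

The resulting span (start pointer plus length) can then be handed to TSUrlParse() the same way you do now. Unlike "start at the second character", this rejects values that do not begin with "<" or never close the bracket.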

I'm not sure how best to test if Link headers have a "rel=duplicate" parameter. Traffic Server has some private code, HttpCompat::lookup_param_in_semicolon_string(), to parse, e.g. "Content-Type: ...; charset=UTF-8", but nothing in the public API. I can probably cobble together something from scratch with memchr(), etc. but I'm nervous about getting it right, e.g. all the RFC rules about whitespace. Is conformance good enough, or are there nonconformant implementations to consider? Finally, are there any libraries I should consider using?

If the internal methods work for you, use them by cheating (e.g. copy and paste, or steal the header / class definitions). The issue many times is that our core is C++ whereas the APIs are C, so it's not straightforward to support them in our public APIs.
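
If you do end up rolling your own with memchr(), a from-scratch check for "rel=duplicate" might look roughly like the following. Treat it as a sketch under assumptions, not a conformant parser: link_has_rel_duplicate() and span_ieq() are made-up names, the input is assumed to be the text that follows the closing ">" of a single link, and it does not handle a ";" inside a quoted string or several comma-separated links in one field.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of an unterminated span with a C string. */
static int
span_ieq(const char *s, size_t n, const char *lit)
{
  if (n != strlen(lit)) {
    return 0;
  }
  for (size_t i = 0; i < n; i++) {
    if (tolower((unsigned char)s[i]) != tolower((unsigned char)lit[i])) {
      return 0;
    }
  }
  return 1;
}

/* Hypothetical helper, not part of the Traffic Server API.
 * params/len: the parameter text after "<...>", NOT null-terminated.
 * Returns 1 if some "rel" parameter lists the relation type "duplicate". */
static int
link_has_rel_duplicate(const char *params, size_t len)
{
  assert(params != NULL);

  const char *p   = params;
  const char *end = params + len;

  while (p < end) {
    /* One parameter runs up to the next ";" or the end of the buffer. */
    const char *semi = memchr(p, ';', (size_t)(end - p));
    const char *stop = semi ? semi : end;

    /* Split "name=value" and trim whitespace around the name. */
    const char *eq = memchr(p, '=', (size_t)(stop - p));
    if (eq != NULL) {
      const char *name     = p;
      const char *name_end = eq;
      while (name < name_end && isspace((unsigned char)*name)) {
        name++;
      }
      while (name_end > name && isspace((unsigned char)name_end[-1])) {
        name_end--;
      }

      if (span_ieq(name, (size_t)(name_end - name), "rel")) {
        /* Walk the value token by token; whitespace and double quotes
         * both act as separators, so rel="a duplicate" is handled. */
        const char *v = eq + 1;
        while (v < stop) {
          while (v < stop && (isspace((unsigned char)*v) || *v == '"')) {
            v++;
          }
          const char *tok = v;
          while (v < stop && !isspace((unsigned char)*v) && *v != '"') {
            v++;
          }
          if (span_ieq(tok, (size_t)(v - tok), "duplicate")) {
            return 1;
          }
        }
      }
    }
    p = semi ? semi + 1 : end;
  }
  return 0;
}
```

Being lenient about whitespace and case here is a deliberate choice to tolerate slightly nonconformant senders; tighten it if you would rather reject anything the RFC grammar does not allow.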

Cheers,

-- leif
