On 5/12/12 8:51 AM, Jack Bates wrote:
Hi, I would like files that are distributed from multiple mirrors to work
better with caching proxies, and I hope to write a Traffic Server plugin
to help with this
Cool. I'm omw to a Mickey Mouse cruise (wish me luck ...), but wanted to put
in a couple of $0.02's.
* Remember lists of mirrors so future requests for any of these URLs use
the same cache key. A problem is how to prevent a malicious domain from
distributing false information about URLs it doesn't control. This could
be addressed with a whitelist of domains
You can use our cache to 'remember' this, writing your own "entries" for the
mirror lists. Might be a bit wasteful on the directory entries, since the
objects will be tiny, but this is only an issue if you have a lot of
alternative URLs.
* Making decisions about the best mirror to choose, e.g. one that is
most cost efficient, faster, or more local
* Use content digest to detect or repair download errors
A first attempt at a plugin is up on GitHub: https://github.com/jablko/dedup
Should we call it "metalink" or something instead of dedup? Dedup is
something we might want to do later on the cache itself (e.g. deduping on
segments etc.).
I use TSmalloc() to allocate a struct to pass variables to TSCacheRead()
callbacks. Leif mentioned in sample code that this is suboptimal and to
use jemalloc in configure instead. I will do so
The point here is that plugins using TSmalloc() extensively should consider
compiling ATS with tcmalloc or jemalloc. You would still call TSmalloc()
though. Alternatively, you can manage your own memory pools, but that's a
future exercise IMO.
The parsing of "Link: <...>; rel=duplicate" is rough, I would most
appreciate any feedback on this. I call TSUrlParse() from the second
character of the field value to the first ">" character after the first
character. I think that according to RFC 3986, a URI-reference can't
contain a ">" character, so I think this logic is okay? I use memchr() to
find the ">" character because "string values returned from marshall
buffers are not null-terminated ... cannot be passed into the common
str*() routines"
Assuming your plugin is in C++, look at some of the existing stuff in Boost.
They have some pretty advanced tokenizers. If that's not enough, you might
have to consider lex/flex maybe.
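
For the simple case described above, a tokenizer may not be needed. Here is a
minimal sketch in plain C of the memchr() approach, assuming the field value
starts with "<" and is given as a pointer plus a length (no null terminator).
The helper name link_target() is made up for illustration and is not part of
the Traffic Server API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Traffic Server API.
 *
 * Locates the URI-reference enclosed in <...> at the start of a Link field
 * value. `value` is NOT assumed to be null-terminated; `len` is its length.
 * On success, stores a pointer into `value` and the URI length, and returns 1.
 * Returns 0 if the value does not start with '<' or has no closing '>'. */
static int
link_target(const char *value, size_t len, const char **uri, size_t *uri_len)
{
  assert(uri != NULL && uri_len != NULL);

  if (value == NULL || len < 2 || value[0] != '<')
    return 0;

  /* Search only within the buffer: from the second byte to the last byte. */
  const char *close = memchr(value + 1, '>', len - 1);
  if (close == NULL)
    return 0;

  *uri     = value + 1;
  *uri_len = (size_t)(close - (value + 1));
  return 1;
}
```

The resulting span (uri, uri + uri_len) could then be handed to TSUrlParse(),
which takes a start pointer and an end pointer rather than a C string. The
sketch does not validate the URI itself; it only finds the delimiters.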
I'm not sure how best to test if Link headers have a "rel=duplicate"
parameter. Traffic Server has some private code,
HttpCompat::lookup_param_in_semicolon_string(), to parse, e.g.
"Content-Type: ...; charset=UTF-8", but nothing in the public API. I can
probably cobble together something from scratch with memchr(), etc. but
I'm nervous about getting it right, e.g. all the RFC rules about
whitespace, and is conformance good enough or are there nonconformant
implementations to consider? Finally are there any libraries I should
consider using?
If the internal methods work for you, use them by cheating (e.g. copy and
paste, or steal the header / class definitions). The issue many times is that
our core is C++ whereas the APIs are C, so it's not straightforward to support
them in our public APIs.
Cheers,
-- leif