On 6/12/12 5:43 PM, Nick Kew wrote:
On Tue, 12 Jun 2012 16:51:42 -0600
Yeah, I totally agree. There are some potential alternatives here, such as
fixed-size chunking of content perhaps. It's still a difficult problem to
solve optimally for every type of request. Your suggested heuristic is
probably reasonable for many cases, but what if the client asks for the
first 16KB, and we have no idea how large the object is (it could be e.g.
512GB)? Do we defer dealing with it until we have collected enough data to
make an intelligent decision?
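To make the trade-off concrete, here is a minimal sketch of one such heuristic. All names and the size threshold are hypothetical, not from any codebase: fetch the full object when its size is known and modest, but pass the Range through untouched when the size is unknown or huge.

```python
# Hypothetical tunable: only promote a Range request to a full fetch
# when the object is known to be no larger than this.
FULL_FETCH_LIMIT = 64 * 1024 * 1024  # 64MB, an assumed value

def should_fetch_full(range_length, content_length):
    """Decide whether to replace a client Range request with a full
    backend fetch so the whole object can be cached.

    content_length is None when the origin's size is unknown (no prior
    cache entry and no recorded Content-Length).
    """
    if content_length is None:
        # Unknown size: it could be 512GB, so defer the decision and
        # just pass the client's Range through to the backend.
        return False
    return content_length <= FULL_FETCH_LIMIT
```

Under this sketch, a 16KB range against an object of unknown size would be passed through as-is, while the same range against a known 1MB object would trigger a cacheable full fetch.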

Also, blindly caching every Range: request could completely fill the cache
with responses that partially overlap (there are no restrictions on how a
client may form its Range requests :/).
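One way a cache could contain that overlap, sketched below, is to coalesce overlapping or adjacent byte ranges before storing them; the function is illustrative only, not part of any existing cache implementation.

```python
def merge_ranges(ranges):
    """Coalesce overlapping or adjacent (start, end) byte ranges.

    Ranges use inclusive ends, as in HTTP Range headers.  Sorting by
    start lets a single pass fold each range into the previous one
    whenever they touch or overlap.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or is adjacent to) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, ranges 0-99, 50-199, and 300-399 collapse into 0-199 and 300-399, so only two cache fragments would be stored instead of three partially overlapping ones.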
Are we at cross-purposes here?

If the client requests the first 16KB, then a rangeless request to the
backend fetches that 16KB first, so the client can be satisfied while
the proxy continues filling the cache.  That requires decoupling the
client and server requests.  I wasn't suggesting caching any ranged
request!

That could be difficult to do, I think, because of the way the producer /
consumer relationships work in the core (but maybe it's doable; I haven't
looked at it from that perspective).


Thinking about it, perhaps an optimal heuristic is, on receipt of a
range request, to make two requests to the backend: the request as-is,
and (if cacheable) a second background request for everything-but the
range.  The background request grabs a mutex so we don't duplicate the
fetch.
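The "everything-but" part of that scheme amounts to computing the complement of the requested byte range. A small sketch, with hypothetical names (HTTP would express an open-ended tail as "N-", but a closed form is used here when the total size is known):

```python
def complement_ranges(start, end, total):
    """Byte ranges covering everything except start..end (inclusive),
    for an object of `total` bytes whose size is known.

    Returns up to two ranges: the head before the requested range and
    the tail after it.
    """
    out = []
    if start > 0:
        out.append((0, start - 1))
    if end < total - 1:
        out.append((end + 1, total - 1))
    return out
```

So for a request of bytes 100-199 of a 1000-byte object, the background request would fetch 0-99 and 200-999, and together the two responses fill the cache completely.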

Right, that's almost what my initial proposal did, except it wouldn't kick
off the second "full" request until it gets the response headers (to avoid
sending a full request for an uncacheable object). Maybe we could issue a
HEAD request initially, until we figure out whether the object is cacheable
or not?

The "mutex" is more or less implicit in the way our cache works: only one
producer can hold the cached object for write. This is also how
read-while-writer works (one producer writing to the cache, and there can
be multiple client consumers even before the cache write is finished).
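The single-writer / many-readers idea can be sketched roughly as below. This is a toy model, not ATS internals: one producer holds the entry for write (the implicit "mutex"), while consumers may read any bytes already written, before the object is complete.

```python
class CacheEntry:
    """Toy model of a cache entry supporting read-while-write."""

    def __init__(self):
        self.data = bytearray()
        self.complete = False
        self.writer_active = False

    def open_write(self):
        # The implicit "mutex": only one producer may hold the
        # entry for write at a time.
        if self.writer_active:
            return False
        self.writer_active = True
        return True

    def write(self, chunk):
        self.data.extend(chunk)

    def close_write(self):
        self.complete = True
        self.writer_active = False

    def read(self, offset, length):
        # Consumers may read whatever the writer has produced so far,
        # even before close_write() marks the object complete.
        return bytes(self.data[offset:offset + length])
```

A second attempt to open the entry for write fails while the first producer is active, which is what prevents duplicate backend fetches for the same object.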
That's actually looking a lot like your original proposal!


Brilliant minds ... :).

-- leif
