Lowlevel Access to RiakCS objects?

Martin Alpers Thu, 27 Mar 2014 15:07:15 -0700

Hi all,

Is there a canonical way to access RiakCS objects at a lower level? If I 
remember correctly, RiakCS basically splits larger objects into chunks of 
one megabyte each and mapreduces them back together on retrieval.
I would like to read those chunks for caching purposes.


For those interested in why I would want that:
A Riak/RiakCS cluster is the heart of our yet-to-be-implemented video delivery 
cluster. A video management system will enable registered users to upload 
their videos, and the public can watch them.
In order to reduce intra-cluster traffic, we intend to cache the videos, 
preferably in RAM.
We do not have any numbers on how often users would skip parts of a video and 
thereby generate range requests. If that turns out to be common, we would 
prefer to serve those requests from cache as well; otherwise, at least with 
Varnish and Squid, some users would experience unacceptably long delays.
We looked for a cache that passes any request for a URL straight through to 
the backend while caching of that URL is still in progress, and serves it from 
cache once caching is complete.

The problem with both Varnish and Squid (and I suppose most caches, because 
this behaviour seems reasonable in most cases) boils down to treating a cache 
fill in progress as a cache hit.
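To make the desired behaviour concrete, here is a minimal Python sketch of a cache that tracks three states per URL and never treats an in-progress fill as a hit: the first request claims the fill, concurrent requests for the same URL are passed through to the origin, and only a finished fill is served from cache. The class name `FillAwareCache` and the `fetch_from_origin` callable are hypothetical stand-ins (not a RiakCS, Varnish or Squid API), and the whole thing is an in-memory toy under those assumptions.

```python
import threading
from enum import Enum


class EntryState(Enum):
    MISSING = "missing"    # nothing cached yet
    FILLING = "filling"    # a fetch from the origin is in progress
    COMPLETE = "complete"  # the full object is cached


class FillAwareCache:
    """Toy cache that never treats an in-progress fill as a hit.

    Hypothetical sketch: `fetch_from_origin(url)` stands in for a request
    to the RiakCS cluster and is not a real RiakCS API.
    """

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._state = {}  # url -> EntryState
        self._data = {}   # url -> bytes, only for COMPLETE entries

    def get(self, url):
        """Return (body, source), where source is 'cache' or 'origin'."""
        with self._lock:
            state = self._state.get(url, EntryState.MISSING)
            if state is EntryState.COMPLETE:
                return self._data[url], "cache"
            start_fill = state is EntryState.MISSING
            if start_fill:
                # Claim the fill under the lock so only one caller fills.
                self._state[url] = EntryState.FILLING
        try:
            # Pipe through to the origin, whether we fill or not.
            body = self._fetch(url)
        except Exception:
            if start_fill:
                with self._lock:
                    # Forget the failed fill so a later request can retry.
                    self._state.pop(url, None)
            raise
        if start_fill:
            with self._lock:
                self._data[url] = body
                self._state[url] = EntryState.COMPLETE
        return body, "origin"
```

A request arriving while the first fill is still running is answered from the origin rather than blocking on the partially cached object; once the fill completes, later requests are cache hits.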
My colleague started to write his own caching proxy in NodeJS, but using 
asynchronous callbacks to first check whether a file exists and then create it 
if it does not strikes me as somewhat courageous for production, since another 
request can slip in between the check and the creation.
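One common way to close that check-then-create gap is to let the operating system do both in a single atomic call. The Python sketch below uses `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists; Node's `fs.open` offers the same semantics via the `'wx'` flag. The function name `try_claim_cache_file` is hypothetical, and this only decides who fills a file: telling a complete file from one still being written would additionally need something like writing under a temporary name and renaming it into place when done.

```python
import os


def try_claim_cache_file(path):
    """Atomically create `path` and return a writable file object, or None.

    With O_CREAT | O_EXCL the existence check and the creation happen in a
    single system call, so two concurrent requests cannot both decide to
    fill the same file (unlike a separate "exists?" check followed by a
    "create").
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # Someone else already created it (complete or still being filled).
        return None
    return os.fdopen(fd, "wb")
```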

Now, while we cannot risk letting some users wait for hundreds of megabytes to 
be cached before delivery begins, and while we want to be prepared for many 
more range requests than the average "wget was interrupted" case, it occurred 
to me that a few megabytes are not an issue at multi-Fast-Ethernet speeds.
So if we can split our files into small enough objects, we could code a proxy 
that translates a range request into one or more normal requests for those 
chunks, trims the first chunk (and, likewise, the last one) wherever the 
original range does not start or end on a chunk boundary, and concatenates the 
chunks in the correct order for delivery.
So the cache would never have to be bypassed, and the whole headache of telling 
a complete hit from one "in progress" would be gone.
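The translation step can be sketched in a few lines of Python. This assumes fixed-size chunks of 1 MiB (matching the one-megabyte figure above, but treat it as an assumption to check against the real block size) and an inclusive byte range as in an HTTP `Range: bytes=first-last` header. `fetch_chunk` is a hypothetical callable returning one whole, individually cacheable chunk; it is not a RiakCS API, and clamping the range to the object's actual size is left out.

```python
CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB); adjust to the real one


def chunks_for_range(first, last, chunk_size=CHUNK_SIZE):
    """Map an inclusive byte range (as in 'bytes=first-last') onto chunks.

    Returns a list of (chunk_index, start, end), where start/end are the
    half-open slice bounds to keep from that chunk.
    """
    if first < 0 or last < first:
        raise ValueError("invalid range")
    plan = []
    for index in range(first // chunk_size, last // chunk_size + 1):
        chunk_start = index * chunk_size
        start = max(first - chunk_start, 0)              # trim head of first chunk
        end = min(last - chunk_start + 1, chunk_size)    # trim tail of last chunk
        plan.append((index, start, end))
    return plan


def serve_range(first, last, fetch_chunk, chunk_size=CHUNK_SIZE):
    """Reassemble the requested bytes from whole chunks, in order.

    `fetch_chunk(index)` is hypothetical: it returns the bytes of chunk
    `index`, e.g. via a plain GET that a cache can serve in full.
    """
    parts = []
    for index, start, end in chunks_for_range(first, last, chunk_size):
        # Slicing also copes with a short final chunk at the end of the file.
        parts.append(fetch_chunk(index)[start:end])
    return b"".join(parts)
```

For example, with 1000-byte chunks, `bytes=1500-3499` becomes whole requests for chunks 1, 2 and 3, keeping the last 500 bytes of chunk 1, all of chunk 2 and the first 500 bytes of chunk 3. Every backend request is for a complete chunk, so the cache only ever deals with whole objects.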

Since RiakCS already splits our files into small pieces and somehow keeps track 
of them, could we possibly piggyback on that?

And by the way, I just came across the memory backend. I assume it is 
distributed like the persistent ones, so it will not help me reduce internal 
traffic, right?

Any input is highly appreciated.

Best Regards
Martin

-- 
Greetings, Martin Alpers

martin-alp...@web.de; Public Key ID: 10216CFB
Tel: +49 431/90885691; Mobile: +49 176/66185173
JID: martin.alp...@jabber.org
FYI: http://apps.opendatacity.de/stasi-vs-nsa/


_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
