Since this is related to my earlier question: sorry to have kept you waiting, Timo.
Kelly, the reason I brought up my original question is that my use case involves delivering videos under load. Suppose there is a cluster of 50 nodes with a replication value of three, and a random node is queried for a file of, say, 500 MB. Then I would expect 30 MB (3/50) to be local and 470 MB (94%) to be retrieved from the rest of the cluster, assuming local data is used when available. If this happens over and over again, you want caching.

The trouble with video is that users might skip parts, and the resulting range requests are nasty to most caches. If they are few, it is OK simply not to cache them, but otherwise we need a solution. One approach is to have a proxy "translate" a range request into a series of Riak queries, each of which is small enough to be cached in "no time", but not so small that the overhead outgrows the real data.

You are right that Riak CS does a fine job at what it does, but typical caches can only query it sequentially, which precludes caching of range requests on any object of relevant size.

Timo, of course you are somewhat reinventing the wheel: half the work you have to do is similar to how Riak CS stores files. However, since the other half cannot "just use" S3, I fail to see the alternative, apart from cancelling the project. That said, I am not an expert in these matters, but your basic approach to big files looks sound to me.

I am curious about the use case of growing files. My first thought was that if I wanted to store logs in a database, I would prefer to store e.g. single lines as values, or rows in a relational database.

Best regards
Martin

On 31/03/14 09:30, Kelly McLaughlin wrote:
> Riak CS does store large files directly in Riak in a manner somewhat similar
> to what you describe, and has the advantage that there are S3 libraries for
> most languages. You might want to look at it a little closer, because this
> does sound like re-inventing of the wheel to me.
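(The range-to-segment translation Martin proposes above can be sketched in a few lines of Go. This is a hypothetical illustration, not part of any existing driver; the `<key>-%06d` key format is taken from Timo's proposal below, and `segmentKeys` is an invented helper name.)

```go
package main

import "fmt"

// segmentKeys translates an HTTP byte-range request into the list of
// fixed-size segment keys a caching proxy would fetch from Riak, each
// small enough to be cached independently. Key format: "<key>-%06d".
func segmentKeys(key string, segmentSize, start, end int64) []string {
	first := start / segmentSize // first segment touched by the range
	last := end / segmentSize    // last segment touched by the range
	keys := make([]string, 0, last-first+1)
	for i := first; i <= last; i++ {
		keys = append(keys, fmt.Sprintf("%s-%06d", key, i))
	}
	return keys
}

func main() {
	// A request for bytes 1500000-3500000 of "video", with 1 MB segments,
	// touches segments 1 through 3.
	fmt.Println(segmentKeys("video", 1000000, 1500000, 3500000))
	// → [video-000001 video-000002 video-000003]
}
```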
>
> Kelly
>
> On March 29, 2014 at 5:55:32 AM, Timo Gatsonides (t...@me.com) wrote:
> > Related to the recent thread about "Lowlevel Access to RiakCS objects" I
> > plan to implement an extension to a Riak Object (in the Golang driver at
> > https://github.com/tpjg/goriakpbc) that will cover two use cases:
> > 1) storing larger files directly in Riak (not Riak CS)
> > 2) storing growing files, e.g. log files, efficiently in Riak
> >
> > I have some questions for the Riak community. First: am I about to
> > re-invent a wheel, and if so, can someone please point me to an example
> > implementation?
> >
> > If not, I would like to know if there is more interest and maybe more use
> > cases, so we can develop a standard way to store these objects in Riak,
> > allowing access from multiple programming languages. If there is, please
> > read the proposal below and provide feedback.
> >
> > Store the meta-information about a "BigObject" in a regular Riak value.
> > This object will be stored in <bucket>, <key>. The object will have the
> > following meta tags:
> > - segment_size - size of each segment in bytes
> > - segment_count - the number of segments of the object
> > - total_size - optional total size of the entire object (see below)
> >
> > The object's data would be stored in segments in <bucket>, <key-%06d>.
> > The Content-Type would be the same for all objects. Depending on the use
> > case, total_size could be filled in.
> >
> > Using the two use cases above as an example:
> > 1) storing large files, e.g. video: use a large segment_size, e.g. 1 MB,
> > and store the total_size, since the file will be static and the
> > meta-information can be written at once
> > 2) storing growing files, e.g. daily log files: use a smaller
> > segment_size, e.g. 10 KB-100 KB, and do not store total_size, as
> > otherwise each "Append" operation would require two PUTs. If a segment
> > grows beyond the segment_size, update the meta-information K/V; otherwise
> > only PUT the last segment again.
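(The BigObject layout quoted above can be sketched as a small Go struct. The type and helper names here are illustrative only; nothing below is an existing goriakpbc API, and only the meta-tag names and the `<key>-%06d` key scheme come from the proposal itself.)

```go
package main

import "fmt"

// BigObjectMeta mirrors the meta tags the proposal stores in the regular
// Riak value at <bucket>, <key>. Names are taken from the proposal;
// the struct itself is a hypothetical sketch.
type BigObjectMeta struct {
	SegmentSize  int64 // segment_size: size of each segment in bytes
	SegmentCount int64 // segment_count: number of segments of the object
	TotalSize    int64 // total_size: optional; 0 when unknown (growing files)
}

// segmentKey returns the Riak key of segment n, per the "<key>-%06d" scheme.
func segmentKey(key string, n int64) string {
	return fmt.Sprintf("%s-%06d", key, n)
}

func main() {
	// Use case 1: a static 500 MB video with 1 MB segments, so the
	// total size is known up front and written with the meta K/V.
	video := BigObjectMeta{
		SegmentSize:  1 << 20,
		SegmentCount: 500,
		TotalSize:    500 << 20,
	}
	// Segments are numbered from zero, so the last one is 000499.
	fmt.Println(segmentKey("video", video.SegmentCount-1))
	// → video-000499
}
```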
> > In the extension to a client driver, a BigObject would store some state
> > information that keeps track of the meta-information K/V and the last
> > segment, to make the Append operation somewhat efficient. Note though
> > that each append overwrites the last segment.
> >
> > Any input is highly appreciated.
> >
> > Kind regards,
> > Timo
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Greetings, Martin Alpers
martin-alp...@web.de; Public Key ID: 10216CFB
Tel: +49 431/90885691; Mobile: +49 176/66185173
JID: martin.alp...@jabber.org
FYI: http://apps.opendatacity.de/stasi-vs-nsa/
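(The cost model of Timo's Append, quoted above, can be sketched in Go: each append re-PUTs the last segment, and the meta K/V is only touched when a segment fills up and a new one is opened. All names are illustrative and the PUTs are only counted, not performed; this is not the goriakpbc API.)

```go
package main

import "fmt"

// appendState is the per-object client-side state the proposal mentions:
// the last segment and enough meta-information to know when to roll over.
type appendState struct {
	segmentSize int64
	lastIndex   int64  // index of the segment currently being written
	lastData    []byte // in-memory copy of the last (partial) segment
	puts        int    // counts simulated PUTs, to show the cost model
}

// Append overwrites the last segment with the grown data. When the segment
// reaches segmentSize, it rolls over to a new segment, which also costs an
// extra PUT for the meta-information K/V.
func (s *appendState) Append(data []byte) {
	s.lastData = append(s.lastData, data...)
	s.puts++ // PUT of the last segment (each append overwrites it)
	if int64(len(s.lastData)) >= s.segmentSize {
		s.lastIndex++
		s.lastData = s.lastData[s.segmentSize:] // carry over any overflow
		s.puts++                                // PUT of the meta K/V
	}
}

func main() {
	s := &appendState{segmentSize: 10}
	s.Append([]byte("hello "))  // 6 bytes: segment not full, 1 PUT
	s.Append([]byte("world!"))  // 12 bytes total: rollover, 2 more PUTs
	fmt.Println(s.lastIndex, s.puts)
	// → 1 3
}
```

This makes the trade-off in the proposal concrete: with total_size omitted, the common case (appending to a partial segment) stays at one PUT, and the two-PUT cost is only paid on segment rollover.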
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com