Hi,

On 20 Dec 2013, at 23:16, Jason Campbell <xia...@xiaclo.net> wrote:

> 
> ----- Original Message -----
> From: "Andrew Stone" <ast...@basho.com>
> To: "Jason Campbell" <xia...@xiaclo.net>
> Cc: "Sean Cribbs" <s...@basho.com>, "riak-users" 
> <riak-users@lists.basho.com>, "Viable Nisei" <vsni...@gmail.com>
> Sent: Saturday, 21 December, 2013 10:01:29 AM
> Subject: Re: May allow_mult cause DoS?
> 
> 
>> Think of an object with thousands of siblings. That object holds one 
>> copy of the data for each sibling, so it could be on the order of 
>> hundreds of megabytes. Every time the object is read off disk and 
>> returned to the client, 100 MB is transferred. Furthermore, LevelDB must 
>> rewrite the entire 100 MB to disk every time a new sibling is added, and 
>> the object just got larger with that write. If a merge occurs, the 
>> amount of data is a single copy of the data at that key instead of what 
>> amounts to approximately 10,000 copies of similarly sized data, when all 
>> you care about is one of those 10,000. 
> 
> This makes sense for concurrent writes, but the use case that was being 
> talked about was siblings with no parent object.

What is a "sibling with no parent object"? I think I understand what you're 
getting at: each sibling is some fragment of the whole, is that it?

>  I understand the original use case being discussed was tens of millions of 
> objects, and the metadata alone would likely exceed recommended object sizes 
> in Riak.
> I've mentioned my use case before, which is trying to get fast writes on 
> large objects.  I abuse siblings to some extent, although by the nature of 
> the data, there will never be more than a few thousand small siblings (under 
> a hundred bytes).  I merge them on read and write the updated object back.  
> Even with sibling metadata, I doubt the bloated object is over a few MB, 
> especially with snappy compression which handles duplicate content quite 
> well.  Even if Riak merges the object on every write, it's still much faster 
> than transferring the whole object over the network every time I want to do a 
> small write.  Is there a more efficient way to do this?  I thought about 
> writing single objects and using a custom index, but that results in a read 
> and 2 writes, and the index could grow quite large compared to the amount of 
> data I'm writing.
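
If I follow, the merge-on-read pattern looks roughly like this sketch (a 
minimal Python illustration; the record shape and last-write-wins merge rule 
are my assumptions, not your actual code or a real Riak client API):

```python
# Hypothetical sketch of merging siblings on read: each sibling is a
# small dict of key -> (timestamp, value); the resolved object keeps
# the newest value per key and would then be written back.

def merge_siblings(siblings):
    """Merge sibling values by keeping the newest entry for each key."""
    merged = {}
    for sibling in siblings:
        for key, (ts, value) in sibling.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged

siblings = [
    {"a": (1, "old"), "b": (2, "x")},
    {"a": (3, "new")},
    {"c": (1, "y")},
]
print(merge_siblings(siblings))
# {'a': (3, 'new'), 'b': (2, 'x'), 'c': (1, 'y')}
```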

This is similar, I suppose, to Riak 2.0 data types. We send an operation to 
Riak and apply it inside the database, rather than fetching, mutating, and 
writing at the client. Think of adding to a Set: you just send the thing to be 
added and Riak merges it for you. For your use case, would a user-defined 
merge function in the database be a valuable feature? It would be even better 
if Riak stored data differently (incrementally, append-only, rather than 
read-merge-write at the vnode). These are things we're going to be working on 
soon (I hope!). I had no idea that people used siblings this way. It's 
interesting.
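
As a rough illustration of the operation-based idea (a toy grow-only set in 
Python, not the actual Riak data type implementation): the client sends only 
the element to add, and the server merges replicas by set union, so no 
read-modify-write round trip is needed at the client.

```python
# Toy grow-only set: merge is set union, which is commutative,
# associative, and idempotent, so replicas converge in any order.

class GSet:
    def __init__(self):
        self.items = set()

    def add(self, element):
        """The "operation" a client would send: just the new element."""
        self.items.add(element)

    def merge(self, other):
        """Server-side merge of two replicas by set union."""
        self.items |= other.items

a, b = GSet(), GSet()
a.add("x")
b.add("y")
b.add("x")
a.merge(b)
print(sorted(a.items))  # ['x', 'y']
```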

Cheers

Russell

> 
> Thanks,
> Jason
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

