On 13 Nov 2013, at 10:03, Carlos Baquero <c...@di.uminho.pt> wrote:

> 
> It's interesting to see a use case where a grow-only set is sufficient. I 
> believe Riak 2.0 will offer optimized OR-Sets that allow item removal at the 
> expense of some extra complexity in element storage and logarithmic metadata 
> growth per operation. But for your case a simple direct set of elements with 
> server-side merge by set union looks perfect. It's not efficient at all to 
> keep all those siblings if a simple server-side merge can reduce them.
> 
> Maybe it is a good idea not to overlook the potential usefulness of simple 
> grow-only sets and add that datatype to the 2.0 server-side CRDTs library. 
> And maybe even 2P-Sets, which only allow deleting an element once, might be 
> useful for some cases. 

We plan to add more data types in future, but I don’t think they’ll make it 
into 2.0. You can use an ORSet as a G-Set, though: just only ever add to it. 
The overhead is pretty small.

The difficulty is exposing different “flavours” of CRDTs in a non-confusing 
way. We chose to go with the name “data type” and to name the implementations 
generically (set, map, counter). I wonder if we painted ourselves into a corner.

Cheers

Russell

> 
> Regards,
> Carlos
> 
> -----
> Carlos Baquero
> HASLab / INESC TEC &
> Universidade do Minho,
> Portugal
> 
> c...@di.uminho.pt
> http://gsd.di.uminho.pt/cbm
> 
> 
> 
> 
> 
> On 12/11/2013, at 22:10, Jason Campbell wrote:
> 
>> I am currently forcing siblings for time-series data. The maximum bucket 
>> sizes are very predictable due to the nature of the data. I originally used 
>> the get/update/set cycle, but as I approach the end of the interval, reading 
>> and writing 1MB+ objects at a high frequency kills network bandwidth. So 
>> now I append siblings, and I have a cron job that merges the previous 
>> siblings (a simple set union works for me; only entire objects are ever 
>> deleted).
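>> 
>> The union step in that cron is roughly the following (a simplified sketch, 
>> not my actual code; it assumes the sibling values have already been parsed 
>> into sets of entries):
>> 
>> import java.util.Set;
>> import java.util.TreeSet;
>> 
>> // Resolve siblings by set union. This is safe here because entries are
>> // only ever appended, and deletes happen at whole-object granularity.
>> static Set<String> mergeSiblings(Iterable<Set<String>> siblingValues) {
>>     Set<String> merged = new TreeSet<String>();
>>     for (Set<String> sibling : siblingValues) {
>>         merged.addAll(sibling);
>>     }
>>     return merged; // written back as the single resolved value
>> }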
>> 
>> I can see how it can be dangerous to insert siblings, but if you have some 
>> other method of knowing how much data is in one, I don't see size being an 
>> issue. I have also considered using a counter to know how large an object is 
>> without fetching it, which shouldn't be off by more than a few siblings 
>> unless there is a network partition.
>> 
>> So aside from size issues, which can be roughly predicted or worked around, 
>> is there any reason not to create hundreds or thousands of siblings and 
>> resolve them later? I realise sets could work well for my use case, but they 
>> seem overkill for simple append operations when I don't need delete 
>> functionality. Creating your own CRDTs is trivial if you never need to 
>> delete.
>> 
>> Thoughts are welcome,
>> Jason
>> From: John Daily
>> Sent: Wednesday, 13 November 2013 3:10 AM
>> To: Olav Frengstad
>> Cc: riak-users
>> Subject: Re: Forcing Siblings to Occur
>> 
>> Forcing siblings other than for testing purposes is not typically a good 
>> idea; as you indicate, the object size can easily become a problem as all 
>> siblings will live inside the same Riak value.
>> 
>> Your counter-example sounds a lot like a use case for server-side CRDTs: 
>> data structures that allow the application to add values without retrieving 
>> the server-side content first, with sibling resolution handled by Riak.
>> 
>> These will arrive with Riak 2.0; see 
>> https://gist.github.com/russelldb/f92f44bdfb619e089a4d for an overview.
>> 
>> -John
>> 
>> On Nov 12, 2013, at 7:13 AM, Olav Frengstad <o...@fwt.no> wrote:
>> 
>>> Do you consider forcing siblings a good idea? I would like to get some 
>>> input on possible use cases and pitfalls. For instance, I have considered 
>>> forcing siblings and then merging them on read instead of fetching an 
>>> object every time I want to update it (especially with larger objects).
>>> 
>>> It's not clear from the docs whether there are any limitations; will the 
>>> maximum object size be the limiting factor?
>>> 
>>> A section of the docs [1] comes to mind:
>>> 
>>> "Having an enormous object in your node can cause reads of that object to 
>>> crash the entire node. Other issues are increased cluster latency as the 
>>> object is replicated and out of memory errors."
>>> 
>>> [1] 
>>> http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/#Siblings
>>> 
>>> 2013/11/9 Brian Roach <ro...@basho.com>
>>> On Fri, Nov 8, 2013 at 11:38 AM, Russell Brown <russell.br...@me.com> wrote:
>>> 
>>>> If you’re using a well behaved client like the Riak-Java-Client, or any 
>>>> other that gets a vclock before doing a put, use whatever option stops 
>>>> that.
>>> 
>>> // With allow_mult enabled and no vclock supplied, each blind put below
>>> // shows up as a new sibling of the same key.
>>> for (int i = 0; i < numReplicasWanted; i++) {
>>>     bucket.store("key", "value").withoutFetch().execute();
>>> }
>>> 
>>> :)
>>> 
>>> - Roach
>>> 
> 


_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
