On 13 Nov 2013, at 10:03, Carlos Baquero <c...@di.uminho.pt> wrote:

> It's interesting to see a use case where a grow-only set is sufficient. I believe Riak 2.0 will offer optimized OR-Sets that allow item removal, at the expense of some extra complexity in element storage and logarithmic metadata growth per operation. But for your case a simple direct set of elements with server-side merge by set union looks perfect. It's not efficient at all to keep all those siblings if a simple server-side merge can reduce them.
>
> Maybe it is a good idea not to overlook the potential usefulness of simple grow-only sets and to add that data type to the 2.0 server-side CRDTs library. And maybe even 2P-Sets, which only allow deleting an element once, might be useful for some cases.

We plan to add more data types in future; I don't think they'll make it into 2.0. You can use an ORSet as a G-Set, though: just only ever add to it. The overhead is pretty small. The difficulty is exposing different "flavours" of CRDTs in a non-confusing way. We chose to go with the name "data type" and to name the implementations generically (set, map, counter). I wonder if we painted ourselves into a corner.
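To make the G-Set part concrete: client-side it is about as small as a CRDT gets. A rough, untested sketch in plain Java, illustrative only (this is neither the Riak client API nor the 2.0 server-side implementation); add is the only mutation and merge is plain set union:

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Set;

    // Grow-only set (G-Set): elements can only be added, never removed,
    // so sibling copies merge by set union and the result is the same
    // whatever order the merges happen in.
    public final class GSet<E> {
        private final Set<E> elements = new HashSet<E>();

        public void add(E element) {
            elements.add(element);
        }

        public boolean contains(E element) {
            return elements.contains(element);
        }

        public Set<E> value() {
            return new HashSet<E>(elements);
        }

        // Merge any number of sibling copies into one.
        public static <E> GSet<E> merge(Collection<GSet<E>> siblings) {
            GSet<E> merged = new GSet<E>();
            for (GSet<E> sibling : siblings) {
                merged.elements.addAll(sibling.elements);
            }
            return merged;
        }
    }

The 2P-Set Carlos mentions is the same thing plus a second grow-only "tombstone" set of removed elements, which is why each element can only be deleted once and never re-added.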
Cheers

Russell

> Regards,
> Carlos
>
> -----
> Carlos Baquero
> HASLab / INESC TEC &
> Universidade do Minho,
> Portugal
>
> c...@di.uminho.pt
> http://gsd.di.uminho.pt/cbm
>
>
> On 12/11/2013, at 22:10, Jason Campbell wrote:
>
>> I am currently forcing siblings for time-series data. The maximum bucket sizes are very predictable due to the nature of the data. I originally used the get/update/set cycle, but as I approach the end of the interval, reading and writing 1MB+ objects at a high frequency kills network bandwidth. So now I append siblings, and I have a cron job that merges the previous siblings (a simple set union works for me; only entire objects are ever deleted).
>>
>> I can see how it can be dangerous to insert siblings, but if you have some other method of knowing how much data is in one, I don't see size being an issue. I have also considered using a counter to know how large an object is without fetching it, which shouldn't be off by more than a few siblings unless there is a network partition.
>>
>> So aside from size issues, which can be roughly predicted or worked around, is there any reason not to create hundreds or thousands of siblings and resolve them later? I realise sets could work well for my use case, but they seem overkill for simple append operations when I don't need delete functionality. Creating your own CRDTs is trivial if you never need to delete.
>>
>> Thoughts are welcome,
>> Jason
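For what it's worth, the cron-side merge Jason describes really is just a set union over whatever entries the siblings hold. A minimal sketch in plain Java (untested; it assumes each sibling's value has already been parsed into a set of strings, and it is not tied to any particular client API):

    import java.util.Collection;
    import java.util.Set;
    import java.util.TreeSet;

    // Resolves append-only siblings: each sibling is a set of entries and
    // the merged value is simply their union. Because nothing is ever
    // deleted, no tombstones or per-element metadata are needed.
    public final class SiblingUnionMerge {

        public static Set<String> resolve(Collection<Set<String>> siblingValues) {
            Set<String> merged = new TreeSet<String>();
            for (Set<String> sibling : siblingValues) {
                merged.addAll(sibling);
            }
            return merged;
        }
    }

Write the merged value back with the vclock you got from the fetch and the siblings collapse to a single object, which is exactly what the periodic cron pass buys you.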
>> From: John Daily
>> Sent: Wednesday, 13 November 2013 3:10 AM
>> To: Olav Frengstad
>> Cc: riak-users
>> Subject: Re: Forcing Siblings to Occur
>>
>> Forcing siblings other than for testing purposes is not typically a good idea; as you indicate, the object size can easily become a problem, as all siblings will live inside the same Riak value.
>>
>> Your counter-example sounds a lot like a use case for server-side CRDTs: data structures that allow the application to add values without retrieving the server-side content first, with siblings resolved by Riak.
>>
>> These will arrive with Riak 2.0; see https://gist.github.com/russelldb/f92f44bdfb619e089a4d for an overview.
>>
>> -John
>>
>> On Nov 12, 2013, at 7:13 AM, Olav Frengstad <o...@fwt.no> wrote:
>>
>>> Do you consider forcing siblings a good idea? I would like to get some input on possible use cases and pitfalls.
>>> For instance, I have considered forcing siblings and then merging them on read instead of fetching an object every time I want to update it (especially with larger objects).
>>>
>>> It's not clear from the docs if there are any limitations; will the maximum object size be the limitation?
>>>
>>> A section of the docs[1] comes to mind:
>>>
>>> "Having an enormous object in your node can cause reads of that object to crash the entire node. Other issues are increased cluster latency as the object is replicated and out of memory errors."
>>>
>>> [1] http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/#Siblings
>>>
>>> 2013/11/9 Brian Roach <ro...@basho.com>
>>>
>>> On Fri, Nov 8, 2013 at 11:38 AM, Russell Brown <russell.br...@me.com> wrote:
>>>
>>>> If you're using a well-behaved client like the Riak-Java-Client, or any other that gets a vclock before doing a put, use whatever option stops that.
>>>
>>>     for (int i = 0; i < numReplicasWanted; i++) {
>>>         bucket.store("key", "value").withoutFetch().execute();
>>>     }
>>>
>>> :)
>>>
>>> - Roach
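To make Roach's snippet self-contained: the writes only pile up as siblings if allow_mult is enabled on the bucket; otherwise Riak keeps a single value rather than siblings. Roughly, with the 1.4 Java client it looks like the following (written from memory, so treat the exact builder methods as assumptions and check them against your client version):

    import com.basho.riak.client.IRiakClient;
    import com.basho.riak.client.RiakFactory;
    import com.basho.riak.client.bucket.Bucket;

    public class ForceSiblings {
        public static void main(String[] args) throws Exception {
            IRiakClient client = RiakFactory.pbcClient();

            // Siblings only accumulate when allow_mult is on for the bucket.
            Bucket bucket = client.createBucket("timeseries")
                                  .allowSiblings(true)
                                  .execute();

            // No fetch, so no vclock is sent with the put; every write lands
            // as a new sibling instead of superseding the previous value.
            for (int i = 0; i < 3; i++) {
                bucket.store("key", "value-" + i).withoutFetch().execute();
            }

            client.shutdown();
        }
    }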