s/metadata/data/ - each sibling is a discrete copy of whatever data you've put in it + metadata.
In the case of the client-side indexes, you're right - the bulk of the increased storage will be from metadata.

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop

On Wed, Nov 13, 2013 at 8:12 AM, Olav Frengstad <o...@fwt.no> wrote:

> Thanks for the input.
>
> If I understand correctly, the only size overhead would be in the extra metadata added by all the siblings?
>
> 2013/11/13 Hector Castro <hec...@basho.com>
>
>> The `put_index` snippet in the following blog post actually forces the creation of siblings (while `get_index` resolves them by doing a set union):
>>
>> http://basho.com/index-for-fun-and-for-profit/
>>
>> As John said, you definitely want to be careful not to create too many siblings because that'll impact the overall Riak object size.
>>
>> --
>> Hector
>>
>> On Wed, Nov 13, 2013 at 5:25 AM, Russell Brown <russell.br...@me.com> wrote:
>> >
>> > On 13 Nov 2013, at 10:03, Carlos Baquero <c...@di.uminho.pt> wrote:
>> >
>> >> It's interesting to see a use case where a grow-only set is sufficient. I believe Riak 2.0 will offer optimized OR-Sets that allow item removal at the expense of some extra complexity in element storage and logarithmic metadata growth per operation. But for your case a simple direct set of elements with server-side merge by set union looks perfect. It's not efficient at all to keep all those siblings if a simple server-side merge can reduce them.
>> >>
>> >> Maybe it is a good idea not to overlook the potential usefulness of simple grow-only sets and to add that datatype to the 2.0 server-side CRDTs library. And maybe even 2P-Sets, which only allow deleting once, might be useful for some cases.
>> >
>> > We plan to add more data types in future; I don’t think they’ll make it into 2.0. You can use an ORSet as a G-Set, though: just only ever add to it. The overhead is pretty small.
>> >
>> > The difficulty is exposing different “flavours” of CRDTs in a non-confusing way. We chose to go with the name “data type” and name the implementations generically (set, map, counter). I wonder if we painted ourselves into a corner.
>> >
>> > Cheers
>> >
>> > Russell
>> >
>> >> Regards,
>> >> Carlos
>> >>
>> >> -----
>> >> Carlos Baquero
>> >> HASLab / INESC TEC &
>> >> Universidade do Minho,
>> >> Portugal
>> >>
>> >> c...@di.uminho.pt
>> >> http://gsd.di.uminho.pt/cbm
>> >>
>> >> On 12/11/2013, at 22:10, Jason Campbell wrote:
>> >>
>> >>> I am currently forcing siblings for time series data. The maximum bucket sizes are very predictable due to the nature of the data. I originally used the get/update/set cycle, but as I approach the end of the interval, reading and writing 1MB+ objects at a high frequency kills network bandwidth. So now, I append siblings, and I have a cron that merges the previous siblings (a simple set union works for me, only entire objects are ever deleted).
>> >>>
>> >>> I can see how it can be dangerous to insert siblings, but if you have some other method of knowing how much data is in one, I don't see size being an issue. I have also considered using a counter to know how large an object is without fetching it, which shouldn't be off by more than a few siblings unless there is a network partition.
>> >>>
>> >>> So aside from size issues, which can be roughly predicted or worked around, is there any reason to not create hundreds or thousands of siblings and resolve them later? I realise sets could work well for my use case, but they seem overkill for simple append operations when I don't need delete functionality. Creating your own CRDTs is trivial if you never need to delete.
>> >>>
>> >>> Thoughts are welcome,
>> >>> Jason
>> >>>
>> >>> From: John Daily
>> >>> Sent: Wednesday, 13 November 2013 3:10 AM
>> >>> To: Olav Frengstad
>> >>> Cc: riak-users
>> >>> Subject: Re: Forcing Siblings to Occur
>> >>>
>> >>> Forcing siblings other than for testing purposes is not typically a good idea; as you indicate, the object size can easily become a problem as all siblings will live inside the same Riak value.
>> >>>
>> >>> Your counter-example sounds a lot like a use case for server-side CRDTs: data structures that allow the application to add values without retrieving the server-side content first, and siblings are resolved by Riak.
>> >>>
>> >>> These will arrive with Riak 2.0; see https://gist.github.com/russelldb/f92f44bdfb619e089a4d for an overview.
>> >>>
>> >>> -John
>> >>>
>> >>> On Nov 12, 2013, at 7:13 AM, Olav Frengstad <o...@fwt.no> wrote:
>> >>>
>> >>>> Do you consider forcing siblings a good idea? I would like to get some input on possible use cases and pitfalls.
>> >>>> For instance, I have considered forcing siblings and then merging them on read instead of fetching an object every time I want to update it (especially with larger objects).
>> >>>>
>> >>>> It's not clear from the docs if there are any limitations; will the maximum object size be the limitation?
>> >>>>
>> >>>> A section of the docs[1] comes to mind:
>> >>>>
>> >>>> "Having an enormous object in your node can cause reads of that object to crash the entire node. Other issues are increased cluster latency as the object is replicated and out of memory errors."
>> >>>>
>> >>>> [1] http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/#Siblings
>> >>>>
>> >>>> 2013/11/9 Brian Roach <ro...@basho.com>
>> >>>> On Fri, Nov 8, 2013 at 11:38 AM, Russell Brown <russell.br...@me.com> wrote:
>> >>>>
>> >>>>> If you’re using a well-behaved client like the Riak-Java-Client, or any other that gets a vclock before doing a put, use whatever option stops that.
>> >>>>
>> >>>> for (int i = 0; i < numReplicasWanted; i++) {
>> >>>>     bucket.store("key", "value").withoutFetch().execute();
>> >>>> }
>> >>>>
>> >>>> :)
>> >>>>
>> >>>> - Roach
>> >>>>
>> >>>> _______________________________________________
>> >>>> riak-users mailing list
>> >>>> riak-users@lists.basho.com
>> >>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> Kind regards,
> Olav Frengstad
>
> Systems Developer // FWT
> +47 920 42 090
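
For reference, the set-union merge that several people in the thread describe (Jason's cron job, the `get_index` resolution Hector mentions) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Riak Java client API: `SiblingUnion` and `resolve` are hypothetical names, and the list of sets stands in for the sibling values a fetch would return for a single key.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any Riak client library.
class SiblingUnion {

    // Resolve the siblings of a grow-only set by taking the union of all values.
    // Union is commutative, associative and idempotent, so neither the order in
    // which siblings arrive nor repeated merging changes the result.
    static Set<String> resolve(List<Set<String>> siblings) {
        Set<String> merged = new TreeSet<String>(); // sorted only to keep output stable
        for (Set<String> sibling : siblings) {
            merged.addAll(sibling);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three writers stored without fetching first, leaving three siblings.
        List<Set<String>> siblings = Arrays.<Set<String>>asList(
                new HashSet<String>(Arrays.asList("a", "b")),
                new HashSet<String>(Arrays.asList("b", "c")),
                new HashSet<String>(Arrays.asList("c")));
        System.out.println(resolve(siblings)); // prints [a, b, c]
    }
}
```

The merged value would then be written back using the vclock from the fetch so that the siblings collapse into one object; that write is left out here. Note this only works because nothing is ever removed, which is the grow-only case discussed above.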