Re: Forcing Siblings to Occur

Jeremiah Peschka Wed, 13 Nov 2013 09:13:22 -0800

s/metadata/data/ - each sibling is a discrete copy of whatever data you've
put in it + metadata.
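
A rough way to see why this matters is to put numbers on it. The sketch
below is illustrative only: the class and method names are hypothetical,
and the byte counts (including the per-sibling metadata size) are made-up
assumptions, not figures measured from Riak.

```java
final class SiblingSize {
    // Each sibling is a full copy of the value plus its own metadata, so
    // the stored object grows linearly with the number of siblings.
    static long estimateBytes(int siblings, long valueBytes, long metadataBytes) {
        return (long) siblings * (valueBytes + metadataBytes);
    }

    public static void main(String[] args) {
        long assumedMetadata = 200;  // hypothetical per-sibling metadata size

        // 100 siblings of a 1 MB value: the data dominates.
        System.out.println(estimateBytes(100, 1_048_576, assumedMetadata));

        // 100 siblings of a 20-byte index entry: the metadata dominates.
        System.out.println(estimateBytes(100, 20, assumedMetadata));
    }
}
```

With large values the copies of the data are what you pay for; with tiny
values the metadata is.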


In the case of the client side indexes, you're right - the bulk of the
increased storage will be from metadata.
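
For reference, resolving siblings by set union (as described for
`get_index` further down the thread) can be sketched as follows. This is a
minimal, client-agnostic illustration: it does not call any Riak client
API, the class and method names are hypothetical, and it assumes each
sibling value has already been decoded into a set of strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SiblingUnion {
    // Merge all sibling values into one set. Union is commutative,
    // associative and idempotent, so the order in which siblings arrive
    // (and any duplicates among them) does not change the result.
    static Set<String> resolve(List<Set<String>> siblings) {
        Set<String> merged = new TreeSet<>();
        for (Set<String> sibling : siblings) {
            merged.addAll(sibling);
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> first = new HashSet<>(Arrays.asList("user:1", "user:2"));
        Set<String> second = new HashSet<>(Arrays.asList("user:2", "user:3"));
        List<Set<String>> siblings = Arrays.asList(first, second);

        System.out.println(resolve(siblings));  // [user:1, user:2, user:3]
    }
}
```

This only works as a resolution strategy when entries are never removed;
a plain union cannot tell a deleted entry from one that was never seen.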

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Wed, Nov 13, 2013 at 8:12 AM, Olav Frengstad <o...@fwt.no> wrote:

> Thanks for the input.
>
> If I understand correctly, the only size overhead would be the extra
> metadata added by all the siblings?
>
>
> 2013/11/13 Hector Castro <hec...@basho.com>
>
>> The `put_index` snippet in the following blog post actually forces the
>> creation of siblings (while `get_index` resolves them by doing a set
>> union):
>>
>> http://basho.com/index-for-fun-and-for-profit/
>>
>> As John said, you definitely want to be careful not to create too many
>> siblings because that'll impact the overall Riak object size.
>>
>> --
>> Hector
>>
>>
>> On Wed, Nov 13, 2013 at 5:25 AM, Russell Brown <russell.br...@me.com>
>> wrote:
>> >
>> > On 13 Nov 2013, at 10:03, Carlos Baquero <c...@di.uminho.pt> wrote:
>> >
>> >>
>> >> It's interesting to see a use case where a grow-only set is
>> >> sufficient. I believe Riak 2.0 will offer optimized OR-Sets that
>> >> allow item removal at the expense of some extra complexity in element
>> >> storage and logarithmic metadata growth per operation. But for your
>> >> case, a simple direct set of elements with server-side merge by set
>> >> union looks perfect. It's not efficient at all to keep all those
>> >> siblings if a simple server-side merge can reduce them.
>> >>
>> >> Maybe it is a good idea not to overlook the potential usefulness of
>> >> simple grow-only sets and to add that datatype to the 2.0 server-side
>> >> CRDT library. And maybe even 2P-Sets, which allow each element to be
>> >> deleted only once, might be useful for some cases.
>> >
>> > We plan to add more data types in the future, but I don’t think
>> > they’ll make it into 2.0. You can use an ORSet as a G-Set, though:
>> > just only ever add to it. The overhead is pretty small.
>> >
>> > The difficulty is exposing different “flavours” of CRDTs in a
>> > non-confusing way. We chose to go with the name “data type” and name
>> > the implementations generically (set, map, counter). I wonder if we
>> > painted ourselves into a corner.
>> >
>> > Cheers
>> >
>> > Russell
>> >
>> >>
>> >> Regards,
>> >> Carlos
>> >>
>> >> -----
>> >> Carlos Baquero
>> >> HASLab / INESC TEC &
>> >> Universidade do Minho,
>> >> Portugal
>> >>
>> >> c...@di.uminho.pt
>> >> http://gsd.di.uminho.pt/cbm
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 12/11/2013, at 22:10, Jason Campbell wrote:
>> >>
>> >>> I am currently forcing siblings for time series data. The maximum
>> >>> bucket sizes are very predictable due to the nature of the data. I
>> >>> originally used the get/update/set cycle, but as I approach the end
>> >>> of the interval, reading and writing 1MB+ objects at a high
>> >>> frequency kills network bandwidth. So now I append siblings, and I
>> >>> have a cron job that merges the previous siblings (a simple set
>> >>> union works for me; only entire objects are ever deleted).
>> >>>
>> >>> I can see how it can be dangerous to insert siblings, but if you
>> >>> have some other method of knowing how much data is in one, I don't
>> >>> see size being an issue. I have also considered using a counter to
>> >>> know how large an object is without fetching it, which shouldn't be
>> >>> off by more than a few siblings unless there is a network partition.
>> >>>
>> >>> So aside from size issues, which can be roughly predicted or worked
>> >>> around, is there any reason not to create hundreds or thousands of
>> >>> siblings and resolve them later? I realise sets could work well for
>> >>> my use case, but they seem like overkill for simple append
>> >>> operations when I don't need delete functionality. Creating your own
>> >>> CRDTs is trivial if you never need to delete.
>> >>>
>> >>> Thoughts are welcome,
>> >>> Jason
>> >>> From: John Daily
>> >>> Sent: Wednesday, 13 November 2013 3:10 AM
>> >>> To: Olav Frengstad
>> >>> Cc: riak-users
>> >>> Subject: Re: Forcing Siblings to Occur
>> >>>
>> >>> Forcing siblings other than for testing purposes is not typically a
>> >>> good idea; as you indicate, the object size can easily become a
>> >>> problem, as all siblings will live inside the same Riak value.
>> >>>
>> >>> Your counter-example sounds a lot like a use case for server-side
>> >>> CRDTs: data structures that allow the application to add values
>> >>> without retrieving the server-side content first, with siblings
>> >>> resolved by Riak.
>> >>>
>> >>> These will arrive with Riak 2.0; see
>> >>> https://gist.github.com/russelldb/f92f44bdfb619e089a4d for an overview.
>> >>>
>> >>> -John
>> >>>
>> >>> On Nov 12, 2013, at 7:13 AM, Olav Frengstad <o...@fwt.no> wrote:
>> >>>
>> >>>> Do you consider forcing siblings a good idea? I would like to get
>> >>>> some input on possible use cases and pitfalls.
>> >>>> For instance, I have considered forcing siblings and then merging
>> >>>> them on read instead of fetching an object every time I want to
>> >>>> update it (especially with larger objects).
>> >>>>
>> >>>> It's not clear from the docs whether there are any limitations. Will
>> >>>> the maximum object size be the limiting factor?
>> >>>>
>> >>>> A section of the docs[1] comes to mind:
>> >>>>
>> >>>> "Having an enormous object in your node can cause reads of that
>> >>>> object to crash the entire node. Other issues are increased cluster
>> >>>> latency as the object is replicated and out of memory errors."
>> >>>>
>> >>>> [1]
>> >>>> http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/#Siblings
>> >>>>
>> >>>> 2013/11/9 Brian Roach <ro...@basho.com>
>> >>>> On Fri, Nov 8, 2013 at 11:38 AM, Russell Brown <russell.br...@me.com>
>> >>>> wrote:
>> >>>>
>> >>>>> If you’re using a well behaved client like the Riak-Java-Client, or
>> >>>>> any other that gets a vclock before doing a put, use whatever option
>> >>>>> stops that.
>> >>>>
>> >>>> for (int i = 0; i < numReplicasWanted; i++) {
>> >>>>    bucket.store("key", "value").withoutFetch().execute();
>> >>>> }
>> >>>>
>> >>>> :)
>> >>>>
>> >>>> - Roach
>> >>>>
>> >>>> _______________________________________________
>> >>>> riak-users mailing list
>> >>>> riak-users@lists.basho.com
>> >>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> >>
>>
>
>
>
> --
> Med Vennlig Hilsen
> Olav Frengstad
>
> Systemutvikler // FWT
> +47 920 42 090
>

