Re: Forcing Siblings to Occur
We have introduced these limits so that users who are accidentally creating siblings (and large objects) can be notified in their logs.

1) Of course you can choose to change the limits.
2) Please only do so if you know what you're doing. There's a certain amount of "you're on your own" here, because we'll set the defaults for these values to what we think is sensible for Riak. Of course, if you think your magical system will work fine, it's entirely up to you.

We hope that our Data Types will provide enough useful types that they cover the vast majority of data models. Riak 2.0 will get counters, sets, and maps, but if you have suggestions for generalised data structures you think would be useful to other Riak users, do send them our way and we'll see what we can do.

Sam

--
Sam Elliott
Engineer
sam.elli...@basho.com
--

On Wednesday, 13 November 2013 at 12:43AM, Olav Frengstad wrote:
> Forgot the link!
>
> [1] https://github.com/basho/riak_kv/commit/6981450c5ffc18207b3a1dc057fd3840a0906c42
>
> 2013/11/13 Olav Frengstad <mailto:o...@fwt.no>
> > @John, I'm definitely looking forward to CRDTs, but at the same time I'm
> > looking into alternative approaches for achieving the same thing.
> >
> > @Jason, your description is close to what I had in mind. The only real
> > difference is that merge would happen on read. I did some testing, and m/r
> > seems to work by using an initial map phase calling `riak_object:get_values`.
> >
> > There's also the addition of a maximum number of siblings in Riak 2.0 [1].
> >
> > 2013/11/13 John Daily <mailto:jda...@basho.com>
> > > Jason, I don't see any inherent problems, given reasonable management of
> > > the situation as you describe. I'd have to chase the code path to see
> > > what overhead you're introducing to Riak's processing, but if it's
> > > working well for you, then who am I to object?
> > >
> > > Perhaps someone who's more familiar with the sibling management code
> > > could chime in.
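The thresholds Sam mentions are plain riak_kv application settings, so they can be tuned in app.config. A hedged sketch - the key names match the riak_kv commit Olav links above, but the values shown here are illustrative assumptions, not necessarily the shipped defaults:

```erlang
%% app.config -- riak_kv section (excerpt).
%% Values are examples only; check your release's defaults.
{riak_kv, [
    %% Log a warning when an object crosses these thresholds:
    {warn_object_size, 5242880},    %% bytes (5 MB)
    {warn_siblings, 25},
    %% Refuse the write entirely past these:
    {max_object_size, 52428800},    %% bytes (50 MB)
    {max_siblings, 100}
]}
```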
> > >
> > > -John
> > >
> > > On Nov 12, 2013, at 5:10 PM, Jason Campbell <mailto:xia...@xiaclo.net> wrote:
> > > > I am currently forcing siblings for time series data. The maximum
> > > > bucket sizes are very predictable due to the nature of the data. I
> > > > originally used the get/update/set cycle, but as I approach the end of
> > > > the interval, reading and writing 1MB+ objects at a high frequency
> > > > kills network bandwidth. So now, I append siblings, and I have a cron
> > > > that merges the previous siblings (a simple set union works for me;
> > > > only entire objects are ever deleted).
> > > >
> > > > I can see how it can be dangerous to insert siblings, but if you have
> > > > some other method of knowing how much data is in one, I don't see size
> > > > being an issue. I have also considered using a counter to know how
> > > > large an object is without fetching it, which shouldn't be off by more
> > > > than a few siblings unless there is a network partition.
> > > >
> > > > So aside from size issues, which can be roughly predicted or worked
> > > > around, is there any reason not to create hundreds or thousands of
> > > > siblings and resolve them later? I realise sets could work well for my
> > > > use case, but they seem overkill for simple append operations when I
> > > > don't need delete functionality. Creating your own CRDTs is trivial if
> > > > you never need to delete.
> > > >
> > > > Thoughts are welcome,
> > > > Jason
> > > >
> > > > From: John Daily
> > > > Sent: Wednesday, 13 November 2013 3:10 AM
> > > > To: Olav Frengstad
> > > > Cc: riak-users
> > > > Subject: Re: Forcing Siblings to Occur
> > > >
> > > > Forcing siblings other than for testing purposes is not typically a
> > > > good idea; as you indicate, the object size can easily become a problem
> > > > as all siblings will live inside the same Riak value.
> > > >
> > > > Your counter-example sounds a lot like a use case for server-side
> > > > CRDTs: data structures that allow the application to add values without
> > > > retrieving the server-side content first, with siblings resolved by
> > > > Riak.
> > > >
> > > > These will arrive with Riak 2.0; see
> > > > https://gist.github.com/russelldb/f92f44bdfb619e089a4d for an overview.
> > > >
> > > > -John
> > > >
> > > > On Nov 12, 2013, at 7:13 AM, Olav Frengstad <mailto:o...@fwt.no> wrote:
> > > > > Do you consider forcing siblings a good idea? I would like to get
> > > > > some input on possible use cases and pitfalls.
> > > > > For instance I have considered forcing siblings and then merging them
> > > > > on read instead of fetching an object every time I want to update it
> > > > > (especially with larger objects).
> > > > >
> > > > > It's not clear from the docs if there are any limitations; will the
> > > > > maximum object size be the limitation?
Re: Forcing Siblings to Occur
It's interesting to see a use case where a grow-only set is sufficient. I believe Riak 2.0 will offer optimized OR-Sets that allow item removal at the expense of some extra complexity in element storage and logarithmic metadata growth per operation. But for your case a simple direct set of elements, with server-side merge by set union, looks perfect. It's not efficient at all to keep all those siblings if a simple server-side merge can reduce them.

Maybe it is a good idea not to overlook the potential usefulness of simple grow-only sets and to add that data type to the 2.0 server-side CRDTs library. And maybe even 2P-Sets, which only allow deleting once, might be useful for some cases.

Regards,
Carlos

-
Carlos Baquero
HASLab / INESC TEC &
Universidade do Minho,
Portugal

c...@di.uminho.pt
http://gsd.di.uminho.pt/cbm

On 12/11/2013, at 22:10, Jason Campbell wrote:
> I am currently forcing siblings for time series data. The maximum bucket
> sizes are very predictable due to the nature of the data. [...] So now,
> I append siblings, and I have a cron that merges the previous siblings (a
> simple set union works for me, only entire objects are ever deleted).
>> A section of the docs [1] comes to mind:
>>
>> "Having an enormous object in your node can cause reads of that object to
>> crash the entire node. Other issues are increased cluster latency as the
>> object is replicated and out-of-memory errors."
>>
>> [1] http://docs.basho.com/riak/latest/theory/concepts/Vector-Clocks/#Siblings
>>
>> 2013/11/9 Brian Roach
>> On Fri, Nov 8, 2013 at 11:38 AM, Russell Brown wrote:
>>
>> > If you're using a well behaved client like the Riak-Java-Client, or any
>> > other that gets a vclock before doing a put, use whatever option stops
>> > that.
>>
>> for (int i = 0; i < numReplicasWanted; i++) {
>>     bucket.store("key", "value").withoutFetch().execute();
>> }
>>
>> :)
>>
>> - Roach
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
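Jason's cron-driven merge boils down to "union all sibling values". A minimal, client-agnostic sketch of that resolve step in Python (the sibling values here are plain JSON-encoded lists; the function and variable names are illustrative, not part of any Riak client API):

```python
import json

def merge_siblings(sibling_values):
    """Resolve siblings by set union: each sibling is a JSON-encoded
    list of entries, and the merged object is the union of them all."""
    merged = set()
    for raw in sibling_values:
        merged.update(json.loads(raw))
    # Deterministic ordering so every resolver produces the same bytes.
    return json.dumps(sorted(merged))

# Three siblings written concurrently (e.g. appended without a prior fetch):
siblings = [
    json.dumps(["a", "b"]),
    json.dumps(["b", "c"]),
    json.dumps(["c", "d"]),
]
print(merge_siblings(siblings))  # -> ["a", "b", "c", "d"]
```

Because union is commutative, associative, and idempotent, it doesn't matter in which order the cron job encounters the siblings, or whether it merges the same sibling twice.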
Re: Forcing Siblings to Occur
On 13 Nov 2013, at 10:03, Carlos Baquero wrote:
>
> Its interesting to see a use case where a grow only set is sufficient. [...]
>
> Maybe it is a good idea to not overlook the potential usefulness of simple
> grow only sets and add that datatype to the 2.0 server side CRDTs library.
> And maybe even 2P-Sets that only allow deleting once, might be useful for
> some cases.

We plan to add more data types in future; I don't think they'll make it into 2.0. You can use an OR-Set as a G-Set, though: just only ever add to it. The overhead is pretty small.

The difficulty is exposing different "flavours" of CRDTs in a non-confusing way. We chose to go with the name "data type" and to name the implementations generically (set, map, counter). I wonder if we painted ourselves into a corner.

Cheers

Russell
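The G-Set Carlos and Russell are discussing is about the smallest CRDT there is: the state is a set, the only operation is add, and merge is union. A toy sketch to make the convergence property concrete (purely illustrative; Riak 2.0's data types are implemented server-side in Erlang, not like this):

```python
class GSet:
    """Grow-only set CRDT: no removal, so merge is plain set union
    and all replicas converge regardless of delivery order."""

    def __init__(self, elements=()):
        self._elements = set(elements)

    def add(self, element):
        self._elements.add(element)

    def merge(self, other):
        # Union is commutative, associative, and idempotent --
        # exactly the properties a CRDT merge needs.
        return GSet(self._elements | other._elements)

    def value(self):
        return frozenset(self._elements)

# Two replicas diverge, then converge after merging in either order:
a, b = GSet(), GSet()
a.add("x")
b.add("y")
assert a.merge(b).value() == b.merge(a).value() == frozenset({"x", "y"})
```

The catch, as the thread notes, is that you can never delete: removing an element on one replica would simply be "re-added" by the next merge.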
Re: Forcing Siblings to Occur
The `put_index` snippet in the following blog post actually forces the creation of siblings (while `get_index` resolves them by doing a set union):

http://basho.com/index-for-fun-and-for-profit/

As John said, you definitely want to be careful not to create too many siblings, because that will impact the overall Riak object size.

--
Hector

On Wed, Nov 13, 2013 at 5:25 AM, Russell Brown wrote:
>
> We plan to add more data types in future, I don't think they'll make them
> into 2.0. You can use an ORSet as a G-Set, though, just only ever add to it.
> The overhead is pretty small.
>
> the difficulty is exposing different "flavours" of CRDTs in a non-confusing
> way. We chose to go with the name "data type" and name the implementations
> generically (set, map, counter.) I wonder if we painted ourselves into a
> corner.
Write to object(s) by index
Hey,

I'm trying to come up with a solution for writing child objects where the parent is referenced by some secondary index. The data model contains a list of devices, each with a list of messages beneath them. Currently I have a hierarchical scheme where devices are stored under (devices, ) and messages under (messages/, ). This works fine and I can write incoming messages without retrieving the device first.

Now I want to add support for replacing devices, which means multiple devices can have the same address. I was thinking to add as a secondary index to the device object, but then I need to do an index query to get the real device id. I intend to keep a secondary index containing the device address and activation status. The question, then: can I somehow reference that index and rewrite the bucket when storing the object? There are a few problems that come to mind:

1) There might be multiple active devices; this is fine, multiple writes can be issued.
2) Riak would have to know how to rewrite the bucket name, which should be client logic.

One way to solve this would be to keep the list of -> mappings in the connection handler. This could work: all changes to devices can be propagated to all the API endpoints, and there won't be an issue with API endpoints running out of memory either, as it would only amount to hundreds of MBs for 10k+ connections.

Another approach would be to have a pre-commit hook rewrite the bucket name, but that would require a fetch (possibly from other nodes in the system), so it does not really change anything.

Are there any other approaches I'm missing that could be better? For instance it seems logical - at least to me - that one could alias an object by using links; but I'm not aware of such a method.

Cheers,
Olav

--
Med Vennlig Hilsen
Olav Frengstad

Systemutvikler // FWT
+47 920 42 090
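Olav's first option - keeping the address-to-device-id mapping in each connection handler - is essentially a small local cache kept current by device-change events. A rough sketch of that idea (every name below is hypothetical, invented for illustration; nothing here is a Riak API):

```python
class DeviceRouter:
    """In-memory map from device address to the currently active
    device id, so message writes can skip the 2i index query."""

    def __init__(self):
        self._by_address = {}

    def on_device_change(self, address, device_id, active):
        # Called when a device-change event is propagated to this endpoint.
        if active:
            self._by_address[address] = device_id
        else:
            self._by_address.pop(address, None)

    def bucket_for(self, address):
        # Rewrite the bucket name client-side, as Olav argues it should be.
        device_id = self._by_address.get(address)
        if device_id is None:
            raise KeyError("unknown or replaced device: %s" % address)
        return "messages/%s" % device_id

router = DeviceRouter()
router.on_device_change("00:11:22", "dev-1", active=True)
print(router.bucket_for("00:11:22"))  # -> messages/dev-1
```

When a device is replaced, the deactivation event evicts the old mapping and the activation event installs the new one, so subsequent message writes land in the new device's bucket without any fetch.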
Re: Forcing Siblings to Occur
Thanks for the input.

If I understand correctly, the only size overhead would be the extra metadata added by all the siblings?

2013/11/13 Hector Castro
> The `put_index` snippet in the following blog post actually forces the
> creation of siblings (while `get_index` resolves them by doing a set
> union):
>
> http://basho.com/index-for-fun-and-for-profit/
>
> As John said, you definitely want to be careful not to create too many
> siblings because that'll impact the overall Riak object size.
>
> --
> Hector
Insecurities with documentations?
Hi,

I just came across the instructions to install Riak again, and I think there are some security issues with them.

On this page [1], there is a line that suggests we should do this:

curl http://apt.basho.com/gpg/basho.apt.key | sudo apt-key add -

This is not HTTPS and should be. Additionally, an HTTPS version of apt.basho.com does not seem to be available.

[1]: http://docs.basho.com/riak/latest/ops/building/installing/debian-ubuntu/

Cheers,
Shuhao
Re: Forcing Siblings to Occur
s/metadata/data/ - each sibling is a discrete copy of whatever data you've put in it, plus metadata. In the case of the client-side indexes, you're right - the bulk of the increased storage will be from metadata.

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop

On Wed, Nov 13, 2013 at 8:12 AM, Olav Frengstad wrote:
> Thanks for the input.
>
> If i understand correctly the only size overhead would be in the extra
> metadata added by all the siblings?
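Jeremiah's correction can be made concrete with rough arithmetic: the stored object is roughly the sum of every sibling's full value plus a fixed chunk of per-sibling metadata. The numbers below are illustrative assumptions only - real per-sibling overhead depends on the backend, vclock size, and content metadata:

```python
def object_size(sibling_value_sizes, per_sibling_metadata=300):
    """Rough size of a Riak object with siblings: every sibling keeps
    its own full copy of the value, plus its own metadata (assumed
    here to be a flat 300 bytes, purely for illustration)."""
    return sum(sibling_value_sizes) + per_sibling_metadata * len(sibling_value_sizes)

# 1000 index-style siblings with empty values vs 1000 x 1 KB values:
print(object_size([0] * 1000))     # metadata-dominated: 300000
print(object_size([1024] * 1000))  # value-dominated: 1324000
```

This is why the index trick in the blog post (empty values, sibling count as the payload) is metadata-bound, while Jason's time-series appends grow with the data itself.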
Re: Moving the config and log dirs
Hi Jeff,

You can modify the location of the data and log files by editing the appropriate entries in app.config.

Changing the location of app.config and vm.args would involve modifying the lib/env.sh file to change the value of RUNNER_ETC_DIR. As you said, this may introduce other issues since it's untested and unsupported, but there's no harm in trying it out to see for yourself.

--
Luke Bakken
CSE
lbak...@basho.com

On Tue, Nov 12, 2013 at 11:38 PM, Jeff Peck wrote:
> I would like to start up Riak with its config, data, and log files in a
> specific location that is not the default. I know that this can be
> accomplished via symlinks, but I would like to do this by passing run-time
> params or setting environment variables.
>
> Are there any specific changes that I could make to lib/env.sh that would
> accomplish this without introducing other issues? Or is there any other way
> to do this?
>
> Thanks,
> Jeff
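To make that concrete, the edit Luke describes would look something like this in lib/env.sh (the variable name comes from his reply; the paths are examples, and this is exactly the untested, unsupported route he cautions about):

```shell
# lib/env.sh (excerpt)
# The default typically points at the packaged etc directory, e.g.:
#   RUNNER_ETC_DIR=/etc/riak
# Point it at your own location instead:
RUNNER_ETC_DIR=/opt/myapp/riak/etc
```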
Riak CS results and proxying
Hey guys,

I'm working on a dev environment for a riak-cs setup: 2 VMs and an external proxy. Config of the riak/riak-cs nodes appears to be complete. I'm encountering two issues, and I'd like some pointers on where to begin diagnosing before I go around stracing everything.

Firstly: when using s3cmd to query riak-cs, I'm receiving differing results from the same command run in succession. Here are the results when going through the proxy:

(07:06:09) [andrew/desktop] ~ $ s3cmd -c .s3cfg-riak ls s3://lol
2013-11-13 22:20  84513  s3://lol/kitty.jpg
(07:06:10) [andrew/desktop] ~ $ s3cmd -c .s3cfg-riak ls s3://lol
2013-11-13 22:20  84513  s3://lol/kitty.jpg
(07:06:11) [andrew/desktop] ~ $ s3cmd -c .s3cfg-riak ls s3://lol
2013-11-13 14:59  93107  s3://lol/_1
2013-11-13 22:20  84513  s3://lol/kitty.jpg
(07:06:12) [andrew/desktop] ~ $ s3cmd -c .s3cfg-riak ls s3://lol
                   DIR  s3://lol/kitties/
2013-11-13 14:59  93107  s3://lol/_1
(07:06:13) [andrew/desktop] ~ $ s3cmd -c .s3cfg-riak ls s3://lol
2013-11-13 22:20  84513  s3://lol/kitty.jpg
(07:06:14) [andrew/desktop] ~ $ s3cmd -c .s3cfg-riak ls s3://lol
                   DIR  s3://lol/kitties/
2013-11-13 14:59  93107  s3://lol/_1

And here they are querying one of the nodes directly:

(07:05:59) [andrew/desktop] ~ $ s3cmd ls s3://lol
                   DIR  s3://lol/kitties/
2013-11-13 14:59  93107  s3://lol/_1
(07:06:00) [andrew/desktop] ~ $ s3cmd ls s3://lol
2013-11-13 14:59  93107  s3://lol/_1
2013-11-13 22:20  84513  s3://lol/kitty.jpg
(07:06:01) [andrew/desktop] ~ $ s3cmd ls s3://lol
2013-11-13 22:20  84513  s3://lol/kitty.jpg
(07:06:02) [andrew/desktop] ~ $ s3cmd ls s3://lol
2013-11-13 22:20  84513  s3://lol/kitty.jpg
(07:06:02) [andrew/desktop] ~ $ s3cmd ls s3://lol
                   DIR  s3://lol/kitties/
2013-11-13 14:59  93107  s3://lol/_1
(07:06:03) [andrew/desktop] ~ $ s3cmd ls s3://lol
2013-11-13 22:20  84513  s3://lol/kitty.jpg
(07:06:04) [andrew/desktop] ~ $ s3cmd -c .s3cfg-riak ls s3://lol

The same thing happens regardless of which node I query directly: within 1-2 seconds, a repeat execution of the command returns different results. (They are the same repeating results, just missing objects on some of the returns.)

The other issue I'm encountering is with PUTs. If I put directly to the node, I see something like:

(07:09:09) [andrew/desktop] ~ $ s3cmd put Downloads/CentOS-6.4-x86_64-minimal.iso s3://big
Downloads/CentOS-6.4-x86_64-minimal.iso -> s3://big/CentOS-6.4-x86_64-minimal.iso  [part 1 of 23, 15MB]
15728640 of 15728640   100% in 5s  2.62 MB/s  done
Downloads/CentOS-6.4-x86_64-minimal.iso -> s3://big/CentOS-6.4-x86_64-minimal.iso  [part 2 of 23, 15MB]
15728640 of 15728640   100% in 5s  2.86 MB/s  done
(... truncated some of the output for brevity ...)
Downloads/CentOS-6.4-x86_64-minimal.iso -> s3://big/CentOS-6.4-x86_64-minimal.iso  [part 22 of 23, 15MB]
15728640 of 15728640   100% in 1s  12.06 MB/s  done
Downloads/CentOS-6.4-x86_64-minimal.iso -> s3://big/CentOS-6.4-x86_64-minimal.iso  [part 23 of 23, 12MB]
12929024 of 12929024   100% in 1s  11.70 MB/s  done

Which is what should occur. However, when I go through the proxy, it starts great for the first chunk, but hangs:

Start:
Downloads/CentOS-6.4-x86_64-minimal.iso -> s3://big/cent6.minimal.iso  [part 19 of 23, 15MB]
 8675328 of 15728640    55% in 1s  8.26 MB/s
Finish:
Downloads/CentOS-6.4-x86_64-minimal.iso -> s3://big/cent6.minimal.iso  [part 19 of 23, 15MB]
15728640 of 15728640   100% in 22s  683.57 kB/s  done

It immediately jumps to 55% (the % varies), pauses, sometimes for up to 30 seconds, and then jumps to [done]. I assume this is in my nginx configuration somewhere. I thought it was a proxy buffer issue, so I've raised those limits and also tried disabling proxy_buffering entirely, to no difference.
server {
    listen 80;
    server_name cs.domain.com *.cs.domain.com;

    location / {
        proxy_pass http://riak-cs;
        proxy_set_header Host $host;
        proxy_connect_timeout 59s;
        proxy_send_timeout 600;
        proxy_read_timeout 600;
        #proxy_buffering off;
        proxy_buffers 16 32k;
        proxy_buffer_size 64k;
        #return 403;
    }
}

(The two nodes are identical in versions)

(07:34:47) [riak] ~ $ cat /etc/redhat-release
CentOS release 6.4 (Final)
(07:45:53) [riak] ~ $ uname -a
Linux riak.tyne.io 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 18:37:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
(07:46:09) [riak] ~ $ riak version
1.4.2
(07:46:13) [riak] ~ $ riak-cs version
1.4.2
(07:46:25) [riak] ~ $ rpm -qa | grep riak
riak-cs-1.4.2-1.el6.x86_64
riak-1.4.2-1.el6.x86_64

All recommended sysctl and ulimit values have been set as described in the docs. I look forward to any assistance with further tracking this d
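One thing that may explain the upload stall (an assumption worth checking, not a confirmed diagnosis): proxy_buffering and proxy_buffers only affect the *response* path. The request body that s3cmd uploads is buffered by nginx under the client_body_* directives before being relayed upstream, so the progress bar measures writes into nginx's buffer and then stalls while nginx drains it to Riak CS. A fragment of the request-side knobs to experiment with, with illustrative values:

```
# Request-side buffering (values illustrative, tune to taste)
client_max_body_size    0;      # don't cap the multipart chunk size
client_body_buffer_size 128k;   # bodies larger than this spill to disk ...
client_body_temp_path   /var/cache/nginx/client_temp;   # ... here
```

nginx 1.7.11 and later also offer `proxy_request_buffering off;`, which streams the request body straight to the upstream instead of buffering it first.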
Re: Riak CS results and proxying
Andy,

To get a better idea of what might be going on, it would be helpful to see what your riak and riak-cs app.config files look like. The output of riak-admin ring-status and riak-admin member-status could also be useful. For the upload issue, I'm curious whether you have changed the port that riak-cs is listening on? The default is 8080, and I don't see from your nginx config where you are sending requests to that port.

Kelly

On November 13, 2013 at 6:49:59 PM, Andrew Tynefield (atynefi...@gmail.com) wrote:
> [...]
Re: Riak CS results and proxying
I appreciate the help, Kelly! (And sorry for the double mail you're going to get; I accidentally didn't reply to all.) I've provided the requested information below.

app.config: http://pastebin.centos.org/5716/
The app.config for riak-cs is managed by puppet, installing the same file on both nodes.

riak-admin data:

(10:58:46) [riak] ~ $ riak-admin ring-status
== Claimant ==
Claimant:   'r...@riak.tyne.io'
Status:     up
Ring Ready: true
== Ownership Handoff ==
No pending changes.
== Unreachable Nodes ==
All nodes are up and reachable

(10:58:54) [riak] ~ $ riak-admin member-status
== Membership ==
Status     Ring     Pending    Node
valid      50.0%    --         'r...@riak.tyne.io'
valid      50.0%    --         'r...@riak1.tyne.io'
Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

network data:

(11:05:51) [riak] ~ $ ip a | grep /24
inet 192.168.122.90/24 brd 192.168.122.255 scope global eth0
inet 192.168.1.19/24 brd 192.168.1.255 scope global eth1
(10:59:43) [riak] ~ $ netstat -tunap | grep :8080
tcp    0    0 0.0.0.0:8080    0.0.0.0:*    LISTEN    29437/beam.smp
(11:03:20) [riak] ~ $ ps auxf | grep 29437
root      1507  0.0  0.0 103236   820 pts/4  S+   23:03  0:00  \_ grep 29437
riakcs   29437  0.9  1.8 768480 35364 pts/3  Ssl+ 17:23  3:08  \_ /usr/lib64/riak-cs/erts-5.9.1/bin/beam.smp -K true -A 64 -W w -- -root /usr/lib64/riak-cs -progname riak-cs -- -home /var/lib/riak-cs-control -- -boot /usr/lib64/riak-cs/releases/1.4.2/riak-cs -config /etc/riak-cs/app.config -pa /usr/lib64/riak-cs/lib/basho-patches -name riak...@riak.domain.com -setcookie [redacted] -- console

nginx config: [added the proxy_pass_header to ensure I was reaching riak-cs]

upstream riak-cs {
    server 192.168.1.19:8080;
}

server {
    listen 80;
    server_name cs.domain.com *.cs.domain.com;

    location / {
        proxy_pass http://riak-cs;
        proxy_set_header Host $host;
        proxy_connect_timeout 59s;
        proxy_send_timeout 600;
        proxy_read_timeout 600;
        #proxy_buffering off;
        proxy_buffers 16 32k;
        proxy_buffer_size 64k;
        proxy_pass_header Server;
        #return 403;
    }
}

(11:10:36) [andrew/desktop] ~ $ curl -I cs.domain.com/buckets
HTTP/1.1 404 Object Not Found
Date: Thu, 14 Nov 2013 05:10:38 GMT
Content-Type: application/xml
Connection: keep-alive
Server: Riak CS
Content-Length: 185

Please let me know if there's anything else I can provide; I'm more than willing to do so. Also, it may be worth noting that domain.com in this case is an actual registered and resolving domain that has been sed'd out, since this list is archived.

Thank you so much,
Andrew

On Wed, Nov 13, 2013 at 10:57 PM, Kelly McLaughlin wrote:
> Andy,
>
> To try to get a better idea of what might be going on it would be helpful to see what your riak and riak cs app.config files look like. Also the output of riak-admin ring-status and riak-admin member-status could be useful. For the upload issue I am curious if you have changed the port that riak cs is listening on? The default is 8080 and I don't see from your nginx config where you are sending requests to that port.
>
> Kelly
>
> [...]