riak 2 - how much space is needed for online resizing?
Hi,

We have 5 riak nodes running riak-2.0.0pre20-1.el6.x86_64 with a ring size of 64. We would like to do a ring resize because the distribution of content is very uneven (64/5 leaves a remainder of 4 partitions that all end up on the same node). The documentation says riak-2 can do this online (http://docs.basho.com/riak/2.0.0/ops/advanced/ring-resizing/) and warns 'Make sure that you have sufficient storage to complete the resize operation'.

Could anyone tell me how much is 'sufficient'? In addition, some of the nodes in the cluster have more free space available than others (some are at 40% used disk, others at 60%). Is the location of the free space important?

Thank you,
Max

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
RE: riak 2 - how much space is needed for online resizing?
Hi Jordan,

Thank you for your response.

With a ring size of 64 on 5 nodes we have a remainder of 4 partitions on one node. Each partition is 1/64 of the ring, so about 6.25% of the total data ends up on one node in addition to its share of the evenly spread content. If we moved to 128 partitions the remainder would be 3, but since the ring is twice as large each partition is only 1/128 of the ring, so about 2.3% of the total would end up on one node in addition to the evenly spread content. For 256 it becomes 0.4%. To me that looks like a very big improvement. You said it wouldn't make much difference; am I making a mistake in my reasoning?

I googled claim_v3 and it looks like it could help, so I'm going to try it. Rebuilding the cluster is not really an option at this time. Would changing the claim method also improve the situation for an existing cluster? I would think that, because of the automatic rebalancing, the vnodes would move due to the different claiming mechanism. Am I right?

Yes, now that 2.0.0 has shipped we are looking to upgrade before making any changes.

Thanks,
Max

From: Jordan West [mailto:jw...@basho.com]
Sent: Tuesday, 2 September 2014 23:17
To: Max Vernimmen
Cc: riak-users@lists.basho.com
Subject: Re: riak 2 - how much space is needed for online resizing?

Hi Max,

A ring resize won't make things much better. It is intended to change the number of partitions from 64, in your case, to 32 or 256, for example. While these ring sizes may have better distributions with 5 nodes, they will not be perfect. The quickest solution using the existing cluster and settings would be to add 3 nodes (for a total of 8) or remove one (for a total of 4) -- we don't suggest the latter, you can read more about why in [1], but decide based on your application's needs. There are a few other options but they are more complicated.

Somewhat related, since you are using a pre-release build, is this development/test data? Do you have the option of rebuilding the cluster?

If you would like to stick with 5 nodes and can rebuild the cluster from scratch, another alternative is to try "claim_v3" (the default is v2). See wants_claim_fun and choose_claim_fun in [2]. You'll want to set these to wants_claim_v3 and choose_claim_v3, respectively, in the riak_core section of your advanced.config. It may result in a better, albeit not perfect, balance.

To answer your original question about capacity, a conservative rule is to stay below 50% capacity on every node.

I would also suggest upgrading to a more recent build.

Jordan

[1] http://basho.com/why-your-riak-cluster-should-have-at-least-five-nodes/
[2] http://docs.basho.com/riak/1.4.10/ops/advanced/configs/configuration-files/
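The remainder arithmetic discussed in this thread can be checked with a few lines of Python. This is only a sketch of the reasoning in the message, not of how Riak actually assigns partitions: it assumes, as the message does, that every leftover partition (ring size modulo node count) lands on a single node. The function name `leftover_share` is made up for this illustration.

```python
def leftover_share(ring_size, nodes):
    """Return (leftover partitions, their share of the ring in percent).

    Assumption taken from the thread: all leftover partitions end up on
    one node. The real claim algorithm may place them differently.
    """
    leftover = ring_size % nodes
    return leftover, 100.0 * leftover / ring_size


if __name__ == "__main__":
    # Compare the ring sizes mentioned in the thread for a 5-node cluster.
    for ring_size in (64, 128, 256):
        leftover, pct = leftover_share(ring_size, 5)
        print(f"ring={ring_size:3d} leftover={leftover} extra on one node={pct:.2f}%")
```

Under that assumption the extra load on the unlucky node shrinks as the ring grows, but it only reaches zero when the node count divides the ring size evenly, which is consistent with the suggestion to run 4 or 8 nodes on a 64-partition ring.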
replacing node results in error with diag
Hi,

Today I finished upgrading from 2.0.0-pre20 to 2.0.0-1. Once that was done I did a node replace according to the instructions at http://docs.basho.com/riak/latest/ops/running/nodes/replacing/

Once the replacement was done, our monitoring notified us about a problem with the cluster. Our monitoring runs 'riak-admin diag', and each of the nodes is now giving the output I've posted here: https://gist.github.com/anonymous/a313a07b0cd1da1c

The node referenced in the diag output is the replaced node. It is no longer in the cluster. I confirmed the ring was settled; the replaced node is no longer listed in the web interface of the cluster, nor is it in the `riak-admin status` output. Only a restart of the riak service on each of the nodes resolves the problem. Restarting only one node fixes the diag status only for that node.

To me it seems like some state is left on the cluster nodes after a node is replaced, causing the `riak-admin diag` command to fail. Has anyone else seen this? Would this classify as a bug or did I simply do something wrong? :)

Best regards,
Max Vernimmen
RE: replacing node results in error with diag
Hi Sargun,

The debug output can be found here: https://gist.github.com/anonymous/7e82fa3a62595fbd2cc7

Indeed, your suggested command resolves the problem nicely; that saves me a lot of restarting. Thanks for your help!

Best regards,
Max Vernimmen

> -----Original Message-----
> From: Sargun Dhillon [mailto:sar...@sargun.me]
> Sent: Tuesday, 30 September 2014 21:57
> To: Max Vernimmen
> Cc: riak-users@lists.basho.com
> Subject: Re: replacing node results in error with diag
>
> So, I don't have a ton of experience with Riaknostic, but taking a
> casual glance at the source code, it appears that Riaknostic caches
> some node-local data about the ring (see:
> https://github.com/basho/riaknostic/blob/2.0.0/src/riaknostic_node.erl#L192-L208).
>
> You should be able to unset this by attaching to a node with "riak attach"
> and running:
>
>     application:unset_env(riaknostic, local_stats).
>
> Although, it'd be nice to get a dump of your local env first for
> debugging purposes. You can get that via (including the period):
>
>     io:format("Local env: ~p~n", [application:get_all_env(riaknostic)]).
>
> If that clears one node, you can do it on all of your nodes by issuing
> the following on one node:
>
>     rpc:multicall(application, unset_env, [riaknostic, local_stats]).
increasing N value
Hi,

I'd like to increase our cluster's n value from n=2 to n=3. The documentation says this is 'not recommended', but it doesn't say why, and the functionality is there, so… ☺

Once I change the setting for a specific bucket, there seem to be two ways of making sure existing objects in the bucket get a third copy:
- Force a read repair
- Wait for Active Anti-Entropy to resolve all missing replicas

Some questions about this:
- Is there a way to know that AAE is done with a bucket and all content is now stored with n=3?
- If I do a read with r=1 (default?), is there a chance that a node will respond with 'content not found', and will it be left at that, or will riak continue searching for the object on a different node?
- Will it automatically do a repair when a 'not found' is triggered?

I guess what I'm trying to find out is: what ways are there to make sure all content has reached 3 replicas after changing to n=3?

Best regards,
Max
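For readers following along, the "force a read repair" route from the question above could look roughly like the Python sketch below. It is an illustration under stated assumptions, not an answer from the list: it assumes Riak's HTTP interface is reachable at a base URL such as http://127.0.0.1:8098, that the list of keys is already known to the caller, and that reading an object once is enough to prompt a repair of the missing replica. The helper names (`n_val_request`, `object_url`, `http_status`, `touch_keys`) are hypothetical.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote


def n_val_request(base_url, bucket, n_val):
    """Build (without sending) the PUT request that sets a bucket's n_val."""
    body = json.dumps({"props": {"n_val": n_val}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/buckets/{quote(bucket, safe='')}/props",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def object_url(base_url, bucket, key):
    """URL of a single object; a GET on it is the read that may trigger repair."""
    return f"{base_url}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def http_status(url):
    """GET a URL and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def touch_keys(base_url, bucket, keys, fetch=http_status):
    """Read every key once and return a {key: status} map for inspection."""
    return {key: fetch(object_url(base_url, bucket, key)) for key in keys}
```

To apply the property change, pass the request returned by `n_val_request(...)` to `urllib.request.urlopen`, then call `touch_keys(...)` with the keys to be read. The `fetch` parameter is injectable so the loop can be dry-run without a cluster. Whether a single read is sufficient, and how to confirm that AAE has finished, are exactly the open questions in the message, so the returned status map should be treated as a rough progress indicator only.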