Hi Anthony,

You mentioned you're planning on growing to around 100 nodes. I'm curious what 
ring_creation_size you used? Also, how much data capacity per node are you 
planning on?

I've been spending a lot of time lately working through what happens when a 
node joins. There are a couple of big issues you will want to look out for in 
addition to what you've discovered, which boil down to essentially:
1. After a node joins, the data on partitions which change nodes in the new ring 
   will be unavailable until the handoffs are complete. Currently this comes back 
   as a 404 that's indistinguishable from a "true" 404.
2. At certain points in the progression of ring states from 1 to 100 nodes, a LOT 
   more partitions move around than you'd expect from a consistent hashing scheme.

#2 obviously exacerbates #1, and if -- like us -- you plan to have a lot of 
data in the cluster, having most of it move around after a node joins is 
unrealistic.

I'm still trying to work through exactly what's happening with #2, but it seems 
like once you have more nodes than target_n_val, when adding a new node you 
usually get the consistent hashing property you want: that the new node takes 
some partitions from each of the other nodes, and that's it. But every once in 
a while (and really, not all that rarely), shit hits the fan and it decides to 
re-balance and completely change the ring. >95% of partitions will move, in 
certain cases!

I have some erlang console code I've been using with riak_core to simulate our 
cluster, to get a deeper understanding of the rings at each phase. I might be 
able to clean that up and put it into a script to share.

-Greg 
On Saturday, May 21, 2011 at 9:31 AM, Anthony Molinaro wrote:
> As I asked this question, I thought I would pipe in with my experience (comments 
> inline).
> 
> On May 20, 2011, at 3:17 PM, Mark Phillips <m...@basho.com> wrote:
> 
> > 4) Q --- Let's say I have several new nodes to add, is the recommended
> > procedure to add them one at a time and wait for all transfers to
> > finish, or can you actually add several?
> > 
> > A --- The current recommended procedure is to add one node at a time
> > and wait for the partition transfers to finish before proceeding to
> > the next node addition.
> 
> I found that adding them one at a time would have taken about 4 hours per 
> node, and as I was doubling the size, I felt there would be less shuffling of 
> data if I added all at once (as suggested by aphyr on IRC). This proved to be 
> exactly correct: I was able to add 4 new nodes in about 4 hours instead of 
> 16.
> 
> > Specifically:
> > 
> > * Use the "riak-admin join" command to kick off the cluster expansion
> > * Run "riak-admin transfers" periodically to keep an eye on the nodes
> > awaiting or passing off partitions (this may take a bit to complete);
> > an alternate (and less expensive) way to keep an eye on this is to
> > just watch the logs.
> 
> Running "riak-admin transfers" hardly ever works; I would say it times out 95% 
> of the time when attempting to add a new node. I don't know why this is, and I 
> hope it is fixed someday, but I would recommend never running it.
> 
> Unfortunately, grepping logs is also tricky, as you have to deal with lots of 
> false positives if you've done something like I did: a bunch of nodes 
> crashed, I brought them back up, then realized I needed more capacity and 
> added nodes. Now the logs on the first nodes have messages for transfers 
> from both the restart and the node addition.
> 
> > * When "riak-admin ringready" prints "TRUE ..." to let you know that
> > all nodes agree on the ring state, you're good to go.
> 
> This actually returned true before transfers were complete, IIRC, so I think 
> this may not quite be right.
> 
> > (It's worth noting that making this process smoother and more fluid
> > is high on our list of priorities.)
> 
> Good to know. I look forward to this, as I expect to be increasing my cluster 
> up to close to 100 nodes by the end of this year.
> 
>  -Anthony
> 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 