Hi Anthony,

You mentioned you're planning on growing to around 100 nodes. I'm curious what ring_creation_size you used? Also, how much data capacity per node are you planning on?
I've been spending a lot of time lately working through what happens when a node joins. There are a couple of big issues you will want to look out for in addition to what you've discovered, which essentially boil down to:

1. After a node joins, the data on partitions which change nodes in the new ring will be unavailable until the handoffs are complete. Currently this comes back as a 404 that's indistinguishable from a "true" 404.
2. At certain points in the progression of ring states from 1 to 100 nodes, a LOT more partitions move around than you'd expect from a consistent hashing scheme.

#2 obviously exacerbates #1, and if -- like us -- you plan to have a lot of data in the cluster, having most of it move around after a node joins is unrealistic.

I'm still trying to work through exactly what's happening with #2, but it seems like once you have more nodes than target_n_val, when adding a new node you usually get the consistent hashing property you want: that the new node takes some partitions from each of the other nodes, and that's it. But every once in a while (and really, not all that rarely), shit hits the fan and it decides to re-balance and completely change the ring. In certain cases, more than 95% of partitions will move!

I have some Erlang console code I've been using with riak_core to simulate our cluster, to get a deeper understanding of the rings at each phase. I might be able to clean that up and put it into a script to share.

-Greg

On Saturday, May 21, 2011 at 9:31 AM, Anthony Molinaro wrote:

> As I asked this question I thought I would pipe in with my experience (comments inline).
>
> On May 20, 2011, at 3:17 PM, Mark Phillips <m...@basho.com> wrote:
>
> > 4) Q --- Let's say I have several new nodes to add, is the recommended
> > procedure to add them one at a time and wait for all transfers to
> > finish, or can you actually add several?
> >
> > A --- The current recommended procedure is to add one node at a time
> > and wait for the partition transfers to finish before proceeding to
> > the next node addition.
>
> I found that adding them one at a time would have taken about 4 hours per
> node, and as I was doubling the size I felt there would be less shuffling of
> data if I added all at once (as suggested by aphyr on IRC). This proved to be
> exactly correct, as I was able to add 4 new nodes in about 4 hours instead of
> 16.
>
> > Specifically:
> >
> > * Use the "riak-admin join" command to kick off the cluster expansion
> > * Run "riak-admin transfers" periodically to keep an eye on the nodes
> > awaiting or passing off partitions (this may take a bit to complete);
> > an alternate (and less expensive) way to keep an eye on this is to
> > just watch the logs.
>
> Running "riak-admin transfers" hardly ever works; I would say it times out 95%
> of the time when attempting to add a new node. I don't know why this is and I
> hope it is fixed someday, but I would recommend never running it.
>
> Unfortunately grepping logs is also tricky, as you have to deal with lots of
> false positives if you've done something like I did, where you had a bunch of
> nodes crash, then brought them up, only to realize you need to add capacity,
> so you add nodes. But now the logs on the first nodes have messages for
> transfers from both the restart and the node addition.
>
> > * When "riak-admin ringready" prints "TRUE ..." to let you know that
> > all nodes agree on the ring state, you're good to go.
>
> This actually returned true before transfers were complete IIRC, so I think
> this may not quite be right.
>
> > (It's worth noting that making this process smoother and more fluid
> > is high on our list of priorities.)
>
> Good to know. I look forward to this, as I expect to be increasing my cluster
> up to close to 100 nodes by the end of this year.
>
> -Anthony
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
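
The second issue at the top of this message (how many partitions change owner when a node joins) can be made concrete with a small simulation. The following is a minimal Python sketch. It is not riak_core's actual claim algorithm and not the Erlang console code mentioned above: the function names, the 64-partition ring, the round-robin layout, and the "take from the most loaded node" join strategy are all assumptions made for illustration. It compares a join that only hands partitions to the new node against a join that re-deals the whole ring.

```python
from collections import Counter


def round_robin_ring(num_partitions, nodes):
    """Toy layout: partition i is owned by nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def join_minimal(ring, new_node):
    """Give new_node about 1/(N+1) of the partitions, one at a time,
    always taking from the currently most loaded existing node.
    Every other partition keeps its owner (the property you want from
    consistent hashing)."""
    new_ring = list(ring)
    counts = Counter(ring)
    target = len(ring) // (len(counts) + 1)
    for _ in range(target):
        # Ties are broken by node name so the result is deterministic.
        donor = max(sorted(counts), key=lambda n: counts[n])
        idx = new_ring.index(donor)  # first partition still owned by donor
        new_ring[idx] = new_node
        counts[donor] -= 1
    return new_ring


def moved_fraction(old_ring, new_ring):
    """Fraction of partitions whose owner differs between two rings."""
    moved = sum(1 for a, b in zip(old_ring, new_ring) if a != b)
    return moved / len(old_ring)


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]        # hypothetical node names
    old = round_robin_ring(64, nodes)       # hypothetical 64-partition ring

    minimal = join_minimal(old, "n5")
    redealt = round_robin_ring(64, nodes + ["n5"])  # full re-balance

    print("minimal join moved:", moved_fraction(old, minimal))
    print("full re-deal moved:", moved_fraction(old, redealt))
```

In this toy setup the minimal join moves 12 of 64 partitions, while re-dealing the ring round-robin moves 48 of 64. Running moved_fraction over each pair of consecutive ring states (1 node, 2 nodes, ... up to the planned cluster size) is one way to spot the joins where most of the data would have to move.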